Popperian Assessment Engine (The Brain)

This document defines the technical architecture of the Assessment Engine, the core reasoning layer of the Regain Health platform. It implements Critical Rationalism (Popper/Deutsch) to bridge the gap between probabilistic LLM behavior and rigorous, medical-grade assessment.


1. The Epistemological Problem

Standard AI health tools use Inductive Reasoning: matching symptoms to patterns in a training set to output a probability (e.g., "80% chance of X").

The Failures of Induction:

  1. Correlation vs. Causation: Confuses "A usually follows B" with "B causes A."
  2. The Black Swan: Fails on rare cases that don't match common patterns.
  3. Easy-to-Vary: Explanations are "mushy"—you can change details without the theory breaking.

The Popperian Solution: Critical Rationalism

We do not look for confirmation (evidence that supports a theory); we look for refutation (evidence that kills a theory). A theory is only "good" if it survives rigorous attempts to destroy it.


2. Advanced Epistemological Grounding

Our implementation is grounded in the "Bleeding Edge" of AI reasoning research:

  • Cognitive Emulation (CoE): Unlike "Simulators" that guess text, our engine emulates human scientific reasoning. It starts from a first-principle "World Model" and only makes claims that the model can justify through refutation.
  • Group Relative Policy Optimization (GRPO): Inspired by DeepSeek, we use self-verification logic where agents are rewarded for finding their own errors or successfully refuting a flawed conjecture.
  • Sequential Falsification (POMDP): We treat health assessment as a Partially Observable Markov Decision Process. The goal is not to "find the answer" but to reduce uncertainty by iteratively designing experiments (tests) to kill incorrect theories.
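
To make the POMDP framing concrete, here is a minimal sketch of discriminative test selection: given a belief over surviving theories, pick the test that minimizes expected posterior entropy. The theories, tests, and likelihood numbers below are illustrative placeholders, not part of the engine.

```python
# Illustrative sketch: choosing the next test in a sequential-falsification loop.
from math import log2

def entropy(belief: dict[str, float]) -> float:
    return -sum(p * log2(p) for p in belief.values() if p > 0)

def expected_posterior_entropy(belief: dict[str, float],
                               outcome_likelihoods: dict[str, dict[str, float]]) -> float:
    # outcome_likelihoods[outcome][theory] = P(outcome | theory)
    total = 0.0
    for outcome, lik in outcome_likelihoods.items():
        p_outcome = sum(belief[t] * lik[t] for t in belief)
        if p_outcome == 0:
            continue
        posterior = {t: belief[t] * lik[t] / p_outcome for t in belief}
        total += p_outcome * entropy(posterior)
    return total

belief = {"overtraining": 0.5, "insulin_resistance": 0.5}
tests = {
    "cortisol_panel": {
        "high":   {"overtraining": 0.8, "insulin_resistance": 0.2},
        "normal": {"overtraining": 0.2, "insulin_resistance": 0.8},
    },
}
# Pick the test expected to kill the most uncertainty.
best = min(tests, key=lambda t: expected_posterior_entropy(belief, tests[t]))
print(best)  # "cortisol_panel"
```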

3. Multi-Agent "ArgMed" Architecture

We simulate a world-class medical team using four specialized AI agents within a Stateful Graph (LangGraph). This ensures the reasoning is iterative, auditable, and follows the Generator-Verifier-Reasoner (ArgMed) pattern.

```mermaid
flowchart TD
    subgraph input ["Input Layer"]
        HS["Health State vN"] --> IntakeAgent["Intake Agent"]
    end

    subgraph debate ["Stateful Popper Loop (ArgMed Pattern)"]
        IntakeAgent --> PS["Problem Situation"]
        PS --> Gen["The Conjecturer (Generator)"]
        Gen --> TT["Tentative Theories + Causal Links"]
        
        TT --> Ver["The Critic (Verifier)"]
        Ver -->|"Symbolic Search"| HS
        Ver -->|"Evidence Lookup"| EE["Evidence Engine"]
        Ver -->|"Self-Argumentation"| TT
        
        Ver -->|"Refutation / HTV Score"| Rea["The Synthesizer (Reasoner)"]
        
        Rea -->|"Inadequate Explanation"| PS
        Rea -->|"Discriminator Needed"| Seq["Sequential Testing"]
    end

    subgraph output ["Output Layer"]
        Rea -->|"Survivor Theory"| Final["Glass Box Trace"]
        Seq -->|"Data Request"| User["User / Wearable Sync"]
end
```

3.1 The Intake Agent (The Context Builder)

  • Objective: Build the "Problem Situation."
  • Logic: Maps raw Health State data (vitals, logs, labs) into a structured timeline.
  • Output: A context object containing all unexplained phenomena.
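
A minimal sketch of what this context object could look like, assuming a Pydantic representation (the field names `timeline` and `unexplained_phenomena` are illustrative, not a fixed contract):

```python
# Sketch of the "Problem Situation" context object (field names are assumptions).
from datetime import datetime
from pydantic import BaseModel

class Observation(BaseModel):
    timestamp: datetime
    metric: str    # e.g. "HbA1c", "sleep_hours"
    value: float
    source: str    # "lab", "wearable", "user_log"

class ProblemSituation(BaseModel):
    timeline: list[Observation]        # structured Health State timeline
    unexplained_phenomena: list[str]   # e.g. ["fatigue despite 8h sleep"]
```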

3.2 The Conjecturer (The Generator)

  • Objective: Bold, creative hypothesis generation with Causal Grounding.
  • Instruction: "What are all the biological theories that could explain this Problem Situation? For each, propose a causal mechanism (e.g., 'X causes Y because Z')."
  • Output: A list of Tentative Theories (TT) with attached causal links.
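
A minimal sketch of a Tentative Theory carrying its causal link, again assuming Pydantic schemas; all field names are illustrative:

```python
# Sketch of a Tentative Theory with an explicit causal mechanism.
from pydantic import BaseModel

class CausalLink(BaseModel):
    cause: str       # e.g. "low ferritin"
    effect: str      # e.g. "fatigue"
    mechanism: str   # e.g. "reduced oxygen transport capacity"

class TentativeTheory(BaseModel):
    name: str
    causal_links: list[CausalLink]
    falsifiable_prediction: str   # e.g. "Ferritin < 30 ng/mL on next lab"
```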

3.3 The Critic (The Verifier)

  • Objective: Systematic destruction of theories via Self-Argumentation.
  • Logic: Uses Negative Search + Symbolic Guardrails to kill theories. It actively searches for "Counter-Arguments" in the Health State that refute the Generator's causal links.
  • Output: Refutation notes and Hard-to-Vary (HTV) scores.

3.4 The Synthesizer (The Reasoner)

  • Objective: Selection via Argument Reconciliation.
  • Logic: Evaluates the survivor pool using a simplified Argument Graph. It selects the theory with the fewest un-reconciled contradictions. If uncertainty is high, it triggers the IDK Protocol.
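
A minimal sketch of this selection rule, assuming each survivor carries a count of un-reconciled contradictions and an HTV score; the `max_contradictions` threshold is an illustrative assumption:

```python
# Sketch: pick the survivor with the fewest un-reconciled contradictions,
# falling back to the IDK Protocol when no theory is clearly adequate.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Survivor:
    name: str
    unreconciled_contradictions: int
    htv_score: float

def synthesize(survivors: list[Survivor],
               max_contradictions: int = 2) -> Optional[Survivor]:
    if not survivors:
        return None  # everything refuted -> IDK Protocol
    best = min(survivors,
               key=lambda s: (s.unreconciled_contradictions, -s.htv_score))
    if best.unreconciled_contradictions > max_contradictions:
        return None  # inadequate explanation -> back to Problem Situation
    return best
```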

4. Hard-to-Vary (HTV) Scoring

A theory's quality is measured by its HTV Score, not its probability.

| Criterion | Definition | Scoring |
| :--- | :--- | :--- |
| Interdependence | If you change one part of the theory, does the whole thing collapse? | High = Good |
| Specificity | Does it make precise, testable predictions about the Health State? | High = Good |
| Parsimony | Does it explain the data without adding "arbitrary" assumptions? | High = Good |
| Falsifiability | Is there a clear data point that would prove it wrong? | Required |
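
A minimal sketch of how these criteria could be combined, assuming each is rated on a 0-1 scale with equal weights (both assumptions are illustrative); note that Falsifiability acts as a hard gate rather than a weighted term:

```python
# Sketch: composite HTV score. Falsifiability is a gate, not a weight.
def htv_score(interdependence: float, specificity: float,
              parsimony: float, falsifiable: bool) -> float:
    """Each criterion in [0, 1]; unfalsifiable theories score 0.0 outright."""
    if not falsifiable:
        return 0.0
    return (interdependence + specificity + parsimony) / 3
```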


5. The Refutation Protocol (Technical Implementation)

To prevent LLM "hallucination of confirmation," the Verifier agent uses a Negative Search pattern combined with Schema-Driven Logic. This replaces complex theorem proving with standard engineering patterns (Pydantic/Zod).

5.1 Schema-Driven Guardrails

We "sandwich" the generative LLM between deterministic validation schemas. Every conjecture must be validated against a Medical Consistency Schema.

  1. Translation: The Verifier translates a conjecture into a structured JSON object.
  2. Schema Validation: The system uses Pydantic (Python) or Zod (TypeScript) to enforce consistency.
    • Example: A DiabetesConjecture schema requires a glucose_level > 126 mg/dL or HbA1c > 6.5%.
  3. Hard Refutation: If the Health State data contradicts the schema requirements (e.g., HbA1c is 5.4), the Pydantic validator throws an error, and the theory is killed instantly.

```python
# Practical implementation for Senior Devs (sketch: `fetch_current_hba1c` is a
# placeholder for a deterministic lookup against the real Health State store)
from pydantic import BaseModel, validator

def fetch_current_hba1c() -> float:
    return 5.4  # stubbed; production code reads the user's latest lab value

class DiabetesConjecture(BaseModel):
    logic: str = "Fasting glucose > 126 mg/dL"

    @validator("logic")
    def validate_against_health_state(cls, v):
        # Deterministic check against real Health State data
        if fetch_current_hba1c() < 6.5:
            raise ValueError("Refuted: HbA1c is within normal range.")
        return v
```

6. Causal Links & Counterfactuals

To move beyond correlative pattern matching, the engine implements simplified causal inference.

6.1 Causal Predictions

Every conjecture must include a Falsifiable Prediction:

  • "If Theory T is true, then Fact F must appear in the labs within X days."
  • Example: "If this fatigue is caused by Iron Deficiency, then a Ferritin test must return < 30 ng/mL."
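
A minimal sketch of such a prediction as checkable data; the field names, the `window_days` parameter, and the 14-day value in the usage example are illustrative assumptions (the source leaves "X days" open):

```python
# Sketch: a falsifiable prediction as data, checkable against incoming labs.
import operator
from dataclasses import dataclass

@dataclass
class FalsifiablePrediction:
    metric: str        # e.g. "ferritin"
    op: str            # "<", ">", "<=", ">="
    threshold: float   # e.g. 30.0 (ng/mL)
    window_days: int   # the "X days" deadline from the conjecture

    def is_refuted(self, observed_value: float) -> bool:
        compare = {"<": operator.lt, ">": operator.gt,
                   "<=": operator.le, ">=": operator.ge}[self.op]
        # The theory predicted the comparison holds; if it fails, refute.
        return not compare(observed_value, self.threshold)

iron_deficiency = FalsifiablePrediction("ferritin", "<", 30.0, window_days=14)
print(iron_deficiency.is_refuted(85.0))  # True: ferritin at 85 kills the theory
```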

6.2 The Counterfactual Check

The Verifier agent performs a "What-If" analysis to test the robustness of a theory:

  • Scenario: "Would the patient still be tired if we removed the 'Overtraining' factor?"
  • Logic: If the symptoms would disappear in the counterfactual world where a single lifestyle factor is removed, the engine prioritizes that factor as the "Hardest-to-Vary" explanation.
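
A minimal sketch of this check, assuming each candidate factor declares which symptoms it explains (the factor and symptom names are illustrative):

```python
# Sketch: "what-if" analysis -- do the remaining factors still cover every
# symptom once one factor is removed?
def explained_without(symptoms: set[str],
                      explanations: dict[str, set[str]],
                      removed_factor: str) -> bool:
    """True if the remaining factors still explain every symptom."""
    covered: set[str] = set()
    for factor, explains in explanations.items():
        if factor != removed_factor:
            covered |= explains
    return symptoms <= covered

symptoms = {"fatigue", "poor_recovery"}
explanations = {
    "overtraining": {"fatigue", "poor_recovery"},
    "low_iron": {"fatigue"},
}
# Removing 'overtraining' leaves 'poor_recovery' unexplained, so overtraining
# is prioritized as the hardest-to-vary single-factor explanation.
print(explained_without(symptoms, explanations, "overtraining"))  # False
```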

7. Implementation Frameworks

To move beyond simple prompts to a robust reasoning engine, we utilize state-of-the-art agentic frameworks:

  • LangGraph (State Management): Used for the Stateful Popper Loop. It manages the cyclic transition between Conjecture, Refutation, and Synthesis, ensuring the process is non-linear and can backtrack when new data is required (see the sketch after this list).
  • CrewAI / AutoGen (Epistemological Roles): Used to define the adversarial personas (House vs. Scientist). These frameworks enforce the "Fixed Epistemological Roles" by providing agents with specific "mental" constraints.
  • LeanDojo (Formal Verification): For high-stakes clinical protocols, we use neuro-symbolic tools to convert medical logic into formal proofs, ensuring recommendations are mathematically consistent with the user's constraints.
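
Below is a minimal LangGraph sketch of the Popper Loop's cyclic wiring. Node bodies are stubs and the state fields are illustrative; only the topology (conjecture, critique, synthesis, and the backtracking routes) is the point:

```python
# Sketch of the cyclic Conjecture -> Refutation -> Synthesis wiring.
from typing import Optional, TypedDict
from langgraph.graph import StateGraph, END

class PopperState(TypedDict):
    problem_situation: dict
    theories: list
    refutations: list
    survivor: Optional[dict]

def conjecture(state: PopperState) -> PopperState:
    return state  # stub: generate Tentative Theories + causal links

def critique(state: PopperState) -> PopperState:
    return state  # stub: negative search + schema guardrails

def synthesize(state: PopperState) -> PopperState:
    return state  # stub: argument reconciliation / HTV ranking

def route(state: PopperState) -> str:
    if state["survivor"] is not None:
        return "done"          # Survivor Theory -> Glass Box Trace
    if not state["theories"]:
        return "reconjecture"  # inadequate explanation -> back to Problem Situation
    return "test"              # discriminator needed -> Sequential Testing

graph = StateGraph(PopperState)
graph.add_node("conjecturer", conjecture)
graph.add_node("critic", critique)
graph.add_node("synthesizer", synthesize)
graph.set_entry_point("conjecturer")
graph.add_edge("conjecturer", "critic")
graph.add_edge("critic", "synthesizer")
graph.add_conditional_edges("synthesizer", route,
                            {"done": END, "reconjecture": "conjecturer", "test": END})
app = graph.compile()
```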

8. Sequential Testing (The Bridge to Protocol)

If the Synthesizer cannot reach a "Survivor Theory" due to missing data, it initiates Sequential Testing:

  1. Identify the Discriminator: Find the metric that differentiates the two best surviving theories (e.g., "Cortisol" differentiates 'Overtraining' from 'Insulin Resistance').
  2. Request Input: Ask the user (via Chat) or request a Lab/Wearable sync.
  3. Re-Hydrate: Update the Health State and trigger a new Popper Loop.
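
A minimal sketch of step 1, assuming each theory exposes predicted values per metric; the numbers are illustrative and real code would normalize per-metric units before comparing:

```python
# Sketch: pick the metric whose predictions diverge most between the two best
# surviving theories.
def find_discriminator(pred_a: dict[str, float], pred_b: dict[str, float]) -> str:
    shared = pred_a.keys() & pred_b.keys()
    return max(shared, key=lambda m: abs(pred_a[m] - pred_b[m]))

overtraining = {"cortisol_ug_dl": 28.0, "fasting_glucose_mg_dl": 100.0}
insulin_resistance = {"cortisol_ug_dl": 10.0, "fasting_glucose_mg_dl": 112.0}
print(find_discriminator(overtraining, insulin_resistance))  # "cortisol_ug_dl"
```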

9. Epistemic Humility (The IDK Protocol)

A critical failure of current AI is "hallucination under pressure." Our engine prioritizes Epistemic Humility.

  • The "I Don't Know" (IDK) Trigger: If the Synthesizer identifies that all theories have been refuted, or if surviving theories are equally "easy-to-vary," the system must explicitly admit uncertainty.
  • Uncertainty Path: Instead of a "Diagnosis," the system renders an Uncertainty Artifact:
    • "Current knowledge is insufficient to distinguish between [Theory A] and [Theory B]."
    • "The primary bottleneck is the lack of [Specific Data Point]."
    • "Action: Safest-case posture until [Metric] is updated."

10. Visual Grammar: The Glass Box Reasoning Trace

The user sees a "Calm Artifact," but can expand it to see the "Glass Box" reasoning. We utilize three research-backed explanation formats:

10.1 Rationale-Based Explanations

Explains why the system acted.

  • "We asked about your night sweats because it helps us rule out [Condition X]."

10.2 Feature-Based Summaries

Explains which data mattered most.

  • "Your High Glucose, Low Sleep, and Elevated Cortisol were the 3 most critical factors in this assessment."

10.3 Example-Based (Synthetic Vignettes)

Explains by comparison.

  • "Your situation is similar to a typical 'Overtraining' case where performance drops despite high effort."

11. Outcome Feedback Loop (Closed-Loop Learning)

To ensure the engine improves over time, we feed outcomes from the Protocol Engine (The Muscle) back into The Brain.

  1. Protocol Result: The Muscle reports if an intervention succeeded (e.g., "Weight loss achieved").
  2. Auto-Refutation: If a protocol fails despite high adherence (e.g., "Glucose remains high after 30 days of Low-Carb"), the Brain automatically triggers a Refutation of the Original Theory.
  3. Re-Assessment: The engine initiates a new Popper Loop to find an alternative explanation that accounts for the failure.
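
A minimal sketch of the auto-refutation trigger; the adherence and duration thresholds are illustrative assumptions:

```python
# Sketch: auto-refutation from protocol outcomes.
def should_refute(adherence: float, goal_met: bool, days_elapsed: int,
                  min_days: int = 30, min_adherence: float = 0.8) -> bool:
    """High adherence + enough elapsed time + no result => refute the theory."""
    return adherence >= min_adherence and days_elapsed >= min_days and not goal_met

# "Glucose remains high after 30 days of Low-Carb" despite 90% adherence:
if should_refute(adherence=0.9, goal_met=False, days_elapsed=30):
    print("Refute original theory; trigger a new Popper Loop")
```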

12. Pragmatic Evaluation (Benchmarking)

To ensure the engine delivers "Senior Dev" quality without research-level overhead, we track three pragmatic metrics:

| Metric | Definition | Goal |
| :--- | :--- | :--- |
| Top-K Accuracy | Is the ground-truth condition in the Synthesizer's top 3 survivors? | > 90% |
| Refutation Depth | Number of incorrect theories killed per session. | Avg > 5 |
| Safety Escalation | % of "Red Flag" scenarios correctly identified and escalated. | 100% |


13. North Star

"We don't guess what's likely; we find what hasn't been proven wrong yet."

This ensures that Regain Health provides explanatory power that is auditable, safe, and truly personalized to the individual's biological reality.