Popperian Assessment Engine (The Brain)

This document defines the technical architecture of the Assessment Engine, the core reasoning layer of the Regain Health platform. It implements Critical Rationalism (Popper/Deutsch) to bridge the gap between probabilistic LLM behavior and rigorous, medical-grade assessment.


1. The Epistemological Problem

Standard AI health tools use Inductive Reasoning: matching symptoms to patterns in a training set to output a probability (e.g., "80% chance of X").

The Failures of Induction:

  1. Correlation vs. Causation: Confuses "A usually follows B" with "B causes A."
  2. The Black Swan: Fails on rare cases that don't match common patterns.
  3. Easy-to-Vary: Explanations are "mushy"—you can change details without the theory breaking.

The Popperian Solution: Critical Rationalism

We do not look for confirmation (evidence that supports a theory); we look for refutation (evidence that kills a theory). A theory is only "good" if it survives rigorous attempts to destroy it.


2. Advanced Epistemological Grounding

Our implementation is grounded in the "Bleeding Edge" of AI reasoning research:

  • Cognitive Emulation (CoE): Unlike "Simulators" that guess text, our engine emulates human scientific reasoning. It starts from a first-principle "World Model" and only makes claims that the model can justify through refutation.
  • Group Relative Policy Optimization (GRPO): Inspired by DeepSeek, we use self-verification logic where agents are rewarded for finding their own errors or successfully refuting a flawed conjecture.
  • Sequential Falsification (POMDP): We treat health assessment as a Partially Observable Markov Decision Process. The goal is not to "find the answer" but to reduce uncertainty by iteratively designing experiments (tests) to kill incorrect theories.
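
To make the POMDP framing concrete, here is a minimal sketch of discriminative test selection: given a belief over surviving theories, pick the test that minimizes expected posterior entropy. The theories, tests, and likelihood numbers below are illustrative placeholders, not part of the engine.

```python
# Illustrative sketch: choosing the next test in a sequential-falsification loop.
from math import log2

def entropy(belief: dict[str, float]) -> float:
    return -sum(p * log2(p) for p in belief.values() if p > 0)

def expected_posterior_entropy(belief: dict[str, float],
                               outcome_likelihoods: dict[str, dict[str, float]]) -> float:
    # outcome_likelihoods[outcome][theory] = P(outcome | theory)
    total = 0.0
    for outcome, lik in outcome_likelihoods.items():
        p_outcome = sum(belief[t] * lik[t] for t in belief)
        if p_outcome == 0:
            continue
        posterior = {t: belief[t] * lik[t] / p_outcome for t in belief}
        total += p_outcome * entropy(posterior)
    return total

belief = {"overtraining": 0.5, "insulin_resistance": 0.5}
tests = {
    "cortisol_panel": {
        "high":   {"overtraining": 0.8, "insulin_resistance": 0.2},
        "normal": {"overtraining": 0.2, "insulin_resistance": 0.8},
    },
}
# Pick the test expected to kill the most uncertainty.
best = min(tests, key=lambda t: expected_posterior_entropy(belief, tests[t]))
print(best)  # "cortisol_panel"
```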

3. Multi-Agent "ArgMed" Architecture

We simulate a world-class medical team using four specialized AI agents within a Stateful Graph (LangGraph). This ensures the reasoning is iterative, auditable, and follows the Generator-Verifier-Reasoner (ArgMed) pattern.

```mermaid
flowchart TD
    subgraph input ["Input Layer"]
        HS["Health State vN"] --> IntakeAgent["Intake Agent"]
    end

    subgraph debate ["Stateful Popper Loop (ArgMed Pattern)"]
        IntakeAgent --> PS["Problem Situation"]
        PS --> Gen["The Conjecturer (Generator)"]
        Gen --> TT["Tentative Theories + Causal Links"]
        
        TT --> Ver["The Critic (Verifier)"]
        Ver -->|"Symbolic Search"| HS
        Ver -->|"Evidence Lookup"| EE["Evidence Engine"]
        Ver -->|"Self-Argumentation"| TT
        
        Ver -->|"Refutation / HTV Score"| Rea["The Synthesizer (Reasoner)"]
        
        Rea -->|"Inadequate Explanation"| PS
        Rea -->|"Discriminator Needed"| Seq["Sequential Testing"]
    end

    subgraph output ["Output Layer"]
        Rea -->|"Survivor Theory"| Final["Glass Box Trace"]
        Seq -->|"Data Request"| User["User / Wearable Sync"]
end
```

3.1 The Intake Agent (The Context Builder)

  • Objective: Build the "Problem Situation."
  • Logic: Maps raw Health State data (vitals, logs, labs) into a structured timeline.
  • Output: A context object containing all unexplained phenomena.
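
A minimal sketch of what this context object could look like, assuming a Pydantic representation (the field names `timeline` and `unexplained_phenomena` are illustrative, not a fixed contract):

```python
# Sketch of the "Problem Situation" context object (field names are assumptions).
from datetime import datetime
from pydantic import BaseModel

class Observation(BaseModel):
    timestamp: datetime
    metric: str    # e.g. "HbA1c", "sleep_hours"
    value: float
    source: str    # "lab", "wearable", "user_log"

class ProblemSituation(BaseModel):
    timeline: list[Observation]        # structured Health State timeline
    unexplained_phenomena: list[str]   # e.g. ["fatigue despite 8h sleep"]
```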

3.2 The Conjecturer (The Generator)

  • Objective: Bold, creative hypothesis generation with Causal Grounding.
  • Instruction: "What are all the biological theories that could explain this Problem Situation? For each, propose a causal mechanism (e.g., 'X causes Y because Z')."
  • Output: A list of Tentative Theories (TT) with attached causal links.
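
A minimal sketch of a Tentative Theory carrying its causal link, again assuming Pydantic schemas; all field names are illustrative:

```python
# Sketch of a Tentative Theory with an explicit causal mechanism.
from pydantic import BaseModel

class CausalLink(BaseModel):
    cause: str       # e.g. "low ferritin"
    effect: str      # e.g. "fatigue"
    mechanism: str   # e.g. "reduced oxygen transport capacity"

class TentativeTheory(BaseModel):
    name: str
    causal_links: list[CausalLink]
    falsifiable_prediction: str   # e.g. "Ferritin < 30 ng/mL on next lab"
```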

3.3 The Critic (The Verifier)

  • Objective: Systematic destruction of theories via Self-Argumentation.
  • Logic: Uses Negative Search + Symbolic Guardrails to kill theories. It actively searches for "Counter-Arguments" in the Health State that refute the Generator's causal links.
  • Output: Refutation notes and Hard-to-Vary (HTV) scores.

3.4 The Synthesizer (The Reasoner)

  • Objective: Selection via Argument Reconciliation.
  • Logic: Evaluates the survivor pool using a simplified Argument Graph. It selects the theory with the fewest un-reconciled contradictions. If uncertainty is high, it triggers the IDK Protocol.
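
A minimal sketch of this selection rule, assuming each survivor carries a count of un-reconciled contradictions and an HTV score; the `max_contradictions` threshold is an illustrative assumption:

```python
# Sketch: pick the survivor with the fewest un-reconciled contradictions,
# falling back to the IDK Protocol when no theory is clearly adequate.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Survivor:
    name: str
    unreconciled_contradictions: int
    htv_score: float

def synthesize(survivors: list[Survivor],
               max_contradictions: int = 2) -> Optional[Survivor]:
    if not survivors:
        return None  # everything refuted -> IDK Protocol
    best = min(survivors,
               key=lambda s: (s.unreconciled_contradictions, -s.htv_score))
    if best.unreconciled_contradictions > max_contradictions:
        return None  # inadequate explanation -> back to Problem Situation
    return best
```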

4. Hard-to-Vary (HTV) Scoring

A theory's quality is measured by its HTV Score, not its probability.

| Criterion | Definition | Scoring |
| :--- | :--- | :--- |
| Interdependence | If you change one part of the theory, does the whole thing collapse? | High = Good |
| Specificity | Does it make precise, testable predictions about the Health State? | High = Good |
| Parsimony | Does it explain the data without adding "arbitrary" assumptions? | High = Good |
| Falsifiability | Is there a clear data point that would prove it wrong? | Required |
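
A minimal sketch of how these criteria could be combined, assuming each is rated on a 0-1 scale with equal weights (both assumptions are illustrative); note that Falsifiability acts as a hard gate rather than a weighted term:

```python
# Sketch: composite HTV score. Falsifiability is a gate, not a weight.
def htv_score(interdependence: float, specificity: float,
              parsimony: float, falsifiable: bool) -> float:
    """Each criterion in [0, 1]; unfalsifiable theories score 0.0 outright."""
    if not falsifiable:
        return 0.0
    return (interdependence + specificity + parsimony) / 3
```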


5. The Refutation Protocol (Technical Implementation)

To prevent LLM "hallucination of confirmation," the Verifier agent uses a Negative Search pattern combined with Schema-Driven Logic. This replaces complex theorem proving with standard engineering patterns (Pydantic/Zod).

5.1 Schema-Driven Guardrails

We "sandwich" the generative LLM between deterministic validation schemas. Every conjecture must be validated against a Medical Consistency Schema.

  1. Translation: The Verifier translates a conjecture into a structured JSON object.
  2. Schema Validation: The system uses Pydantic (Python) or Zod (TypeScript) to enforce consistency.
    • Example: A DiabetesConjecture schema requires a glucose_level > 126 mg/dL or HbA1c > 6.5%.
  3. Hard Refutation: If the Health State data contradicts the schema requirements (e.g., HbA1c is 5.4), the Pydantic validator throws an error, and the theory is killed instantly.

```python
# Practical implementation for Senior Devs (sketch: `fetch_current_hba1c` is a
# placeholder for a deterministic lookup against the real Health State store)
from pydantic import BaseModel, validator

def fetch_current_hba1c() -> float:
    return 5.4  # stubbed; production code reads the user's latest lab value

class DiabetesConjecture(BaseModel):
    logic: str = "Fasting glucose > 126 mg/dL"

    @validator("logic")
    def validate_against_health_state(cls, v):
        # Deterministic check against real Health State data
        if fetch_current_hba1c() < 6.5:
            raise ValueError("Refuted: HbA1c is within normal range.")
        return v
```

6. Causal Links & Counterfactuals

To move beyond correlative pattern matching, the engine implements simplified causal inference.

6.1 Causal Predictions

Every conjecture must include a Falsifiable Prediction:

  • "If Theory T is true, then Fact F must appear in the labs within X days."
  • Example: "If this fatigue is caused by Iron Deficiency, then a Ferritin test must return < 30 ng/mL."
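
A minimal sketch of such a prediction as checkable data; the field names, the `window_days` parameter, and the 14-day value in the usage example are illustrative assumptions (the source leaves "X days" open):

```python
# Sketch: a falsifiable prediction as data, checkable against incoming labs.
import operator
from dataclasses import dataclass

@dataclass
class FalsifiablePrediction:
    metric: str        # e.g. "ferritin"
    op: str            # "<", ">", "<=", ">="
    threshold: float   # e.g. 30.0 (ng/mL)
    window_days: int   # the "X days" deadline from the conjecture

    def is_refuted(self, observed_value: float) -> bool:
        compare = {"<": operator.lt, ">": operator.gt,
                   "<=": operator.le, ">=": operator.ge}[self.op]
        # The theory predicted the comparison holds; if it fails, refute.
        return not compare(observed_value, self.threshold)

iron_deficiency = FalsifiablePrediction("ferritin", "<", 30.0, window_days=14)
print(iron_deficiency.is_refuted(85.0))  # True: ferritin at 85 kills the theory
```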

6.2 The Counterfactual Check

The Verifier agent performs a "What-If" analysis to test the robustness of a theory:

  • Scenario: "Would the patient still be tired if we removed the 'Overtraining' factor?"
  • Logic: If the symptoms would disappear in the counterfactual world where a single lifestyle factor is removed, the engine prioritizes that factor as the "Hardest-to-Vary" explanation.
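
A minimal sketch of this check, assuming each candidate factor declares which symptoms it explains (the factor and symptom names are illustrative):

```python
# Sketch: "what-if" analysis -- do the remaining factors still cover every
# symptom once one factor is removed?
def explained_without(symptoms: set[str],
                      explanations: dict[str, set[str]],
                      removed_factor: str) -> bool:
    """True if the remaining factors still explain every symptom."""
    covered: set[str] = set()
    for factor, explains in explanations.items():
        if factor != removed_factor:
            covered |= explains
    return symptoms <= covered

symptoms = {"fatigue", "poor_recovery"}
explanations = {
    "overtraining": {"fatigue", "poor_recovery"},
    "low_iron": {"fatigue"},
}
# Removing 'overtraining' leaves 'poor_recovery' unexplained, so overtraining
# is prioritized as the hardest-to-vary single-factor explanation.
print(explained_without(symptoms, explanations, "overtraining"))  # False
```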

7. Implementation Frameworks

To move beyond simple prompts to a robust reasoning engine, we utilize state-of-the-art agentic frameworks:

  • LangGraph (State Management): Used for the Stateful Popper Loop. It manages the cyclic transition between Conjecture, Refutation, and Synthesis, ensuring the process is non-linear and can backtrack when new data is required (see the sketch after this list).
  • CrewAI / AutoGen (Epistemological Roles): Used to define the adversarial personas (House vs. Scientist). These frameworks enforce the "Fixed Epistemological Roles" by providing agents with specific "mental" constraints.
  • LeanDojo (Formal Verification): For high-stakes clinical protocols, we use neuro-symbolic tools to convert medical logic into formal proofs, ensuring recommendations are mathematically consistent with the user's constraints.
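
Below is a minimal LangGraph sketch of the Popper Loop's cyclic wiring. Node bodies are stubs and the state fields are illustrative; only the topology (conjecture, critique, synthesis, and the backtracking routes) is the point:

```python
# Sketch of the cyclic Conjecture -> Refutation -> Synthesis wiring.
from typing import Optional, TypedDict
from langgraph.graph import StateGraph, END

class PopperState(TypedDict):
    problem_situation: dict
    theories: list
    refutations: list
    survivor: Optional[dict]

def conjecture(state: PopperState) -> PopperState:
    return state  # stub: generate Tentative Theories + causal links

def critique(state: PopperState) -> PopperState:
    return state  # stub: negative search + schema guardrails

def synthesize(state: PopperState) -> PopperState:
    return state  # stub: argument reconciliation / HTV ranking

def route(state: PopperState) -> str:
    if state["survivor"] is not None:
        return "done"          # Survivor Theory -> Glass Box Trace
    if not state["theories"]:
        return "reconjecture"  # inadequate explanation -> back to Problem Situation
    return "test"              # discriminator needed -> Sequential Testing

graph = StateGraph(PopperState)
graph.add_node("conjecturer", conjecture)
graph.add_node("critic", critique)
graph.add_node("synthesizer", synthesize)
graph.set_entry_point("conjecturer")
graph.add_edge("conjecturer", "critic")
graph.add_edge("critic", "synthesizer")
graph.add_conditional_edges("synthesizer", route,
                            {"done": END, "reconjecture": "conjecturer", "test": END})
app = graph.compile()
```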

8. Sequential Testing (The Bridge to Protocol)

If the Synthesizer cannot reach a "Survivor Theory" due to missing data, it initiates Sequential Testing:

  1. Identify the Discriminator: Find the metric that differentiates the two best surviving theories (e.g., "Cortisol" differentiates 'Overtraining' from 'Insulin Resistance').
  2. Request Input: Ask the user (via Chat) or request a Lab/Wearable sync.
  3. Re-Hydrate: Update the Health State and trigger a new Popper Loop.
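
A minimal sketch of step 1, assuming each theory exposes predicted values per metric; the numbers are illustrative and real code would normalize per-metric units before comparing:

```python
# Sketch: pick the metric whose predictions diverge most between the two best
# surviving theories.
def find_discriminator(pred_a: dict[str, float], pred_b: dict[str, float]) -> str:
    shared = pred_a.keys() & pred_b.keys()
    return max(shared, key=lambda m: abs(pred_a[m] - pred_b[m]))

overtraining = {"cortisol_ug_dl": 28.0, "fasting_glucose_mg_dl": 100.0}
insulin_resistance = {"cortisol_ug_dl": 10.0, "fasting_glucose_mg_dl": 112.0}
print(find_discriminator(overtraining, insulin_resistance))  # "cortisol_ug_dl"
```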

9. Epistemic Humility (The IDK Protocol)

A critical failure of current AI is "hallucination under pressure." Our engine prioritizes Epistemic Humility.

  • The "I Don't Know" (IDK) Trigger: If the Synthesizer identifies that all theories have been refuted, or if surviving theories are equally "easy-to-vary," the system must explicitly admit uncertainty.
  • Uncertainty Path: Instead of a "Diagnosis," the system renders an Uncertainty Artifact:
    • "Current knowledge is insufficient to distinguish between [Theory A] and [Theory B]."
    • "The primary bottleneck is the lack of [Specific Data Point]."
    • "Action: Safest-case posture until [Metric] is updated."

10. Visual Grammar: The Glass Box Reasoning Trace

The user sees a "Calm Artifact," but can expand it to see the "Glass Box" reasoning. We utilize three research-backed explanation formats:

10.1 Rationale-Based Explanations

Explains why the system acted.

  • "We asked about your night sweats because it helps us rule out [Condition X]."

10.2 Feature-Based Summaries

Explains which data mattered most.

  • "Your High Glucose, Low Sleep, and Elevated Cortisol were the 3 most critical factors in this assessment."

10.3 Example-Based (Synthetic Vignettes)

Explains by comparison.

  • "Your situation is similar to a typical 'Overtraining' case where performance drops despite high effort."

11. Outcome Feedback Loop (Closed-Loop Learning)

To ensure the engine improves over time, we feed outcomes from the Protocol Engine (The Muscle) back into The Brain.

  1. Protocol Result: The Muscle reports if an intervention succeeded (e.g., "Weight loss achieved").
  2. Auto-Refutation: If a protocol fails despite high adherence (e.g., "Glucose remains high after 30 days of Low-Carb"), the Brain automatically triggers a Refutation of the Original Theory.
  3. Re-Assessment: The engine initiates a new Popper Loop to find an alternative explanation that accounts for the failure.
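
A minimal sketch of the auto-refutation trigger; the adherence and duration thresholds are illustrative assumptions:

```python
# Sketch: auto-refutation from protocol outcomes.
def should_refute(adherence: float, goal_met: bool, days_elapsed: int,
                  min_days: int = 30, min_adherence: float = 0.8) -> bool:
    """High adherence + enough elapsed time + no result => refute the theory."""
    return adherence >= min_adherence and days_elapsed >= min_days and not goal_met

# "Glucose remains high after 30 days of Low-Carb" despite 90% adherence:
if should_refute(adherence=0.9, goal_met=False, days_elapsed=30):
    print("Refute original theory; trigger a new Popper Loop")
```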

12. Pragmatic Evaluation (Benchmarking)

To ensure the engine delivers "Senior Dev" quality without research-level overhead, we track three pragmatic metrics:

| Metric | Definition | Goal |
| :--- | :--- | :--- |
| Top-K Accuracy | Is the ground-truth condition in the Synthesizer's top 3 survivors? | > 90% |
| Refutation Depth | Number of incorrect theories killed per session. | Avg > 5 |
| Safety Escalation | % of "Red Flag" scenarios correctly identified and escalated. | 100% |


13. North Star

"We don't guess what's likely; we find what hasn't been proven wrong yet."

This ensures that Regain Health provides explanatory power that is auditable, safe, and truly personalized to the individual's biological reality.