Popperian Assessment Engine (The Brain)
This document defines the technical architecture of the Assessment Engine, the core reasoning layer of Regain Health. It implements Critical Rationalism (Popper/Deutsch) to bridge the gap between probabilistic LLM behavior and rigorous medical-grade assessment.
1. The Epistemological Problem
Standard AI health tools use Inductive Reasoning: matching symptoms to patterns in a training set to output a probability (e.g., "80% chance of X").
The Failures of Induction:
- Correlation vs. Causation: Confuses "A usually follows B" with "B causes A."
- The Black Swan: Fails on rare cases that don't match common patterns.
- Easy-to-Vary: Explanations are "mushy"—you can change details without the theory breaking.
The Popperian Solution: Critical Rationalism
We do not look for confirmation (evidence that supports a theory); we look for refutation (evidence that kills a theory). A theory is only "good" if it survives rigorous attempts to destroy it.
2. Advanced Epistemological Grounding
Our implementation is grounded in the "Bleeding Edge" of AI reasoning research:
- Cognitive Emulation (CoE): Unlike "Simulators" that guess text, our engine emulates human scientific reasoning. It starts from a first-principle "World Model" and only makes claims that the model can justify through refutation.
- Group Relative Policy Optimization (GRPO): Inspired by DeepSeek, we use self-verification logic where agents are rewarded for finding their own errors or successfully refuting a flawed conjecture.
- Sequential Falsification (POMDP): We treat health assessment as a Partially Observable Markov Decision Process. The goal is not to "find the answer" but to reduce uncertainty by iteratively designing experiments (tests) to kill incorrect theories.
3. Multi-Agent "ArgMed" Architecture
We simulate a world-class medical team using four specialized AI agents within a Stateful Graph (LangGraph). This ensures the reasoning is iterative, auditable, and follows the Generator-Verifier-Reasoner (ArgMed) pattern.
```mermaid
flowchart TD
    subgraph input ["Input Layer"]
        HS["Health State vN"] --> IntakeAgent["Intake Agent"]
    end
    subgraph debate ["Stateful Popper Loop (ArgMed Pattern)"]
        IntakeAgent --> PS["Problem Situation"]
        PS --> Gen["The Conjecturer (Generator)"]
        Gen --> TT["Tentative Theories + Causal Links"]
        TT --> Ver["The Critic (Verifier)"]
        Ver -->|"Symbolic Search"| HS
        Ver -->|"Evidence Lookup"| EE["Evidence Engine"]
        Ver -->|"Self-Argumentation"| TT
        Ver -->|"Refutation / HTV Score"| Rea["The Synthesizer (Reasoner)"]
        Rea -->|"Inadequate Explanation"| PS
        Rea -->|"Discriminator Needed"| Seq["Sequential Testing"]
    end
    subgraph output ["Output Layer"]
        Rea -->|"Survivor Theory"| Final["Glass Box Trace"]
        Seq -->|"Data Request"| User["User / Wearable Sync"]
    end
```
3.1 The Intake Agent (The Context Builder)
- Objective: Build the "Problem Situation."
- Logic: Maps raw Health State data (vitals, logs, labs) into a structured timeline.
- Output: A context object containing all unexplained phenomena.
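The Intake step above can be sketched as a small transformation from raw Health State entries into a time-ordered context of unexplained phenomena. Names such as `ProblemSituation` and `Observation` are illustrative, not the production schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Observation:
    metric: str              # e.g. "HbA1c", "ferritin"
    value: float
    observed_on: date
    explained: bool = False  # flipped once a surviving theory accounts for it

@dataclass
class ProblemSituation:
    timeline: list[Observation] = field(default_factory=list)

    def unexplained(self) -> list[Observation]:
        """The phenomena the Conjecturer must explain, oldest first."""
        return sorted((o for o in self.timeline if not o.explained),
                      key=lambda o: o.observed_on)

# Raw Health State entries become a structured, time-ordered context
ps = ProblemSituation([
    Observation("fatigue_score", 8.0, date(2024, 3, 1)),
    Observation("ferritin", 22.0, date(2024, 2, 20)),
])
```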
3.2 The Conjecturer (The Generator)
- Objective: Bold, creative hypothesis generation with Causal Grounding.
- Instruction: "What are all the biological theories that could explain this Problem Situation? For each, propose a causal mechanism (e.g., 'X causes Y because Z')."
- Output: A list of Tentative Theories (TT) with attached causal links.
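A Tentative Theory can be represented as a small record pairing the claim with its causal mechanism and the prediction that would falsify it (field names are a sketch, not the production model):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CausalLink:
    cause: str
    effect: str
    mechanism: str  # the "because Z" that makes the theory hard to vary

@dataclass
class TentativeTheory:
    name: str
    links: list[CausalLink] = field(default_factory=list)
    falsifiable_prediction: str = ""  # e.g. "Ferritin < 30 ng/mL within 7 days"
    refuted: bool = False

tt = TentativeTheory(
    name="Iron Deficiency",
    links=[CausalLink("low ferritin", "fatigue",
                      "reduced haemoglobin synthesis limits oxygen transport")],
    falsifiable_prediction="Ferritin test returns < 30 ng/mL",
)
```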
3.3 The Critic (The Verifier)
- Objective: Systematic destruction of theories via Self-Argumentation.
- Logic: Uses Negative Search + Symbolic Guardrails to kill theories. It actively searches for "Counter-Arguments" in the Health State that refute the Generator's causal links.
- Output: Refutation notes and Hard-to-Vary (HTV) scores.
3.4 The Synthesizer (The Reasoner)
- Objective: Selection via Argument Reconciliation.
- Logic: Evaluates the survivor pool using a simplified Argument Graph. It selects the theory with the fewest un-reconciled contradictions. If uncertainty is high, it triggers the IDK Protocol.
4. Hard-to-Vary (HTV) Scoring
A theory's quality is measured by its HTV Score, not its probability.
| Criterion | Definition | Scoring |
| :--- | :--- | :--- |
| Interdependence | If you change one part of the theory, does the whole thing collapse? | High = Good |
| Specificity | Does it make precise, testable predictions about the Health State? | High = Good |
| Parsimony | Does it explain the data without adding "arbitrary" assumptions? | High = Good |
| Falsifiability | Is there a clear data point that would prove it wrong? | Required |
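These criteria can be operationalised as a gated score: falsifiability acts as a hard gate rather than a weighted term. The function name and weights below are illustrative placeholders:

```python
def htv_score(interdependence: float, specificity: float,
              parsimony: float, falsifiable: bool) -> float:
    """Score a theory on the Hard-to-Vary criteria (each input in [0, 1]).

    A theory with no refuting data point is unscorable: it is excluded
    outright rather than given a low score.
    """
    if not falsifiable:
        raise ValueError("Unfalsifiable theory: excluded from the pool.")
    # Illustrative weights; in practice these would be tuned against
    # the benchmark metrics in section 12.
    return 0.4 * interdependence + 0.35 * specificity + 0.25 * parsimony
```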
5. The Refutation Protocol (Technical Implementation)
To prevent LLM "hallucination of confirmation," the Verifier agent uses a Negative Search pattern combined with Schema-Driven Logic. This replaces complex theorem proving with standard engineering patterns (Pydantic/Zod).
5.1 Schema-Driven Guardrails
We "sandwich" the generative LLM between deterministic validation schemas. Every conjecture must be validated against a Medical Consistency Schema.
- Translation: The Verifier translates a conjecture into a structured JSON object.
- Schema Validation: The system uses Pydantic (Python) or Zod (TypeScript) to enforce consistency.
- Example: A `DiabetesConjecture` schema requires `glucose_level > 126` or `HbA1c > 6.5`.
- Hard Refutation: If the Health State data contradicts the schema requirements (e.g., `HbA1c` is `5.4`), the Pydantic validator raises an error, and the theory is killed instantly.
```python
# Practical implementation for Senior Devs (Pydantic v1-style validator)
from pydantic import BaseModel, validator

def fetch_current_hba1c() -> float:
    """Deterministic lookup against the real Health State store (stubbed)."""
    return 5.4  # placeholder; production reads the latest lab from the DB

class DiabetesConjecture(BaseModel):
    logic: str = "Fasting glucose > 126"

    @validator("logic")
    def validate_against_health_state(cls, v):
        # Deterministic check against the real DB, not the LLM's claim
        if fetch_current_hba1c() < 6.5:
            raise ValueError("Refuted: HbA1c is within normal range.")
        return v
```
6. Causal Links & Counterfactuals
To move beyond correlative pattern matching, the engine implements simplified causal inference.
6.1 Causal Predictions
Every conjecture must include a Falsifiable Prediction:
- "If Theory T is true, then Fact F must appear in the labs within X days."
- Example: "If this fatigue is caused by Iron Deficiency, then a Ferritin test must return < 30 ng/mL."
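A prediction like the one above reduces to a comparison between an observed lab value and a threshold. A minimal evaluator (the predicate table is illustrative; a real implementation would also parse units and time windows):

```python
import operator

# Illustrative predicate table for falsifiable predictions
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def prediction_holds(observed: float, op: str, threshold: float) -> bool:
    """Evaluate a falsifiable prediction such as 'Ferritin < 30' against a lab."""
    return OPS[op](observed, threshold)

# Iron Deficiency predicts Ferritin < 30 ng/mL; a value of 85 refutes it.
survives = prediction_holds(22.0, "<", 30.0)
refuted = not prediction_holds(85.0, "<", 30.0)
```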
6.2 The Counterfactual Check
The Verifier agent performs a "What-If" analysis to test the robustness of a theory:
- Scenario: "Would the patient still be tired if we removed the 'Overtraining' factor?"
- Logic: If the symptoms can be fully explained by removing a single lifestyle factor (Counterfactual), the engine prioritizes that factor as the "Hardest-to-Vary" explanation.
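The "What-If" check can be sketched as set arithmetic: hypothetically remove each factor and measure how much of the symptom set its removal would account for. Structure and names below are illustrative:

```python
def counterfactual_priority(symptoms: set[str],
                            factor_effects: dict[str, set[str]]) -> list[str]:
    """Rank lifestyle factors by how many observed symptoms vanish when the
    factor is hypothetically removed (the counterfactual scenario)."""
    def explained_by(factor: str) -> int:
        return len(symptoms & factor_effects[factor])
    return sorted(factor_effects, key=explained_by, reverse=True)

effects = {
    "Overtraining": {"fatigue", "performance_drop", "elevated_cortisol"},
    "Poor Sleep": {"fatigue"},
}
# Removing 'Overtraining' would account for all three symptoms, so it is
# prioritised as the hardest-to-vary explanation.
ranking = counterfactual_priority(
    {"fatigue", "performance_drop", "elevated_cortisol"}, effects)
```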
7. Implementation Frameworks
To move beyond simple prompts to a robust reasoning engine, we utilize state-of-the-art agentic frameworks:
- LangGraph (State Management): Used for the Stateful Popper Loop. It manages the cyclic transition between Conjecture, Refutation, and Synthesis, ensuring the process is non-linear and can backtrack when new data is required.
- CrewAI / AutoGen (Epistemological Roles): Used to define the adversarial personas (House vs. Scientist). These frameworks enforce the "Fixed Epistemological Roles" by providing agents with specific "mental" constraints.
- LeanDojo (Formal Verification): For high-stakes clinical protocols, we use neuro-symbolic tools to convert medical logic into formal proofs, ensuring recommendations are mathematically consistent with the user's constraints.
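In production LangGraph manages the cyclic state; as a dependency-free illustration of the control flow only, the Conjecture → Refutation → Synthesis loop with backtracking can be sketched as a plain function (callback names and signatures are assumptions, not the LangGraph API):

```python
def popper_loop(generate, refute, max_rounds: int = 3) -> list:
    """Run Conjecture -> Refutation -> Synthesis, backtracking to a fresh
    round of conjectures whenever every theory in the pool is killed.

    `generate(round)` returns candidate theories; `refute(theory)` returns
    True when the theory is refuted.
    """
    for rnd in range(max_rounds):
        survivors = [t for t in generate(rnd) if not refute(t)]
        if survivors:
            return survivors   # hand the surviving pool to the Synthesizer
    return []                  # everything refuted: trigger the IDK Protocol

# Round 0 conjectures are both refuted; the loop backtracks and round 1 survives.
theories = {0: ["A", "B"], 1: ["C"], 2: []}
killed = {"A", "B"}
result = popper_loop(lambda r: theories[r], lambda t: t in killed)
```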
8. Sequential Testing (The Bridge to Protocol)
If the Synthesizer cannot reach a "Survivor Theory" due to missing data, it initiates Sequential Testing:
- Identify the Discriminator: Find the metric that differentiates the two best surviving theories (e.g., "Cortisol" differentiates 'Overtraining' from 'Insulin Resistance').
- Request Input: Ask the user (via Chat) or request a Lab/Wearable sync.
- Re-Hydrate: Update the Health State and trigger a new Popper Loop.
9. Epistemic Humility (The IDK Protocol)
A critical failure of current AI is "hallucination under pressure." Our engine prioritizes Epistemic Humility.
- The "I Don't Know" (IDK) Trigger: If the Synthesizer identifies that all theories have been refuted, or if surviving theories are equally "easy-to-vary," the system must explicitly admit uncertainty.
- Uncertainty Path: Instead of a "Diagnosis," the system renders an Uncertainty Artifact:
- "Current knowledge is insufficient to distinguish between [Theory A] and [Theory B]."
- "The primary bottleneck is the lack of [Specific Data Point]."
- "Action: Safest-case posture until [Metric] is updated."
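The three-part Uncertainty Artifact above can be rendered from a small record (class and field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class UncertaintyArtifact:
    candidates: list[str]  # theories the engine cannot distinguish
    bottleneck: str        # the missing data point
    action: str            # safest-case posture until the metric is updated

    def render(self) -> str:
        return (
            f"Current knowledge is insufficient to distinguish between "
            f"{' and '.join(self.candidates)}. "
            f"The primary bottleneck is the lack of {self.bottleneck}. "
            f"Action: {self.action}."
        )

art = UncertaintyArtifact(
    candidates=["Overtraining", "Insulin Resistance"],
    bottleneck="a fasting cortisol reading",
    action="reduce training load until cortisol is updated",
)
```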
10. Visual Grammar: The Glass Box Reasoning Trace
The user sees a "Calm Artifact," but can expand it to see the "Glass Box" reasoning. We utilize three research-backed explanation formats:
10.1 Rationale-Based Explanations
Explains why the system acted.
- "We asked about your night sweats because it helps us rule out [Condition X]."
10.2 Feature-Based Summaries
Explains which data mattered most.
- "Your High Glucose, Low Sleep, and Elevated Cortisol were the 3 most critical factors in this assessment."
10.3 Example-Based (Synthetic Vignettes)
Explains by comparison.
- "Your situation is similar to a typical 'Overtraining' case where performance drops despite high effort."
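The three explanation formats can be bundled into a single expandable trace object; the class below is a sketch of that shape, not the production artifact:

```python
from dataclasses import dataclass

@dataclass
class GlassBoxTrace:
    rationale: str           # why the system acted (10.1)
    top_features: list[str]  # which data mattered most (10.2)
    vignette: str            # example-based comparison (10.3)

    def expand(self) -> list[str]:
        """The lines shown when the user expands the Calm Artifact."""
        return [
            f"Why we asked: {self.rationale}",
            f"Most critical factors: {', '.join(self.top_features)}",
            f"Similar case: {self.vignette}",
        ]

trace = GlassBoxTrace(
    rationale="night sweats help rule out Condition X",
    top_features=["High Glucose", "Low Sleep", "Elevated Cortisol"],
    vignette="a typical 'Overtraining' case where performance drops despite high effort",
)
```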
11. Outcome Feedback Loop (Closed-Loop Learning)
To ensure the engine improves over time, we integrate the Protocol Engine (The Muscle) outcomes back into The Brain.
- Protocol Result: The Muscle reports if an intervention succeeded (e.g., "Weight loss achieved").
- Auto-Refutation: If a protocol fails despite high adherence (e.g., "Glucose remains high after 30 days of Low-Carb"), the Brain automatically triggers a Refutation of the Original Theory.
- Re-Assessment: The engine initiates a new Popper Loop to find an alternative explanation that accounts for the failure.
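The feedback rule above hinges on adherence: a failed protocol only refutes the motivating theory when adherence was high enough to rule out non-compliance as the cause. A sketch with illustrative thresholds and callback names:

```python
def outcome_feedback(theory: str, adherence: float, target_met: bool,
                     refute, start_loop) -> str:
    """Close the loop: a failed protocol under high adherence auto-refutes
    the theory that motivated it and triggers re-assessment."""
    if target_met:
        return "theory corroborated (still unrefuted)"
    if adherence >= 0.8:   # failure is not explained by non-adherence
        refute(theory)     # kill the original theory
        start_loop()       # new Popper Loop to find an alternative explanation
        return "theory refuted; re-assessment started"
    return "inconclusive: adherence too low to refute"

# Glucose remains high after 30 days of Low-Carb at 90% adherence
events = []
msg = outcome_feedback("Insulin Resistance", 0.9, False,
                       refute=events.append,
                       start_loop=lambda: events.append("new_loop"))
```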
12. Pragmatic Evaluation (Benchmarking)
To ensure the engine delivers "Senior Dev" quality without research-level overhead, we track three pragmatic metrics:
| Metric | Definition | Goal |
| :--- | :--- | :--- |
| Top-K Accuracy | Is the ground-truth condition in the Synthesizer's top 3 survivors? | > 90% |
| Refutation Depth | Number of incorrect theories killed per session. | Avg > 5 |
| Safety Escalation | % of "Red Flag" scenarios correctly identified and escalated. | 100% |
13. North Star
"We don't guess what's likely; we find what hasn't been proven wrong yet."
This ensures that Regain Health provides explanatory power that is auditable, safe, and truly personalized to the individual's biological reality.