Evidence Engine (Knowledge Databases)
Scientific Archive + Expert Vaults + User Evidence Vault + Standard Treatment Database
The Evidence Engine makes the system auditable and teachable: retrieve → cite → explain.
It exists to solve one core constraint: the app must be able to explain itself with traceable sources (“Glass Box”), while still personalizing to each user’s Health State without leaking data or violating content licensing.
1) Purpose
The Evidence Engine powers three user-facing outcomes:
- Explainability
  - Every meaningful recommendation can be traced to underlying inputs and/or external evidence.
  - This supports the citation/audit requirements in 05-Compliance-Safety-Privacy.md.
- Education
  - We do not teach rules; we teach reasoning.
  - Evidence is the substrate for gentle, continuous education (see 02-Core-Product-System.md).
- Attribution + trust
  - Users can verify, click through, and audit.
  - Experts get proper attribution, links, and (when applicable) timestamps for media.
This is not a “content library.” It is a decision-support substrate: retrieve → cite → explain.
2) What the Knowledge Databases are
We maintain four canonical corpora plus one user-specific “lens”:
2.1 Scientific Archive (lifestyle research)
A curated archive of licensed, machine-readable scientific sources relevant to lifestyle medicine and the user’s conditions.
- Prefer open-access full text when reuse permits.
- Otherwise store metadata/abstract and link out.
- Track evidence-quality metadata (study type, population, outcomes) so the UI can communicate what kind of evidence this is.
2.2 Expert Vaults (per-expert collections)
A separate corpus per expert (e.g., Huberman, Greger, Patrick), containing:
- official website content (where allowed)
- podcast/video transcripts (where allowed)
- indexing by topics, episodes, and claims
2.3 User Evidence Vault (uploads)
Users can upload documents they have lawful access to (including paywalled PDFs). These uploads are:
- private to the user by default
- covered by privacy rights (export + kill switch)
- included in retrieval only when relevant and authorized
This is part of the "Vault" posture in 05-Compliance-Safety-Privacy.md.
2.4 Health-State Lens (personalized subdatabase)
We do not duplicate the entire archive per user. Instead we create a logical lens per user that selects eligible sources at retrieval time.
Lens inputs:
- HealthState_vN snapshot (versioned)
- user goals/conditions
- explicit exclusions (blocked topics, blocked experts, incognito constraints)
Lens outputs:
- eligible collections for this user/query
- ranked Evidence Pack (chunks + citations) for the generator
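The lens idea can be sketched as a small selection function: it stores rules and returns collection names, and copies no documents. This is only an illustration. The `Lens` class, the collection naming scheme (`expert:<name>`, `user_vault:<id>`), and the function name are assumptions, not part of the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lens:
    """Hypothetical per-user lens: selection rules only, no copied documents."""
    user_id: str
    health_state_version: int  # the N in HealthState_vN
    conditions: frozenset
    blocked_experts: frozenset = frozenset()


def eligible_collections(lens: Lens, expert_vaults: list) -> list:
    """Return the collection names this user's queries may search."""
    # Shared corpora are referenced by name, never duplicated per user.
    collections = ["scientific_archive", "standard_treatment_db"]
    # Explicit exclusions: drop any expert the user has blocked.
    collections += [
        f"expert:{name}" for name in expert_vaults
        if name not in lens.blocked_experts
    ]
    # Only the requesting user's own vault is ever eligible.
    collections.append(f"user_vault:{lens.user_id}")
    return collections
```

Blocked topics and incognito constraints would likely be applied later, at chunk level, once the topic of each retrieved excerpt is known; that step is omitted here.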
2.5 Standard Treatment Database (guideline library)
A curated, US-first database of standard-of-care clinical guidance that we can safely reference to inform users about widely accepted options (medication classes, typical procedures, monitoring, and escalation signals) with attribution.
What it is:
- guideline-derived “standard options,” sourced from official/reputable bodies (see Governance)
- structured pathways rendered as calm user-facing artifacts in chat and on condition pages
- a hallucination-reduction tool that prioritizes known citable guidance over free-form generation
What it is not:
- an order set or prescriptive care plan
- dosing/titration/contraindication resolution
- personalized medication advice
- instructions to start/stop/change medications (see 05-Compliance-Safety-Privacy.md)
Relationship to other corpora:
- Scientific Archive: primary literature (studies)
- Standard Treatment DB: guideline-level “what clinicians commonly do”
- Expert Vaults: “what a person says”
Disagreements must be shown as disagreements, not silently blended.
How it personalizes:
- ICD-10 diagnoses (when present) are treated as starting hypotheses
- eligibility and relevance route through HealthState_vN (biomarkers, meds, constraints, red flags, preferences)
3) How it integrates into the product
3.1 Chat interface (primary)
Evidence is surfaced as:
- inline citations (small markers)
- an expandable “Evidence drawer” (paper metadata, expert quote, media link)
- a “What data we used” panel (Health State inputs + assumptions)
This matches the product requirement that the user sees a calm artifact, not internal debate (see 02-Core-Product-System.md).
3.2 Condition pages (secondary / visual mode)
Condition pages are browsable representations of:
- the condition (evidence briefs)
- what experts say (perspectives)
- what the user’s data suggests (Health State alignment)
This is not “a Wikipedia page.” It is a living, personalized evidence map tied to HealthState_vN.
3.3 Labs module (Infrastructure)
When the Labs module is active, evidence is used across zones:
- Zone 1 (Clinical): careful explanations and reputable sources
- Zone 2 (Optimization): lifestyle optimization guidance supported by evidence and expert perspectives
Two-zone separation remains mandatory (see 04-Labs-and-LifeSense.md and 05-Compliance-Safety-Privacy.md).
4) Retrieval-Augmented Generation (RAG) flow (conceptual)
flowchart TB
user["User"] --> query["QuestionOrConditionIntent"]
query --> router["HealthStateRouter"]
router --> retrieve["RetrieveAndRerank"]
retrieve --> pack["EvidencePack"]
pack --> synth["GenerateAnswerWithCitations"]
synth --> ui["ChatOrConditionPage"]
health["HealthState_vN"] --> router
vault["UserEvidenceVault"] --> retrieve
sci["ScientificArchive"] --> retrieve
expert["ExpertVaults"] --> retrieve
4.1 Evidence Pack (what the model sees)
The generator is not fed “the whole internet.” It is fed a structured pack:
- retrieved excerpts (chunks)
- source metadata and licensing flags
- canonical URLs
- for media: timestamps + episode identifiers
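One possible shape for the pack, as a sketch: the field names below (`may_quote_full_text`, `episode_id`, `timestamp_seconds`, `score`) are illustrative assumptions that map onto the four bullets above, not a defined schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Source metadata carried with every excerpt (hypothetical fields)."""
    source_id: str
    kind: str                       # e.g. "study", "guideline", "expert", "user_upload"
    title: str
    canonical_url: str
    may_quote_full_text: bool       # licensing flag
    episode_id: Optional[str] = None         # media only
    timestamp_seconds: Optional[int] = None  # media only


@dataclass(frozen=True)
class Chunk:
    """A retrieved excerpt plus the source it came from."""
    text: str
    source: Source
    score: float                    # reranker score, higher is better


@dataclass
class EvidencePack:
    """Everything the generator is allowed to see for one query."""
    query: str
    health_state_version: int
    chunks: list = field(default_factory=list)

    def top(self, k: int) -> list:
        """Return the k highest-scoring chunks."""
        return sorted(self.chunks, key=lambda c: c.score, reverse=True)[:k]
```

Keeping the source attached to each chunk means a citation can always be produced from whatever excerpt the generator ends up using.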
4.2 Citation binder (what the user sees)
Every non-trivial claim must map to:
- a Health State input (e.g., sleep metric), and/or
- an external citation (study, guideline, expert quote, transcript segment)
If neither exists, the system must either:
- ask for more data, or
- state it cannot support the claim safely
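The binding rule can be sketched as a simple partition of claims. The `Claim` class and `bind` function are hypothetical names; how claims are extracted from generated text is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One non-trivial claim and whatever supports it (hypothetical shape)."""
    text: str
    health_state_inputs: list = field(default_factory=list)  # e.g. ["sleep_metric"]
    citation_ids: list = field(default_factory=list)


def bind(claims: list, known_citation_ids: set) -> tuple:
    """Split claims into (supported, unsupported).

    A claim is supported if it maps to at least one Health State input
    and/or at least one citation present in the Evidence Pack.
    """
    supported, unsupported = [], []
    for claim in claims:
        # Ignore citation ids that are not actually in the pack.
        valid = [c for c in claim.citation_ids if c in known_citation_ids]
        if claim.health_state_inputs or valid:
            supported.append(claim)
        else:
            unsupported.append(claim)
    return supported, unsupported
```

Anything in the unsupported list would then trigger one of the two fallbacks above: ask for more data, or state that the claim cannot be supported safely.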
5) Attribution and referencing rules
Attribution is not optional. It is the trust substrate.
5.1 Scientific sources
Always provide:
- title, authors, year, journal (or preprint server)
- DOI/PMID/PMCID when available
- canonical link
5.2 Expert sources
Always provide:
- expert name
- canonical URL
- timestamped link when possible (audio/video)
- quote excerpts when using their wording directly
5.3 User uploads
By default:
- cite as “Your uploaded document” (user-facing)
- internally record a document fingerprint + retrieval timestamp for audit
Never expose a user’s upload as a source to another user.
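A minimal sketch of the split between what the user sees and what the audit trail records. SHA-256 as the fingerprint and the dictionary layout are assumptions; the spec only requires "a document fingerprint + retrieval timestamp".

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def fingerprint(document_bytes: bytes) -> str:
    """Content fingerprint for audit; SHA-256 is an assumed choice."""
    return hashlib.sha256(document_bytes).hexdigest()


def cite_upload(document_bytes: bytes, retrieved_at: Optional[datetime] = None) -> dict:
    """Build the user-facing citation and the internal audit record."""
    retrieved_at = retrieved_at or datetime.now(timezone.utc)
    return {
        # Shown to the owner only; no file name or content is exposed.
        "user_facing": {"label": "Your uploaded document"},
        # Kept internally so a retrieval can be audited later.
        "internal_audit": {
            "sha256": fingerprint(document_bytes),
            "retrieved_at": retrieved_at.isoformat(),
        },
    }
```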
6) Governance: what we ingest and why
The Evidence Engine is governed by an explicit allow-list (Allowed Sources).
6.1 Allowed Sources registry
For each source we record:
- what we are allowed to ingest
- how we are allowed to retrieve it (API vs manual)
- license model (per-document vs site-wide vs user-provided)
- attribution requirements
- takedown process
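A registry entry could look roughly like the record below. The enum values mirror the bullets above; the class names, field names, and the `example-journal` entry are hypothetical placeholders, not real licensing facts.

```python
from dataclasses import dataclass
from enum import Enum


class RetrievalMethod(Enum):
    API = "api"
    MANUAL = "manual"


class LicenseModel(Enum):
    PER_DOCUMENT = "per-document"
    SITE_WIDE = "site-wide"
    USER_PROVIDED = "user-provided"


@dataclass(frozen=True)
class AllowedSource:
    """One row of the Allowed Sources registry (hypothetical schema)."""
    source_id: str
    ingest_scope: str            # what we are allowed to ingest
    retrieval: RetrievalMethod   # how we are allowed to retrieve it
    license_model: LicenseModel
    attribution: str             # attribution requirements
    takedown_process: str


def may_ingest(registry: dict, source_id: str) -> bool:
    """Allow-list semantics: anything not registered is denied."""
    return source_id in registry


REGISTRY = {
    "example-journal": AllowedSource(
        source_id="example-journal",
        ingest_scope="metadata and abstracts only",
        retrieval=RetrievalMethod.API,
        license_model=LicenseModel.PER_DOCUMENT,
        attribution="title, authors, year, DOI, canonical link",
        takedown_process="remove from index on request",
    ),
}
```

The deny-by-default check is the important part: a source that is missing from the registry is never ingested, whatever its public availability.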
6.2 Robots / scraping posture
If accessed via crawling:
- respect robots rules
- throttle requests
- avoid systematic scraping unless explicitly allowed
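The first two rules can be checked with Python's standard `urllib.robotparser`. The sketch below parses robots rules that have already been fetched, so it runs offline; the `EvidenceBot` user agent and the one-second minimum delay are assumptions. Whether crawling a source is permitted at all remains a registry decision, which this check does not replace.

```python
from urllib.robotparser import RobotFileParser


def crawl_policy(robots_lines, user_agent, url, min_delay_s=1.0):
    """Return (allowed, delay_seconds) for one URL.

    robots_lines: the lines of an already-fetched robots.txt file.
    The delay is the larger of our own minimum and any declared Crawl-delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    allowed = parser.can_fetch(user_agent, url)
    declared = parser.crawl_delay(user_agent)  # None when not declared
    delay = min_delay_s if declared is None else max(min_delay_s, float(declared))
    return allowed, delay


ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]
```

For example, `crawl_policy(ROBOTS, "EvidenceBot", "https://example.org/private/x")` reports the URL as disallowed, while a path outside `/private/` is allowed with a five-second delay between requests.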
6.3 Licensing posture
We never treat “publicly accessible” as “redistributable.”
- Many scientific sources are copyright-restricted even if you can read them.
- For experts, the default posture is to link and quote minimally unless reuse rights are explicit.
6.4 Standard Treatment DB: tiered sources (US-first)
The Standard Treatment DB must be grounded in official and reputable sources with explicit conflict handling.
Tier 1 (official public bodies; default when available):
Tier 1.5 (authoritative, but not “standard-of-care”):
- CMS/payers are authoritative for coding and coverage, not for what is clinically best
- CMS ICD-10
Tier 2 (specialty societies; when Tier 1 is absent or insufficient):
- ADA (diabetes), AHA/ACC (cardiovascular), etc.
Conflict and disagreement policy:
- If Tier 1 and Tier 2 disagree: show both, label tiers, encourage clinician coordination.
- If multiple Tier 2 sources disagree: show variation, do not blend.
- When evidence is ambiguous: default to safest user-facing behavior (uncertainty + escalation) per 05-Compliance-Safety-Privacy.md.
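The disagreement policy can be sketched as a function that decides how guidance is presented and never merges positions. The `Guidance` class, the mode names, and the treatment of Tier 1.5 as excluded from clinical comparison are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    """One guideline statement (hypothetical shape)."""
    source: str
    tier: str       # "1", "1.5", or "2"
    position: str   # normalized summary of what the source recommends


def present(guidance: list) -> dict:
    """Choose a presentation mode; positions are listed, never blended."""
    # Tier 1.5 speaks to coding and coverage, so it is left out of the
    # clinical comparison (an assumption based on the tier description).
    clinical = [g for g in guidance if g.tier in {"1", "2"}]
    if not clinical:
        # Nothing citable: fall back to uncertainty plus escalation.
        return {"mode": "uncertain", "items": [],
                "clinician_coordination": False, "escalate": True}
    positions = {g.position for g in clinical}
    tiers = {g.tier for g in clinical}
    if len(positions) == 1:
        mode = "consistent"
    elif tiers == {"1", "2"}:
        mode = "tier_disagreement"   # show both, label tiers
    else:
        mode = "variation"           # same-tier sources differ
    return {
        "mode": mode,
        # Every item keeps its tier label and source for attribution.
        "items": [(g.tier, g.source, g.position) for g in clinical],
        "clinician_coordination": mode == "tier_disagreement",
        "escalate": False,
    }
```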
7) Privacy, sovereignty, and auditability
Evidence retrieval is a form of data access and must be auditable.
7.1 Audit log (Glass Box)
Users must be able to see:
- what Health State data was accessed
- which evidence sources were retrieved
- when it happened
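A minimal in-memory sketch of an audit record that answers those three questions. The class and field names are hypothetical, and a real log would need durable, tamper-evident storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """One retrieval event as shown to the user (hypothetical fields)."""
    user_id: str
    occurred_at: str            # when it happened (ISO 8601, UTC)
    health_state_fields: tuple  # what Health State data was accessed
    evidence_source_ids: tuple  # which evidence sources were retrieved


class AuditLog:
    def __init__(self) -> None:
        self._entries = []

    def record(self, user_id, health_state_fields, evidence_source_ids,
               occurred_at: Optional[datetime] = None) -> AuditEntry:
        """Append one entry; the timestamp defaults to now (UTC)."""
        occurred_at = occurred_at or datetime.now(timezone.utc)
        entry = AuditEntry(user_id, occurred_at.isoformat(),
                           tuple(health_state_fields), tuple(evidence_source_ids))
        self._entries.append(entry)
        return entry

    def for_user(self, user_id: str) -> list:
        """Return only the entries that belong to this user."""
        return [e for e in self._entries if e.user_id == user_id]
```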
7.2 Kill switch and deletion semantics
When a user deletes their data:
- user uploads are deleted (storage + derived indexes)
- cached personal lens artifacts are purged
- aggregate non-user-specific corpora remain intact
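The deletion semantics can be sketched against a toy in-memory store. The `Stores` layout and the `kill_switch` function are assumptions; the point is that user-scoped data and its derived artifacts go, while shared corpora are untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Toy stand-in for the real storage layers (hypothetical layout)."""
    uploads: dict = field(default_factory=dict)         # user_id -> documents
    upload_index: dict = field(default_factory=dict)    # user_id -> derived index entries
    lens_cache: dict = field(default_factory=dict)      # user_id -> cached lens artifacts
    shared_corpora: dict = field(default_factory=dict)  # not user-specific


def kill_switch(stores: Stores, user_id: str) -> dict:
    """Delete everything scoped to one user and report what was removed."""
    return {
        # Storage and derived indexes are removed together.
        "uploads_deleted": len(stores.uploads.pop(user_id, [])),
        "index_entries_deleted": len(stores.upload_index.pop(user_id, [])),
        # Cached personal lens artifacts are purged.
        "lens_cache_purged": stores.lens_cache.pop(user_id, None) is not None,
        # shared_corpora is intentionally never touched.
    }
```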
7.3 Incognito mode (sensitive domains)
For sensitive topics, evidence can be used ephemerally:
- process in-session
- do not persist
- do not add to long-term retrieval indices
8) Safety and claims boundaries (how we communicate evidence)
Even with citations, we do not cross the product boundary:
- do not diagnose; frame as assessment/explanation
- do not instruct medication changes
- escalate on red-flag risk or uncertainty
- never paywall safety-critical guidance
Evidence strengthens trust, but does not change liability posture.
9) North Star
“Here’s what fits your data, here’s what we can’t rule out, here’s what to measure next — and here are the sources.”