FOR · RAG DEVELOPERS
FHIR R4 built for retrieval.
Pre-resolved references, flat JSON, retrieval-optimized Markdown at /md. p99 under 300 ms. Citation-ready meta.source on every resource.
52% fewer tokens via /md · pre-resolved refs · citation-ready provenance
The problem · FHIR + RAG
FHIR JSON is reference-heavy. The /md endpoint fixes that.
Nested references
In FHIR R4, a PractitionerRole references a Practitioner, an Organization, and one or more Locations, so assembling full context for a single provider can take 4 round trips.
Fonteum: All references are pre-resolved. The response is a flat, fully populated bundle, ready for chunking.
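To make the round-trip cost concrete, here is a minimal sketch of the client-side reference walking that a pre-resolved bundle makes unnecessary. The `resolve` helper, the in-memory `STORE`, and the sample IDs are all hypothetical; against a real FHIR server each lookup would be one HTTP request. The sketch starts from a PractitionerRole, which is where FHIR R4 places the references to Practitioner, Organization, and Location.

```python
from typing import Any

# Hypothetical in-memory stand-in for a FHIR server, keyed by "ResourceType/id".
STORE: dict[str, dict[str, Any]] = {
    "PractitionerRole/role-1": {
        "resourceType": "PractitionerRole",
        "id": "role-1",
        "practitioner": {"reference": "Practitioner/prac-1"},
        "organization": {"reference": "Organization/org-1"},
        "location": [{"reference": "Location/loc-1"}],
    },
    "Practitioner/prac-1": {"resourceType": "Practitioner", "id": "prac-1"},
    "Organization/org-1": {"resourceType": "Organization", "id": "org-1"},
    "Location/loc-1": {"resourceType": "Location", "id": "loc-1"},
}


def find_references(node: Any) -> list[str]:
    """Collect every "reference" string found in nested dicts and lists."""
    if isinstance(node, dict):
        refs = [node["reference"]] if isinstance(node.get("reference"), str) else []
        for value in node.values():
            refs.extend(find_references(value))
        return refs
    if isinstance(node, list):
        return [ref for item in node for ref in find_references(item)]
    return []


def resolve(root_ref: str, store: dict[str, dict[str, Any]]) -> tuple[list[dict[str, Any]], int]:
    """Fetch root_ref and everything reachable from it, breadth-first.

    Returns the resources in fetch order plus the number of lookups made.
    """
    fetched: dict[str, dict[str, Any]] = {}
    queue = [root_ref]
    while queue:
        ref = queue.pop(0)
        if ref in fetched or ref not in store:
            continue  # already fetched, or a dangling reference
        fetched[ref] = store[ref]
        queue.extend(find_references(store[ref]))
    return list(fetched.values()), len(fetched)


resources, round_trips = resolve("PractitionerRole/role-1", STORE)
print(round_trips, [r["resourceType"] for r in resources])
```

With the sample store, the walk makes four lookups, one per resource. A flat bundle delivers the same four resources in a single response.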
Token bloat
Raw FHIR JSON carries coding system URIs, meta fields, and extension blocks your model doesn't need. A single Practitioner can run 300+ tokens.
Fonteum: The /md endpoint serializes the same resource as structured Markdown — same clinical data, 52% fewer tokens on average.
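As an illustration of where the savings come from, the sketch below renders a Practitioner as compact Markdown, dropping coding system URIs and nesting while keeping the clinical values. The layout and the `practitioner_to_markdown` function are assumptions made for this example; the actual /md output format is not specified here and may differ.

```python
from typing import Any


def practitioner_to_markdown(resource: dict[str, Any]) -> str:
    """Render a Practitioner dict as compact Markdown (hypothetical layout)."""
    name = (resource.get("name") or [{}])[0]
    full_name = " ".join(name.get("given", []) + [name.get("family", "")]).strip()
    lines = [f"# Practitioner: {full_name}"]

    for ident in resource.get("identifier", []):
        # Keep the identifier value; replace the system URI with a short label.
        if ident.get("system", "").endswith("us-npi"):
            lines.append(f"- NPI: {ident['value']}")

    for addr in resource.get("address", []):
        street = ", ".join(addr.get("line", []))
        lines.append(
            f"- Address: {street}, {addr.get('city', '')}, "
            f"{addr.get('state', '')} {addr.get('postalCode', '')}"
        )

    for qual in resource.get("qualification", []):
        for coding in qual.get("code", {}).get("coding", []):
            # Keep display text and code; drop the taxonomy system URI.
            lines.append(f"- Specialty: {coding.get('display', '')} ({coding.get('code', '')})")

    return "\n".join(lines)


sample = {
    "resourceType": "Practitioner",
    "identifier": [{"system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328"}],
    "name": [{"family": "Nguyen", "given": ["Emily"]}],
    "address": [
        {"line": ["400 Park Ave"], "city": "New York", "state": "NY", "postalCode": "10022"}
    ],
    "qualification": [
        {
            "code": {
                "coding": [
                    {
                        "system": "http://nucc.org/provider-taxonomy",
                        "code": "207RC0000X",
                        "display": "Cardiovascular Disease",
                    }
                ]
            }
        }
    ],
}

markdown = practitioner_to_markdown(sample)
print(markdown)
```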
Missing citations
When an LLM cites a provider fact, you need a traceable source and date. FHIR R4 defines meta.source, but it is optional, and servers are not required to populate it.
Fonteum: Every resource carries meta.source and a provenance tag block: source name, last-checked date, and display rule.
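A minimal sketch of reading the provenance tags off a resource, assuming the tag systems `fonteum:provenance` and `fonteum:last-checked` that appear in the sample response on this page. The `extract_provenance` and `is_stale` helpers and the 90-day freshness threshold are hypothetical, chosen only for illustration.

```python
from datetime import date
from typing import Any, Optional

# Tag systems as they appear in the sample response on this page.
PROVENANCE_SYSTEM = "fonteum:provenance"
LAST_CHECKED_SYSTEM = "fonteum:last-checked"


def extract_provenance(resource: dict[str, Any]) -> dict[str, Optional[str]]:
    """Pull the source name and last-checked date out of meta.tag."""
    tags = resource.get("meta", {}).get("tag", [])
    by_system = {tag.get("system"): tag.get("code") for tag in tags}
    return {
        "source": by_system.get(PROVENANCE_SYSTEM),
        "last_checked": by_system.get(LAST_CHECKED_SYSTEM),
    }


def is_stale(last_checked: str, today: date, max_age_days: int = 90) -> bool:
    """True when the record was last checked more than max_age_days ago.

    The 90-day default is an arbitrary illustrative threshold.
    """
    return (today - date.fromisoformat(last_checked)).days > max_age_days


resource = {
    "resourceType": "Practitioner",
    "meta": {
        "tag": [
            {"system": "fonteum:provenance", "code": "cms-nppes"},
            {"system": "fonteum:last-checked", "code": "2026-05-24"},
        ]
    },
}

provenance = extract_provenance(resource)
print(provenance)
```

A staleness check like `is_stale` lets a pipeline drop or re-fetch records before they are cited.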
Token efficiency · JSON vs /md
The same clinical data. Half the tokens.
Compare the standard FHIR JSON response with the ?_format=md Markdown serialization. Same provenance, same clinical data, fewer tokens in your context window.

    {
      "resourceType": "Practitioner",
      "id": "prac-1003894328",
      "meta": {
        "tag": [
          { "system": "fonteum:provenance", "code": "cms-nppes" },
          { "system": "fonteum:last-checked", "code": "2026-05-24" }
        ]
      },
      "identifier": [
        { "system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328" }
      ],
      "name": [{ "family": "Nguyen", "given": ["Emily"], "prefix": ["MD"] }],
      "address": [
        {
          "use": "work",
          "line": ["400 Park Ave"],
          "city": "New York",
          "state": "NY",
          "postalCode": "10022"
        }
      ],
      "qualification": [
        {
          "code": {
            "coding": [
              {
                "system": "http://nucc.org/provider-taxonomy",
                "code": "207RC0000X",
                "display": "Cardiovascular Disease"
              }
            ]
          }
        }
      ]
    }

LangChain · integration walkthrough
Retrieve providers with LangChain.
The FHIR retriever accepts natural-language queries and translates them to FHIR search parameters. Results come back as LangChain Document objects with metadata.source pre-populated from the Fonteum provenance block.

    import os

    from langchain.retrievers import FHIRRetriever

    # Read the key from the environment; Python does not expand "$FONTEUM_API_KEY" inside a string.
    retriever = FHIRRetriever(
        base_url="https://fonteum.com/api/fhir/r4",
        api_key=os.environ["FONTEUM_API_KEY"],
        resource_type="Practitioner",
    )
    docs = retriever.get_relevant_documents("cardiologist New York")

Each returned Document carries metadata.source (CMS federal registry), metadata.last_checked, and metadata.npi for citation generation.
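The metadata fields can be turned into an inline citation with a few lines of code. This is a sketch only: `format_citation` and the bracketed citation style are hypothetical, and the function takes a plain dict with the `source`, `last_checked`, and `npi` keys named above, so it runs without LangChain installed.

```python
def format_citation(metadata: dict[str, str]) -> str:
    """Build a bracketed citation from Document-style metadata (hypothetical style)."""
    parts = [metadata.get("source") or "unknown source"]
    if metadata.get("npi"):
        parts.append(f"NPI {metadata['npi']}")
    if metadata.get("last_checked"):
        parts.append(f"last checked {metadata['last_checked']}")
    return "[" + ", ".join(parts) + "]"


# Sample values taken from the example record on this page.
metadata = {
    "source": "CMS federal registry",
    "last_checked": "2026-05-24",
    "npi": "1003894328",
}
citation = format_citation(metadata)
print(citation)
```

With a real retriever result, the same call would be `format_citation(doc.metadata)` for each returned Document.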
LlamaIndex · retriever
Index provider data with LlamaIndex.
Use the FHIR reader with use_markdown_endpoint=True to pull Markdown-serialized resources directly into a LlamaIndex VectorStoreIndex. Each document carries provenance metadata for downstream citation generation.

    import os

    from llama_index.core import VectorStoreIndex
    from llama_index.readers.fhir import FHIRReader

    reader = FHIRReader(
        base_url="https://fonteum.com/api/fhir/r4",
        api_key=os.environ["FONTEUM_API_KEY"],  # read from the environment, not a literal "$..." string
        resource_types=["Practitioner", "Organization"],
        use_markdown_endpoint=True,  # fetch /md for 52% fewer tokens
    )
    # Filter to New York addresses with a page size of 50.
    documents = reader.load_data(
        search_params={"address-state": "NY", "_count": 50}
    )
    # Embed the documents, then query the index.
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    response = query_engine.query("cardiologists accepting Medicare in Manhattan")

Token counts · by resource type
Average token counts per resource.
| Resource | JSON tokens | /md tokens | Reduction |
|---|---|---|---|
| Practitioner | 312 | 148 | −53% |
| Organization | 298 | 131 | −56% |
| Location | 187 | 89 | −52% |
| PractitionerRole | 224 | 104 | −54% |
| HealthcareService | 341 | 162 | −52% |
Token counts measured with tiktoken cl100k_base on a representative sample of 500 records per resource type. Actual counts vary by record.
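The Reduction column follows directly from the two token columns, and the same numbers show how many records fit in a fixed context budget. The sketch below reproduces that arithmetic from the table values; the 4,000-token budget is an arbitrary figure chosen for illustration.

```python
# (JSON tokens, /md tokens) per resource type, copied from the table above.
TOKEN_COUNTS = {
    "Practitioner": (312, 148),
    "Organization": (298, 131),
    "Location": (187, 89),
    "PractitionerRole": (224, 104),
    "HealthcareService": (341, 162),
}


def reduction_pct(json_tokens: int, md_tokens: int) -> float:
    """Percentage of tokens saved by the Markdown serialization."""
    return 100 * (json_tokens - md_tokens) / json_tokens


def records_per_budget(tokens_per_record: int, budget: int) -> int:
    """How many whole records fit in a context budget."""
    return budget // tokens_per_record


reductions = {
    name: round(reduction_pct(json_tokens, md_tokens))
    for name, (json_tokens, md_tokens) in TOKEN_COUNTS.items()
}

BUDGET = 4000  # hypothetical context budget, in tokens
json_fit = records_per_budget(TOKEN_COUNTS["Practitioner"][0], BUDGET)
md_fit = records_per_budget(TOKEN_COUNTS["Practitioner"][1], BUDGET)

print(reductions)
print(f"Practitioner records in {BUDGET} tokens: JSON={json_fit}, /md={md_fit}")
```

At the average Practitioner sizes in the table, a 4,000-token budget holds 12 JSON records or 27 Markdown records.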
Latency benchmarks · under load
Sub-300 ms at p99.
| Percentile | JSON endpoint | /md endpoint |
|---|---|---|
| p50 | 38 ms | 22 ms |
| p95 | 142 ms | 68 ms |
| p99 | 290 ms | 138 ms |
| p99.9 | 480 ms | 210 ms |
Measured at the Vercel edge with 50 concurrent connections. Latency is gateway-to-response-complete. Source data is served from a warm CDN cache; a cold cache adds about 80 ms.
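To compare these figures against your own measurements, the percentiles can be computed from raw latency samples. The sketch below uses the nearest-rank method, which is an assumption: the method behind the table above is not stated. The sample data is synthetic.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Synthetic latencies of 1 ms through 100 ms, for illustration only.
latencies_ms = [float(ms) for ms in range(1, 101)]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)
```

Tail percentiles such as p99.9 need thousands of samples to be meaningful; with only 100 samples, nearest-rank returns the maximum.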