FOR · RAG DEVELOPERS

FHIR R4 built for retrieval.

Pre-resolved references, flat JSON, retrieval-optimized Markdown at /md. p99 under 300 ms. Citation-ready meta.source on every resource.

Get API access → · Read capability statement

52% fewer tokens via /md · pre-resolved refs · citation-ready provenance

The problem · FHIR + RAG

FHIR JSON is reference-heavy. The `/md` endpoint fixes that.

Nested references

In FHIR R4, a provider's details are split across resources: a PractitionerRole references a Practitioner, an Organization, and one or more Locations. Fetched one at a time, that is 4 round trips to build context.

Fonteum: All references are pre-resolved. The response is a flat, fully populated bundle ready for chunking.
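To make the contrast concrete, here is a minimal sketch of the client-side work that pre-resolution is meant to remove: walking a set of already-fetched resources and inlining each reference. The resource shapes, the `resolve` helper, and the sample IDs are illustrative assumptions, not Fonteum's actual response format.

```python
from typing import Any

# Hypothetical, already-fetched resources keyed by "ResourceType/id".
# Without pre-resolution, each of these would be a separate request.
STORE: dict[str, dict[str, Any]] = {
    "Practitioner/p1": {"resourceType": "Practitioner", "id": "p1"},
    "Organization/o1": {"resourceType": "Organization", "id": "o1", "name": "Example Clinic"},
    "Location/l1": {"resourceType": "Location", "id": "l1", "name": "Main Office"},
    "PractitionerRole/r1": {
        "resourceType": "PractitionerRole",
        "id": "r1",
        "practitioner": {"reference": "Practitioner/p1"},
        "organization": {"reference": "Organization/o1"},
        "location": [{"reference": "Location/l1"}],
    },
}


def resolve(node: Any, store: dict[str, dict[str, Any]], depth: int = 3) -> Any:
    """Return a copy of node with {"reference": ...} objects replaced by their targets."""
    if depth < 0:
        return node  # stop on deep or cyclic reference chains
    if isinstance(node, list):
        return [resolve(item, store, depth) for item in node]
    if isinstance(node, dict):
        ref = node.get("reference")
        target = store.get(ref) if isinstance(ref, str) else None
        if target is not None:
            return resolve(target, store, depth - 1)
        return {key: resolve(value, store, depth) for key, value in node.items()}
    return node


# One flat structure with the practitioner, organization, and location inlined.
flat = resolve(STORE["PractitionerRole/r1"], STORE)
```

The depth limit is a guard against reference cycles; a server that resolves references ahead of time would make this whole step unnecessary on the client.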

Token bloat

Raw FHIR JSON carries coding system URIs, meta fields, and extension blocks your model doesn't need. A single Practitioner can run 300+ tokens.

Fonteum: The /md endpoint serializes the same resource as structured Markdown: same clinical data, 52% fewer tokens on average.

Missing citations

When an LLM cites a provider fact, you need a traceable source and date. FHIR R4 makes meta.source optional, so a standard resource may not carry either.

Fonteum: Every resource carries meta.source and a provenance tag block: source name, last-checked date, and display rule.
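As a rough illustration of how those tags can feed a citation, the sketch below reads the two `meta.tag` entries that appear in the sample response on this page (`fonteum:provenance` and `fonteum:last-checked`). The `provenance_of` helper and its output keys are assumptions made for the example; the display-rule tag is not shown in the sample, so it is left out.

```python
from datetime import date
from typing import Any


def provenance_of(resource: dict[str, Any]) -> dict[str, Any]:
    """Collect provenance tags from a resource's meta.tag list into a small dict."""
    tags = {t.get("system"): t.get("code") for t in resource.get("meta", {}).get("tag", [])}
    checked = tags.get("fonteum:last-checked")
    return {
        "source": tags.get("fonteum:provenance"),
        # Parse the date so stale records can be filtered out before they are cited.
        "last_checked": date.fromisoformat(checked) if checked else None,
    }


sample = {
    "resourceType": "Practitioner",
    "meta": {
        "tag": [
            {"system": "fonteum:provenance", "code": "cms-nppes"},
            {"system": "fonteum:last-checked", "code": "2026-05-24"},
        ]
    },
}
info = provenance_of(sample)
```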

Token efficiency · JSON vs /md

The same clinical data. Half the tokens.

Compare the standard FHIR JSON response with the ?_format=md Markdown serialization. Same provenance, same clinical data, fewer tokens in your context window.

58% more tokens than /md

{
  "resourceType": "Practitioner",
  "id": "prac-1003894328",
  "meta": {
    "tag": [
      { "system": "fonteum:provenance", "code": "cms-nppes" },
      { "system": "fonteum:last-checked", "code": "2026-05-24" }
    ]
  },
  "identifier": [
    { "system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328" }
  ],
  "name": [{ "family": "Nguyen", "given": ["Emily"], "prefix": ["MD"] }],
  "address": [
    {
      "use": "work",
      "line": ["400 Park Ave"],
      "city": "New York",
      "state": "NY",
      "postalCode": "10022"
    }
  ],
  "qualification": [
    {
      "code": {
        "coding": [
          {
            "system": "http://nucc.org/provider-taxonomy",
            "code": "207RC0000X",
            "display": "Cardiovascular Disease"
          }
        ]
      }
    }
  ]
}
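
For intuition about where the savings come from, here is a minimal sketch that renders the sample above as Markdown by dropping system URIs and nesting. The layout and the `to_markdown` helper are assumptions for illustration only; the actual /md output may be structured differently.

```python
import json
from typing import Any


def to_markdown(p: dict[str, Any]) -> str:
    """Render a Practitioner-shaped dict as compact Markdown (hypothetical layout)."""
    name = p["name"][0]
    full_name = " ".join(name.get("given", []) + [name["family"]])
    # Credentials may sit in prefix or suffix depending on the record.
    credentials = name.get("prefix", []) + name.get("suffix", [])
    if credentials:
        full_name += ", " + " ".join(credentials)
    lines = [f"# {full_name}"]
    for ident in p.get("identifier", []):
        if ident.get("system", "").endswith("us-npi"):
            lines.append(f"- NPI: {ident['value']}")
    for qual in p.get("qualification", []):
        for coding in qual.get("code", {}).get("coding", []):
            lines.append(f"- Specialty: {coding['display']} ({coding['code']})")
    for addr in p.get("address", []):
        street = ", ".join(addr.get("line", []))
        lines.append(f"- Address: {street}, {addr['city']}, {addr['state']} {addr['postalCode']}")
    tags = {t["system"]: t["code"] for t in p.get("meta", {}).get("tag", [])}
    if "fonteum:provenance" in tags:
        checked = tags.get("fonteum:last-checked", "unknown")
        lines.append(f"- Source: {tags['fonteum:provenance']} (checked {checked})")
    return "\n".join(lines)


# The sample record from this page.
practitioner = {
    "resourceType": "Practitioner",
    "id": "prac-1003894328",
    "meta": {"tag": [
        {"system": "fonteum:provenance", "code": "cms-nppes"},
        {"system": "fonteum:last-checked", "code": "2026-05-24"},
    ]},
    "identifier": [{"system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328"}],
    "name": [{"family": "Nguyen", "given": ["Emily"], "prefix": ["MD"]}],
    "address": [{"use": "work", "line": ["400 Park Ave"], "city": "New York",
                 "state": "NY", "postalCode": "10022"}],
    "qualification": [{"code": {"coding": [{
        "system": "http://nucc.org/provider-taxonomy",
        "code": "207RC0000X",
        "display": "Cardiovascular Disease",
    }]}}],
}

markdown = to_markdown(practitioner)
# Character counts are only a rough proxy for token counts.
json_chars, md_chars = len(json.dumps(practitioner)), len(markdown)
```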

LangChain · integration walkthrough

Retrieve providers with LangChain.

The FHIR retriever accepts natural-language queries and translates them to FHIR search parameters. Results come back as LangChain Document objects with metadata.source pre-populated from the Fonteum provenance block.

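import os

# Import path as shown on this page; confirm it against your installed packages.
from langchain.retrievers import FHIRRetriever

retriever = FHIRRetriever(
    base_url="https://fonteum.com/api/fhir/r4",
    api_key=os.environ["FONTEUM_API_KEY"],  # read from the environment; a literal "$FONTEUM_API_KEY" string would be sent as-is
    resource_type="Practitioner",
)
# The natural-language query is translated to FHIR search parameters by the retriever.
docs = retriever.get_relevant_documents("cardiologist New York")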

Each returned Document carries metadata.source (CMS federal registry), metadata.last_checked, and metadata.npi for citation generation.
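One way to use those fields is to append a source line to each chunk before it reaches the prompt. The sketch below uses a small stand-in `Doc` dataclass rather than LangChain's Document so it runs without extra dependencies; the `with_citation` helper and the citation wording are assumptions, and only the three metadata keys come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document: text plus a metadata dict."""
    page_content: str
    metadata: dict[str, str] = field(default_factory=dict)


def with_citation(doc: Doc) -> str:
    """Append a bracketed source line built from the document's metadata."""
    meta = doc.metadata
    cite = (
        f"[source: {meta.get('source', 'unknown')}; "
        f"NPI {meta.get('npi', 'n/a')}; "
        f"last checked {meta.get('last_checked', 'unknown')}]"
    )
    return f"{doc.page_content}\n{cite}"


doc = Doc(
    page_content="Emily Nguyen, MD. Cardiovascular Disease. New York, NY.",
    metadata={"source": "CMS federal registry", "last_checked": "2026-05-24", "npi": "1003894328"},
)
context = with_citation(doc)
```

Keeping the citation on its own line makes it easy for the model to quote and easy to strip again in post-processing.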

LlamaIndex · retriever

Index provider data with LlamaIndex.

Use the FHIR reader with use_markdown_endpoint=True to pull Markdown-serialized resources directly into a LlamaIndex VectorStoreIndex. Each document carries provenance metadata for downstream citation generation.

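import os

from llama_index.core import VectorStoreIndex
# Import path as shown on this page; confirm it against your installed packages.
from llama_index.readers.fhir import FHIRReader

reader = FHIRReader(
    base_url="https://fonteum.com/api/fhir/r4",
    api_key=os.environ["FONTEUM_API_KEY"],  # read from the environment; a literal "$FONTEUM_API_KEY" string would be sent as-is
    resource_types=["Practitioner", "Organization"],
    use_markdown_endpoint=True,  # fetch /md for 52% fewer tokens
)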

documents = reader.load_data(
    search_params={"address-state": "NY", "_count": 50}
)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("cardiologists accepting Medicare in Manhattan")
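
Under the hood, a reader like this presumably issues ordinary FHIR search requests. As a sketch of what one such request might look like, the snippet below builds a search URL from the same parameters using only the standard library. The `search_url` helper is hypothetical, and combining `_format=md` with search is an assumption based on the query parameter mentioned earlier on this page; check the capability statement for what the server actually supports.

```python
from urllib.parse import urlencode

BASE_URL = "https://fonteum.com/api/fhir/r4"


def search_url(resource_type: str, params: dict[str, object], markdown: bool = False) -> str:
    """Build a FHIR search URL of the form {base}/{type}?{query}."""
    query = dict(params)  # copy so the caller's dict is not modified
    if markdown:
        query["_format"] = "md"  # assumed to select the Markdown serialization
    return f"{BASE_URL}/{resource_type}?{urlencode(query)}"


url = search_url("Practitioner", {"address-state": "NY", "_count": 50}, markdown=True)
```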

Token counts · by resource type

Average token counts per resource.

Resource	JSON tokens	/md tokens	Reduction
Practitioner	312	148	−53%
Organization	298	131	−56%
Location	187	89	−52%
PractitionerRole	224	104	−54%
HealthcareService	341	162	−52%

Token counts measured with tiktoken cl100k_base on a representative sample of 500 records per resource type. Actual counts vary by record.
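
The Reduction column follows from the two token columns. The sketch below recomputes it from the table's figures. The `count_tokens` helper shows how a single count might be taken with tiktoken's cl100k_base encoding; it is only an assumption about the measurement script, which is not published here.

```python
def reduction_pct(json_tokens: int, md_tokens: int) -> int:
    """Percentage of tokens saved by /md relative to JSON, rounded to a whole percent."""
    return round(100 * (1 - md_tokens / json_tokens))


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken's cl100k_base encoding (requires `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the arithmetic above runs without it

    return len(tiktoken.get_encoding("cl100k_base").encode(text))


# (JSON tokens, /md tokens) per resource type, copied from the table above.
TABLE = {
    "Practitioner": (312, 148),
    "Organization": (298, 131),
    "Location": (187, 89),
    "PractitionerRole": (224, 104),
    "HealthcareService": (341, 162),
}
reductions = {name: reduction_pct(j, m) for name, (j, m) in TABLE.items()}
```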

Latency benchmarks · under load

Sub-300 ms at p99.

Percentile	JSON endpoint	/md endpoint
p50	38 ms	22 ms
p95	142 ms	68 ms
p99	290 ms	138 ms
p99.9	480 ms	210 ms

Measured at the Vercel edge with 50 concurrent connections. Latency is gateway-to-response-complete. Source data is served from a warm CDN cache; cold-cache adds ~80 ms.
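
For readers reproducing numbers like these, a percentile is a rank in the sorted latency samples. The sketch below uses the nearest-rank method; the page does not say which method the benchmark used, so treat both the method and the sample values as assumptions.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only.
latencies = [22, 25, 31, 38, 40, 44, 52, 68, 142, 290]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
```

With only ten samples the upper percentiles collapse onto the slowest request, which is why tail figures such as p99.9 need a large sample to mean anything.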

Get API access →

Methodology · Corrections log · Editorial policy
