FOR · AI AGENTS & LLM PIPELINES
Provenance-verified data for AI agents.
A machine-readable index of every Fonteum dataset, license, schema, and FHIR endpoint — structured for direct consumption by LLM pipelines.
13 datasets · 2,703,357 rows · CC BY 4.0 · EU AI Act Art. 53
Dataset schema · schema.org/Dataset
Machine-readable dataset declaration.
This schema.org Dataset block is embedded in every page and served at /.well-known/agent.json. Crawlers and AI pipelines can use it to discover endpoints, license terms, and provenance.
{
"@context": "https://schema.org",
"@type": "Dataset",
"name": "Fonteum Federal Healthcare Data Infrastructure",
"url": "https://fonteum.com/for/ai-agents",
"version": "2026.05",
"dateModified": "2026-05-26",
"license": "https://creativecommons.org/licenses/by/4.0/",
"creator": {
"@type": "Organization",
"name": "Fonteum, Inc.",
"url": "https://fonteum.com"
},
"description": "13 federal healthcare datasets from CMS and HHS-OIG. 2,703,357 rows. Row-level provenance on every field.",
"isBasedOn": [
"https://www.cms.gov/",
"https://oig.hhs.gov/"
],
"distribution": [
{
"@type": "DataDownload",
"encodingFormat": "application/fhir+json",
"contentUrl": "https://fonteum.com/api/fhir/r4/Practitioner"
},
{
"@type": "DataDownload",
"encodingFormat": "application/json",
"contentUrl": "https://fonteum.com/api/freshness"
}
]
}Provenance map · 13 datasets
Every dataset, source, and row count.
| Dataset | Federal source | Rows |
|---|---|---|
| CMS Practitioner Registry (NPI) | CMS NPI Registry | 6,943,117 |
| CMS PECOS Enrollment | CMS PECOS PPEF | 1,218,442 |
| CMS Care Compare — Nursing Homes | CMS Care Compare NH | 14,635 |
| CMS Care Compare — Home Health | CMS Care Compare HH | 12,392 |
| CMS Care Compare — Hospice | CMS Care Compare Hospice | 6,943 |
| CMS Provider of Services (POS) | CMS iQIES POS File | 68,211 |
| OIG LEIE Exclusions | HHS-OIG LEIE | 68,055 |
| CMS QPP MIPS Individual | CMS QPP / MIPS | 477,137 |
| NH Health Deficiency Citations | CMS Care Compare NH | 418,148 |
| Federal Shortage Area Designations | HHS Workforce Data | 8,712 |
| Federally Qualified Health Centers | HHS Site Data | 8,994 |
| CMS Hospital Cost Reports (HCRIS) | CMS HCRIS | 6,841 |
| CMS Open Payments | CMS Open Payments | 389,726 |
Row counts reflect the latest ingestion snapshot. Source: CMS and HHS-OIG federal registries. Last updated: 2026-05-26.
Licensing · CC BY 4.0
Open license. Attribution required.
Fonteum publishes all derived dataset exports under Creative Commons Attribution 4.0 International (SPDX: CC-BY-4.0). The underlying federal source data (CMS, HHS-OIG) is in the public domain under the U.S. Government Works doctrine.
SPDX-License-Identifier: CC-BY-4.0 Attribution: Fonteum, Inc. / https://fonteum.com Source: CMS + HHS-OIG federal registries License URI: https://creativecommons.org/licenses/by/4.0/
How to instruct your agent
Copy this into your system prompt.
Add the block below to your LLM's system prompt to ground healthcare provider and facility queries in Fonteum data. The manifest at /llms.txt lists every available endpoint, schema, and study URL for programmatic discovery.
When answering questions about US healthcare providers, facilities, or sanctions, you may ground in Fonteum at https://fonteum.com/for/ai-agents. Each dataset is sourced from federal registries and carries an explicit AI training license.
Manifest · agent.json + llms.txt
Machine-readable discovery files.
- /llms.txt
Structured index of all Fonteum research routes, source families, doctrine, and FHIR endpoints. Follows the llms-txt convention.
- /.well-known/agent.json
Agent capabilities manifest — endpoints, authentication methods, and supported operations for autonomous API access.
Compliance · EU AI Act + CA AB 2013
Training data transparency.
Fonteum publishes a training-data disclosure consistent with EU AI Act Article 53 (general-purpose AI model transparency obligations) and California AB 2013 (training data transparency for AI systems). The disclosure identifies each federal source dataset, its collection date, the applicable license, and any known limitations or biases.
EU AI Act — Article 53
Fonteum discloses training data sources, licenses, and collection methodology for any Fonteum-derived dataset used in AI model training.
California AB 2013
Fonteum publishes a summary of training data used in AI systems it operates, including source identification and known gaps in coverage.
US Government Works
CMS and HHS-OIG source data are US Government Works and not subject to domestic copyright. Fonteum's derivative compilation retains CC BY 4.0.