NOMOS CNAM — Annuaire Sante Archive¶
Reference documentation for the CNAM health directory archive: data provenance, archive layout, schema, data stability analysis, storage strategy, and DuckDB query patterns.
1. What is this data?¶
The Annuaire Sante de la CNAM was a monthly open-data publication by France's national health insurance fund (Caisse Nationale de l'Assurance Maladie). It listed every conventioned healthcare professional in France with their tariffs, schedules, addresses, and facility affiliations.
The dataset was deprecated in January 2026. The successor dataset (Annuaire Sante Ameli) does not include tariff data or practitioner schedules, making this archive the only source for historical pricing.
| Field | Value |
|---|---|
| Source | https://www.data.gouv.fr/datasets/annuaire-sante-de-la-cnam-deprecie |
| Publisher | Caisse Nationale de l'Assurance Maladie |
| License | Licence Ouverte / Open Licence (fr-lo) |
| Temporal coverage | March 2024 — January 2026 (19 monthly snapshots) |
| Total size (raw) | ~11 GB |
| Total size (gzipped) | ~900 MB |
2. Archive layout¶
data/archives/
MANIFEST.md # Master inventory of all snapshots and CDN URLs
annuaire-sante-cnam-{YYYY-MM}/
ps-tarifs.csv.gz # Practitioner tariffs (~340 MB raw, ~22 MB gz)
ps-infospratiques.csv.gz # Practitioner schedules (~180 MB raw, ~22 MB gz)
baseremboursement.csv.gz # CCAM reimbursement bases (~13 MB raw, ~1 MB gz)
etb-tarifs.csv.gz # Facility tariffs (~30 MB raw, ~4 MB gz)
psdansetablissements.csv.gz # Practitioners in facilities (~2.5 MB raw)
etb-casdentaire.csv.gz # Dental access centers (~28 KB raw)
etb-prado.csv.gz # PRADO facilities (~300 KB raw)
etb-speexternes.csv.gz # Health center specialties (~350 KB raw)
annuaire-sante-cnam-2026-01/ # Reference snapshot — also contains:
docs/ # 7 PDF column documentation from CNAM
metadata/ # dataset_metadata.json
README.md # Provenance details
SCHEMA.md # Complete column documentation with code tables
Snapshot inventory¶
| # | Slug | Upload date | ps-tarifs rows | ps-infos rows | Files | Notes |
|---|---|---|---|---|---|---|
| 1 | 2024-03 |
2024-03-31 | 2,602,986 | 1,451,482 | 8/8 | Earliest snapshot |
| 2 | 2024-04 |
2024-04-30 | 2,585,635 | 1,450,469 | 8/8 | |
| 3 | 2024-05 |
2024-05-31 | 2,574,636 | 1,450,734 | 8/8 | |
| 4 | 2024-06 |
2024-06-30 | 2,564,073 | 1,450,959 | 8/8 | |
| 5 | 2024-07 |
2024-07-31 | 2,536,847 | 1,446,407 | 8/8 | |
| 6 | 2024-08 |
2024-08-31 | 2,525,400 | 1,445,209 | 8/8 | |
| 7 | 2024-09 |
2024-09-30 | 2,515,011 | 1,447,543 | 8/8 | |
| 8 | 2024-11 |
2024-11-01 | 2,498,171 | 1,448,428 | 8/8 | Oct 2024 not published |
| 9 | 2024-12 |
2024-12-01 | 2,487,961 | 1,451,534 | 8/8 | |
| 10 | 2025-01 |
2025-01-01 | 2,474,591 | 1,461,522 | 8/8 | |
| 11 | 2025-02 |
2025-02-01 | 2,444,991 | 1,469,076 | 8/8 | |
| 12 | 2025-03 |
2025-03-01 | 2,429,232 | 1,482,613 | 8/8 | |
| 13 | 2025-04 |
2025-03-31 | 2,414,672 | 1,486,712 | 8/8 | |
| 14 | 2025-05 |
2025-04-30 | 2,397,590 | 1,486,669 | 8/8 | |
| 15 | 2025-06 |
2025-05-31 | 2,388,556 | 1,487,646 | 8/8 | |
| 16 | 2025-07 |
2025-06-30 | 2,381,293 | 1,488,183 | 8/8 | |
| 17 | 2025-08 |
2025-07-31 | 2,364,683 | 1,484,063 | 8/8 | |
| 18 | 2025-09 |
2025-08-31 | 2,354,148 | 1,482,675 | 8/8 | |
| 19 | 2025-11 |
2025-11-01 | 2,331,899 | — | 1/8 | Partial: ps-tarifs only |
| 20 | 2026-01 |
2026-02-01 | 2,291,052 | 1,510,240 | 8/8 | Final snapshot (deprecated) |
Slug convention: directory names use the upload-date month (YYYY-MM). For end-of-month uploads (e.g. uploaded March 31), the slug uses that month (e.g. 2025-03), not the next month. Exception: when two uploads fall in the same calendar month, the second gets the next month's slug (e.g. 2025-03 for Mar 1 upload and 2025-04 for Mar 31 upload).
Lost snapshots¶
These were purged from the CDN before we could capture them:
- Oct 2024: never published
- Oct-Dec 2025: CDN purged (only Nov 2025 ps-tarifs salvaged locally)
- Pre-2024: 6 Wayback Machine references (May 2021 — Aug 2023), all 404 on CDN
3. Data schema¶
All CSVs are semicolon-separated (;), have no header row, and use latin-1 encoding.
Full column documentation with code tables is in data/archives/annuaire-sante-cnam-2026-01/SCHEMA.md.
File summary¶
| File | Description | Rows (latest) | Key columns |
|---|---|---|---|
ps-tarifs.csv |
Practitioner tariffs per act | 2,291,052 | nom, prenom, code_postal, profession, code_acte, montant_* |
ps-infospratiques.csv |
Practitioner schedules | 1,510,240 | nom, prenom, code_postal, profession, jour, heure_debut/fin |
baseremboursement.csv |
CCAM reimbursement bases | 400,174 | code_acte_ccam, profession, convention, borne_inf/sup |
etb-tarifs.csv |
Facility hospitalization tariffs | 208,267 | nom, code_postal, specialite, cout_global, remboursement_cnam |
psdansetablissements.csv |
Practitioners in facilities | 22,519 | nom_etablissement, nom_ps, profession_ps |
etb-casdentaire.csv |
Dental access centers | 325 | nom, code_postal, profession |
etb-prado.csv |
PRADO facilities | 2,861 | nom, code_postal, thematique_prado |
etb-speexternes.csv |
Health center specialties | 3,954 | nom, code_postal, profession |
Key code tables¶
Professions (used across all PS files): integer codes 1-74. Key dental codes:
- 18 = Chirurgien-dentiste
- 19 = Chirurgien-dentiste specialiste en orthopedie dento-faciale
- 20 = Chirurgien-dentiste specialiste en chirurgie orale
- 21 = Chirurgien-dentiste specialiste en medecine bucco-dentaire
Convention: nc (non conventionne), c1 (secteur 1), c2 (secteur 1 avec depassement), c3 (secteur 2)
Nature d'exercice: 1-8 (1=inactive, 3=liberal integral, etc.)
No unique practitioner ID¶
The CSVs contain no RPPS number or stable unique identifier. Practitioners are identified by (nom, prenom, adresse, profession) which is fragile across snapshots (address changes, typos). This is a fundamental limitation for row-level tracking.
4. Data stability analysis¶
Analysis performed April 2026 across all 19 consecutive monthly snapshots.
ps-tarifs.csv — month-to-month diff (sorted line comparison)¶
| Transition | Rows A | Rows B | Identical | Changed lines | Change rate |
|---|---|---|---|---|---|
| 2024-03 → 04 | 2,602,986 | 2,585,635 | 2,560,830 | 66,961 | 1.2% |
| 2024-04 → 05 | 2,585,635 | 2,574,636 | 2,559,102 | 42,067 | 0.8% |
| 2024-05 → 06 | 2,574,636 | 2,564,073 | 2,542,261 | 54,187 | 1.0% |
| 2024-06 → 07 | 2,564,073 | 2,536,847 | 2,513,171 | 74,578 | 1.4% |
| 2024-07 → 08 | 2,536,847 | 2,525,400 | 2,511,605 | 39,037 | 0.7% |
| 2024-08 → 09 | 2,525,400 | 2,515,011 | 2,490,261 | 59,889 | 1.1% |
| 2024-09 → 11 | 2,515,011 | 2,498,171 | 2,468,642 | 75,898 | 1.5% |
| 2024-11 → 12 | 2,498,171 | 2,487,961 | 2,471,128 | 43,876 | 0.8% |
| 2024-12 → 25-01 | 2,487,961 | 2,474,591 | 2,446,440 | 69,672 | 1.4% |
| 2025-01 → 02 | 2,474,591 | 2,444,991 | 2,421,638 | 76,306 | 1.5% |
| 2025-02 → 03 | 2,444,991 | 2,429,232 | 2,405,017 | 64,189 | 1.3% |
| 2025-03 → 04 | 2,429,232 | 2,414,672 | 2,392,574 | 58,756 | 1.2% |
| 2025-04 → 05 | 2,414,672 | 2,397,590 | 2,373,617 | 65,028 | 1.3% |
| 2025-05 → 06 | 2,397,590 | 2,388,556 | 2,368,870 | 48,406 | 1.0% |
| 2025-06 → 07 | 2,388,556 | 2,381,293 | 2,360,678 | 48,493 | 1.0% |
| 2025-07 → 08 | 2,381,293 | 2,364,683 | 2,333,975 | 78,026 | 1.6% |
| 2025-08 → 09 | 2,364,683 | 2,354,148 | 2,322,641 | 73,549 | 1.5% |
Average monthly change: 1.2% — 98.8% of rows are identical between consecutive months.
ps-infospratiques.csv — month-to-month diff¶
| Transition | Identical | Changed lines | Change rate |
|---|---|---|---|
| 2024-03 → 04 | 1,418,169 | 65,613 | 2.2% |
| 2024-04 → 05 | 1,431,408 | 38,387 | 1.3% |
| 2024-05 → 06 | 1,428,251 | 45,191 | 1.5% |
| 2024-06 → 07 | 1,418,165 | 61,036 | 2.1% |
| 2024-07 → 08 | 1,427,718 | 36,180 | 1.2% |
| 2024-08 → 09 | 1,420,201 | 52,350 | 1.8% |
| 2024-09 → 11 | 1,412,090 | 71,791 | 2.4% |
| 2024-11 → 12 | 1,426,284 | 47,394 | 1.6% |
| 2024-12 → 25-01 | 1,414,883 | 83,290 | 2.8% |
| 2025-01 → 02 | 1,416,865 | 96,868 | 3.3% |
| 2025-02 → 03 | 1,432,698 | 86,293 | 2.9% |
| 2025-03 → 04 | 1,451,063 | 67,199 | 2.2% |
| 2025-04 → 05 | 1,454,247 | 64,887 | 2.1% |
| 2025-05 → 06 | 1,443,061 | 88,730 | 2.9% |
| 2025-06 → 07 | 1,465,983 | 43,863 | 1.4% |
| 2025-07 → 08 | 1,453,813 | 64,620 | 2.1% |
| 2025-08 → 09 | 1,452,627 | 61,484 | 2.0% |
Average monthly change: 2.1% — 97.9% of rows are identical.
Static tables (zero content changes across all 19 snapshots)¶
These four tables have identical content in every snapshot (only row ordering changes):
| File | Rows | Status |
|---|---|---|
baseremboursement.csv |
400,174 | Identical across all 19 snapshots |
etb-tarifs.csv |
208,267 | Identical (2 rows changed once in Jun 2024) |
etb-casdentaire.csv |
325 | Identical |
etb-prado.csv |
2,861 | Identical |
etb-speexternes.csv |
3,954 | Identical |
Overall data trend¶
The ps-tarifs row count declines steadily from 2,602,986 (Mar 2024) to 2,291,052 (Jan 2026) — a 12% decline over 22 months, roughly 14,000 fewer rows per month. This reflects practitioners leaving the conventioned system.
The ps-infospratiques row count is more stable (~1.45M → ~1.51M), with slight growth.
5. Storage strategy¶
Decision: PostgreSQL (latest snapshot) + DuckDB (historical analysis)¶
| Layer | What | Rows | Use case |
|---|---|---|---|
| PostgreSQL | Latest snapshot (2026-01) |
4.4M | Django ORM, admin, JOINs with other tables |
| DuckDB | All 20 gzipped archives | ~45M on-demand | Cross-snapshot trends, historical analysis |
| Gzipped CSV | Source of truth on disk | — | 955 MB total, no import needed for DuckDB |
Why not load all snapshots into PostgreSQL? - 20 snapshots × 4.5M = ~90M rows for a read-only archive - 98-99% of rows are identical between consecutive months (only ~1-2% changes) - 4 of 8 tables have zero content changes across all snapshots - The dataset is deprecated — no new snapshots coming, no ongoing pipeline - DuckDB reads the gzipped archives directly without any import step
Compression¶
All CSV files are stored gzipped. Compression ratios:
| File | Raw | Gzipped | Ratio |
|---|---|---|---|
ps-tarifs.csv |
348 MB | 22 MB | 6.4% |
ps-infospratiques.csv |
179 MB | 22 MB | 12.3% |
baseremboursement.csv |
13 MB | 1 MB | 7.2% |
etb-tarifs.csv |
31 MB | 4 MB | 11.9% |
| Small files | 3 MB | 0.4 MB | ~13% |
| Per snapshot | 548 MB | 47 MB | 8.6% |
| All 20 snapshots | ~11 GB | ~900 MB | 8.6% |
6. DuckDB access¶
Installation¶
Column definitions¶
DuckDB needs explicit column names since the CSVs have no headers. The canonical column
definitions are in apps/nomos_cnam/duckdb_helper.py (the COLUMNS dict). Key column names:
ps_tarifs: civilite, nom, prenom, adresse1-4, code_postal, commune, telephone, profession, mode_exercice, nature_exercice, convention, option_cas, sesam_vitale, code_acte, famille_acte, montant_principal, borne_inf_principal, borne_sup_principal, montant_2nd, ...montant_cec (33 cols)
ps_infos: same identity/address columns + type_activite, type_consultation, heure_debut, heure_fin, jour (21 cols)
base_remboursement: code_acte_ccam, activite, profession, convention, option_cas, type_affichage, borne_inf_base_remb, borne_sup_base_remb (8 cols)
etb_tarifs: nom, adresse1-3, code_postal, commune, telephone, type_etablissement, specialite, nb_hospitalisations, nb_hospitalisations_2, indicateur_nb_hosp, nb_moyen_nuitees, indicateur_nuitees, cout_global, remboursement_cnam, reste_charge_hosp, reste_charge_depassements (18 cols, financial columns are text descriptions like "de 1860 a 4390 euros.")
ps_etablissements: nom_etablissement, adresse1-3, code_postal, commune, telephone, type_etablissement, nom_ps, prenom_ps, profession_ps (11 cols)
Python helper¶
The module apps/nomos_cnam/duckdb_helper.py provides a ready-to-use query interface.
from apps.nomos_cnam.duckdb_helper import cnam_query, cnam_snapshot, available_snapshots
# List available snapshots
available_snapshots()
# [('2024-03', 8), ('2024-04', 8), ..., ('2026-01', 8)]
# Query latest snapshot (returns pandas DataFrame)
df = cnam_query("SELECT * FROM ps_tarifs WHERE profession = 18 AND code_postal LIKE '75%'")
# Query specific snapshot
df = cnam_query("SELECT * FROM ps_tarifs WHERE profession = 18", snapshot='2024-05')
# Query across all snapshots (adds snapshot_date column automatically)
df = cnam_query(
"SELECT snapshot_date, count(*) as n FROM ps_tarifs GROUP BY 1 ORDER BY 1",
snapshot='all'
)
# Interactive exploration (returns DuckDB connection with views registered)
con = cnam_snapshot('2025-09')
con.sql("SELECT profession, count(*) FROM ps_tarifs GROUP BY 1 ORDER BY 2 DESC")
Available views: ps_tarifs, ps_infos, base_remboursement, etb_tarifs,
ps_etablissements, etb_cas_dentaire, etb_prado, etb_spe_externes
7. Example queries¶
Dentists in Paris (latest snapshot)¶
df = cnam_query("""
SELECT nom, prenom, code_postal, commune, convention,
code_acte, montant_principal
FROM ps_tarifs
WHERE profession = 18
AND code_postal LIKE '75%'
ORDER BY nom, prenom
""")
Dentist count by departement over time¶
df = cnam_query("""
SELECT snapshot_date,
SUBSTRING(code_postal, 1, 2) AS dept,
COUNT(DISTINCT nom || '|' || prenom || '|' || code_postal) AS n_dentists
FROM ps_tarifs
WHERE profession = 18
GROUP BY 1, 2
ORDER BY 1, 2
""", snapshot='all')
Average tariff for a specific CCAM act¶
df = cnam_query("""
SELECT snapshot_date,
AVG(montant_principal) AS avg_tarif,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY montant_principal) AS median_tarif,
COUNT(*) AS n_practitioners
FROM ps_tarifs
WHERE code_acte = 'HBMD038'
AND montant_principal > 0
GROUP BY 1
ORDER BY 1
""", snapshot='all')
Convention distribution for dentists¶
df = cnam_query("""
SELECT convention,
COUNT(DISTINCT nom || '|' || prenom || '|' || code_postal) AS n_dentists
FROM ps_tarifs
WHERE profession = 18
GROUP BY 1
ORDER BY 2 DESC
""")
CCAM reimbursement base lookup¶
df = cnam_query("""
SELECT code_acte_ccam, profession, convention,
borne_inf_base_remb, borne_sup_base_remb
FROM base_remboursement
WHERE code_acte_ccam = 'HBMD038'
ORDER BY profession, convention
""")
Facility tariffs for dental specialties¶
df = cnam_query("""
SELECT nom, commune, code_postal,
nb_hospitalisations, cout_global, remboursement_cnam,
reste_charge_hosp
FROM etb_tarifs
WHERE specialite IN (56, 57) -- dental specialties
ORDER BY nb_hospitalisations DESC
LIMIT 20
""")
Cross-snapshot delta report (what changed between two months)¶
from apps.nomos_cnam.duckdb_helper import cnam_query
# Practitioners that were in Aug 2025 but not Sep 2025 (departures)
departures = cnam_query("""
SELECT DISTINCT a.nom, a.prenom, a.code_postal, a.profession
FROM ps_tarifs a
WHERE a.snapshot_date = '2025-08'
AND NOT EXISTS (
SELECT 1 FROM ps_tarifs b
WHERE b.snapshot_date = '2025-09'
AND a.nom = b.nom AND a.prenom = b.prenom
AND a.code_postal = b.code_postal AND a.profession = b.profession
)
""", snapshot='all')
DuckDB CLI one-liner¶
# Total rows across all snapshots
duckdb -c "
SELECT filename, COUNT(*) as rows
FROM read_csv('data/archives/annuaire-sante-cnam-*/csv/ps-tarifs.csv.gz',
delim=';', header=false, filename=true)
GROUP BY 1 ORDER BY 1
"
8. Relationship to the Django app¶
The apps/nomos_cnam/ Django app was originally designed to import all snapshots into PostgreSQL tables using COPY. Given the analysis above, this approach has been superseded by DuckDB for archive queries.
The Django app components:
- duckdb_helper.py — DuckDB query interface for cross-snapshot analysis (cnam_query, cnam_snapshot)
- models/ — Django ORM models for all 8 tables + CnamImportLog
- management/commands/cnam_import.py — PostgreSQL importer (decompresses .csv.gz on-the-fly, streams via COPY)
- admin.py — Admin registration for CnamImportLog
Two query paths¶
PostgreSQL (Django ORM) — latest snapshot loaded, for use in Django views, ORM joins, admin:
from apps.nomos_cnam.models import PsTarif
PsTarif.objects.filter(profession=18, code_postal__startswith='75').count()
DuckDB (cross-snapshot analysis) — reads all 20 gzipped archives directly, no import needed:
from apps.nomos_cnam.duckdb_helper import cnam_query
df = cnam_query("SELECT snapshot_date, count(*) FROM ps_tarifs GROUP BY 1", snapshot='all')
Importing into PostgreSQL¶
The latest snapshot (2026-01) is loaded into PostgreSQL (4.4M rows, ~3 min).
To reload or import a different snapshot:
# Import latest (decompresses .csv.gz on-the-fly)
make shell
python manage.py cnam_import --snapshot 2026-01
# Dry-run to see row counts without writing
python manage.py cnam_import --snapshot 2026-01 --dry-run
# Import specific tables only
python manage.py cnam_import --snapshot 2026-01 --tables ps-tarifs.csv baseremboursement.csv
The command automatically deletes existing rows for that snapshot_date before inserting,
so re-running is idempotent. To switch to a different snapshot, just run with a different
--snapshot value — old data for the previous snapshot remains unless you truncate manually.
9. Files reference¶
| Path | Purpose |
|---|---|
data/archives/MANIFEST.md |
Master inventory, CDN URLs, download status |
data/archives/annuaire-sante-cnam-2026-01/SCHEMA.md |
Complete column schema with code tables |
data/archives/annuaire-sante-cnam-2026-01/README.md |
Dataset provenance and file layout |
docs/NOMOS_CNAM_ARCHIVE.md |
This document |
apps/nomos_cnam/ |
Django app (models, legacy importer) |
scripts/cnam_download_missing.sh |
Download script for CDN snapshots |