NOMOS CNAM: Annuaire Sante Archive

Reference documentation for the CNAM health directory archive: data provenance, archive layout, schema, data stability analysis, storage strategy, and DuckDB query patterns.

1. What is this data?

The Annuaire Sante de la CNAM was a monthly open-data publication by France's national health insurance fund (Caisse Nationale de l'Assurance Maladie). It listed every conventioned healthcare professional in France with their tariffs, schedules, addresses, and facility affiliations.

The dataset was deprecated in January 2026. The successor dataset (Annuaire Sante Ameli) does not include tariff data or practitioner schedules, making this archive the only source for historical pricing.

Field	Value
Source	https://www.data.gouv.fr/datasets/annuaire-sante-de-la-cnam-deprecie
Publisher	Caisse Nationale de l'Assurance Maladie
License	Licence Ouverte / Open Licence (fr-lo)
Temporal coverage	March 2024 to January 2026 (20 snapshots: 19 complete, 1 partial)
Total size (raw)	~11 GB
Total size (gzipped)	~900 MB

2. Archive layout

data/archives/
  MANIFEST.md                              # Master inventory of all snapshots and CDN URLs
  annuaire-sante-cnam-{YYYY-MM}/
    ps-tarifs.csv.gz                       # Practitioner tariffs (~340 MB raw, ~22 MB gz)
    ps-infospratiques.csv.gz               # Practitioner schedules (~180 MB raw, ~22 MB gz)
    baseremboursement.csv.gz               # CCAM reimbursement bases (~13 MB raw, ~1 MB gz)
    etb-tarifs.csv.gz                      # Facility tariffs (~30 MB raw, ~4 MB gz)
    psdansetablissements.csv.gz            # Practitioners in facilities (~2.5 MB raw)
    etb-casdentaire.csv.gz                 # Dental access centers (~28 KB raw)
    etb-prado.csv.gz                       # PRADO facilities (~300 KB raw)
    etb-speexternes.csv.gz                 # Health center specialties (~350 KB raw)
  annuaire-sante-cnam-2026-01/             # Reference snapshot — also contains:
    docs/                                  # 7 PDFs of column documentation from CNAM
    metadata/                              # dataset_metadata.json
    README.md                              # Provenance details
    SCHEMA.md                              # Complete column documentation with code tables

Snapshot inventory

#	Slug	Upload date	ps-tarifs rows	ps-infos rows	Files	Notes
1	`2024-03`	2024-03-31	2,602,986	1,451,482	8/8	Earliest snapshot
2	`2024-04`	2024-04-30	2,585,635	1,450,469	8/8
3	`2024-05`	2024-05-31	2,574,636	1,450,734	8/8
4	`2024-06`	2024-06-30	2,564,073	1,450,959	8/8
5	`2024-07`	2024-07-31	2,536,847	1,446,407	8/8
6	`2024-08`	2024-08-31	2,525,400	1,445,209	8/8
7	`2024-09`	2024-09-30	2,515,011	1,447,543	8/8
8	`2024-11`	2024-11-01	2,498,171	1,448,428	8/8	Oct 2024 not published
9	`2024-12`	2024-12-01	2,487,961	1,451,534	8/8
10	`2025-01`	2025-01-01	2,474,591	1,461,522	8/8
11	`2025-02`	2025-02-01	2,444,991	1,469,076	8/8
12	`2025-03`	2025-03-01	2,429,232	1,482,613	8/8
13	`2025-04`	2025-03-31	2,414,672	1,486,712	8/8
14	`2025-05`	2025-04-30	2,397,590	1,486,669	8/8
15	`2025-06`	2025-05-31	2,388,556	1,487,646	8/8
16	`2025-07`	2025-06-30	2,381,293	1,488,183	8/8
17	`2025-08`	2025-07-31	2,364,683	1,484,063	8/8
18	`2025-09`	2025-08-31	2,354,148	1,482,675	8/8
19	`2025-11`	2025-11-01	2,331,899	—	1/8	Partial: ps-tarifs only
20	`2026-01`	2026-02-01	2,291,052	1,510,240	8/8	Final snapshot (deprecated)

Slug convention: directory names use the upload-date month (YYYY-MM). An end-of-month upload takes the slug of its own month, not the next one (e.g. the 2024-03-31 upload is 2024-03). Exception: when two uploads fall in the same calendar month, the second gets the next month's slug (e.g. 2025-03 for the Mar 1 upload and 2025-04 for the Mar 31 upload). As the inventory shows, the end-of-month uploads that follow (2025-04 to 2025-09) keep that one-month offset.

Lost snapshots

These snapshots are missing from the archive:

- Oct 2024: never published
- Oct-Dec 2025: purged from the CDN before capture (only the Nov 2025 ps-tarifs file was salvaged locally)
- Pre-2024: 6 Wayback Machine references (May 2021 to Aug 2023), all return 404 on the CDN

3. Data schema

All CSVs are semicolon-separated (;), have no header row, and use latin-1 encoding.

Full column documentation with code tables is in data/archives/annuaire-sante-cnam-2026-01/SCHEMA.md.
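The three format rules above (semicolon separator, no header row, latin-1 encoding) can be handled with the Python standard library alone. The following is a minimal sketch, not the project's own loader: `iter_rows` is a hypothetical helper name, and it assumes the files use no special quoting beyond what `csv.reader` handles by default.

```python
import csv
import gzip
from pathlib import Path


def iter_rows(path):
    """Yield each record of a gzipped archive CSV as a list of strings."""
    # Text mode ("rt") decodes latin-1 while decompressing on the fly.
    with gzip.open(path, mode="rt", encoding="latin-1", newline="") as handle:
        # There is no header row, so every line is data; fields split on ";".
        yield from csv.reader(handle, delimiter=";")


if __name__ == "__main__":
    # Path follows the archive layout in section 2; skipped if not present.
    sample = Path("data/archives/annuaire-sante-cnam-2026-01/baseremboursement.csv.gz")
    if sample.exists():
        first = next(iter_rows(sample))
        print(len(first), first)
```

Because the function is a generator, the 340 MB ps-tarifs file can be streamed without loading it into memory.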

File summary

File	Description	Rows (latest)	Key columns
`ps-tarifs.csv`	Practitioner tariffs per act	2,291,052	nom, prenom, code_postal, profession, code_acte, montant_*
`ps-infospratiques.csv`	Practitioner schedules	1,510,240	nom, prenom, code_postal, profession, jour, heure_debut/fin
`baseremboursement.csv`	CCAM reimbursement bases	400,174	code_acte_ccam, profession, convention, borne_inf/sup
`etb-tarifs.csv`	Facility hospitalization tariffs	208,267	nom, code_postal, specialite, cout_global, remboursement_cnam
`psdansetablissements.csv`	Practitioners in facilities	22,519	nom_etablissement, nom_ps, profession_ps
`etb-casdentaire.csv`	Dental access centers	325	nom, code_postal, profession
`etb-prado.csv`	PRADO facilities	2,861	nom, code_postal, thematique_prado
`etb-speexternes.csv`	Health center specialties	3,954	nom, code_postal, profession

Key code tables

Professions (used across all PS files): integer codes 1-74. Key dental codes:

- 18 = Chirurgien-dentiste
- 19 = Chirurgien-dentiste specialiste en orthopedie dento-faciale
- 20 = Chirurgien-dentiste specialiste en chirurgie orale
- 21 = Chirurgien-dentiste specialiste en medecine bucco-dentaire

Convention: nc (non conventionne), c1 (secteur 1), c2 (secteur 1 avec depassement), c3 (secteur 2)

Nature d'exercice: 1-8 (1=inactive, 3=liberal integral, etc.)

No unique practitioner ID

The CSVs contain no RPPS number or other stable unique identifier. Practitioners can only be matched on (nom, prenom, adresse, profession), which is fragile across snapshots (address changes, typos). This is a fundamental limitation for row-level tracking.
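One way to soften (not remove) that fragility is to normalize the fields before comparing them. The sketch below is an illustration only: `practitioner_key` is a hypothetical helper, and it keys on code_postal rather than the full address, matching what the example queries in section 7 do. Two different practitioners with the same name, postcode, and profession would still collide.

```python
import unicodedata


def practitioner_key(nom, prenom, code_postal, profession):
    """Build a best-effort matching key for one practitioner row."""

    def norm(text):
        # Strip accents, collapse whitespace, and upper-case, so that
        # " Hélène" and "HELENE" produce the same key component.
        decomposed = unicodedata.normalize("NFKD", text)
        unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return " ".join(unaccented.upper().split())

    return (norm(nom), norm(prenom), code_postal.strip(), str(profession).strip())


print(practitioner_key("Dupont", " Hélène", "75011 ", 18))
# ('DUPONT', 'HELENE', '75011', '18')
```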

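4. Data stability analysis

Analysis performed in April 2026 across the 19 complete snapshots. The month-to-month tables below cover the 17 transitions from 2024-03 to 2025-09; the jump from 2025-09 to 2026-01 spans several missing months and is not listed.

ps-tarifs.csv: month-to-month diff (sorted line comparison)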

Transition	Rows A	Rows B	Identical	Changed lines	Change rate
2024-03 → 04	2,602,986	2,585,635	2,560,830	66,961	1.2%
2024-04 → 05	2,585,635	2,574,636	2,559,102	42,067	0.8%
2024-05 → 06	2,574,636	2,564,073	2,542,261	54,187	1.0%
2024-06 → 07	2,564,073	2,536,847	2,513,171	74,578	1.4%
2024-07 → 08	2,536,847	2,525,400	2,511,605	39,037	0.7%
2024-08 → 09	2,525,400	2,515,011	2,490,261	59,889	1.1%
2024-09 → 11	2,515,011	2,498,171	2,468,642	75,898	1.5%
2024-11 → 12	2,498,171	2,487,961	2,471,128	43,876	0.8%
2024-12 → 25-01	2,487,961	2,474,591	2,446,440	69,672	1.4%
2025-01 → 02	2,474,591	2,444,991	2,421,638	76,306	1.5%
2025-02 → 03	2,444,991	2,429,232	2,405,017	64,189	1.3%
2025-03 → 04	2,429,232	2,414,672	2,392,574	58,756	1.2%
2025-04 → 05	2,414,672	2,397,590	2,373,617	65,028	1.3%
2025-05 → 06	2,397,590	2,388,556	2,368,870	48,406	1.0%
2025-06 → 07	2,388,556	2,381,293	2,360,678	48,493	1.0%
2025-07 → 08	2,381,293	2,364,683	2,333,975	78,026	1.6%
2025-08 → 09	2,364,683	2,354,148	2,322,641	73,549	1.5%

Average monthly change: 1.2%, i.e. 98.8% of lines have an identical counterpart in the adjacent month (rates are relative to the combined line count of both snapshots).
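The table columns can be reproduced with a multiset comparison of lines. The sketch below is a reconstruction, not the script that produced the table: `diff_stats` is a hypothetical name, and the rate formula (changed lines divided by Rows A + Rows B) is inferred from the published figures, which it matches when truncated to one decimal.

```python
from collections import Counter


def diff_stats(lines_a, lines_b):
    """Compare two snapshots as multisets of lines (row order is ignored)."""
    count_a, count_b = Counter(lines_a), Counter(lines_b)
    # Counter intersection keeps the minimum multiplicity of each line,
    # i.e. the lines that have an identical counterpart on the other side.
    identical = sum((count_a & count_b).values())
    rows_a, rows_b = sum(count_a.values()), sum(count_b.values())
    changed = (rows_a - identical) + (rows_b - identical)
    return {
        "rows_a": rows_a,
        "rows_b": rows_b,
        "identical": identical,
        "changed": changed,
        "change_rate": changed / (rows_a + rows_b),
    }


# Cross-check against the first table row (2024-03 → 04).
rows_a, rows_b, identical = 2_602_986, 2_585_635, 2_560_830
changed = (rows_a - identical) + (rows_b - identical)
print(changed, f"{changed / (rows_a + rows_b):.2%}")
# 66961 1.29%
```

Note that relative to a single snapshot the share of affected rows is roughly twice the listed rate, since each changed row usually appears once as a removal and once as an addition.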

ps-infospratiques.csv: month-to-month diff

Transition	Identical	Changed lines	Change rate
2024-03 → 04	1,418,169	65,613	2.2%
2024-04 → 05	1,431,408	38,387	1.3%
2024-05 → 06	1,428,251	45,191	1.5%
2024-06 → 07	1,418,165	61,036	2.1%
2024-07 → 08	1,427,718	36,180	1.2%
2024-08 → 09	1,420,201	52,350	1.8%
2024-09 → 11	1,412,090	71,791	2.4%
2024-11 → 12	1,426,284	47,394	1.6%
2024-12 → 25-01	1,414,883	83,290	2.8%
2025-01 → 02	1,416,865	96,868	3.3%
2025-02 → 03	1,432,698	86,293	2.9%
2025-03 → 04	1,451,063	67,199	2.2%
2025-04 → 05	1,454,247	64,887	2.1%
2025-05 → 06	1,443,061	88,730	2.9%
2025-06 → 07	1,465,983	43,863	1.4%
2025-07 → 08	1,453,813	64,620	2.1%
2025-08 → 09	1,452,627	61,484	2.0%

Average monthly change: 2.1% — 97.9% of rows are identical.

Static tables (no or near-zero content changes across all 19 snapshots)

These five tables are effectively static. Four have identical content in every snapshot (only row ordering changes); etb-tarifs.csv differs by 2 rows that changed once, in June 2024:

File	Rows	Status
`baseremboursement.csv`	400,174	Identical across all 19 snapshots
`etb-tarifs.csv`	208,267	Near-identical (2 rows changed once, in Jun 2024)
`etb-casdentaire.csv`	325	Identical
`etb-prado.csv`	2,861	Identical
`etb-speexternes.csv`	3,954	Identical

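Overall data trend

The ps-tarifs row count declines steadily from 2,602,986 (Mar 2024) to 2,291,052 (Jan 2026): a 12% decline over 22 months, roughly 14,000 fewer rows per month. Since each row is a tariff for one practitioner and one act, this likely reflects practitioners leaving the conventioned system, though fewer listed acts per practitioner would have the same effect.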
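The decline figures follow directly from the first and last rows of the snapshot inventory; a short check of the arithmetic:

```python
first_rows, last_rows = 2_602_986, 2_291_052   # ps-tarifs rows, 2024-03 and 2026-01
months_elapsed = (2026 - 2024) * 12 + (1 - 3)  # March 2024 to January 2026

lost_rows = first_rows - last_rows
relative_decline = lost_rows / first_rows
rows_lost_per_month = lost_rows / months_elapsed

print(f"{lost_rows:,} rows lost over {months_elapsed} months")
print(f"{relative_decline:.1%} decline, about {rows_lost_per_month:,.0f} rows per month")
# 311,934 rows lost over 22 months
# 12.0% decline, about 14,179 rows per month
```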

The ps-infospratiques row count is more stable (~1.45M → ~1.51M), with slight growth.

5. Storage strategy

Decision: PostgreSQL (latest snapshot) + DuckDB (historical analysis)

Layer	What	Rows	Use case
PostgreSQL	Latest snapshot (`2026-01`)	4.4M	Django ORM, admin, JOINs with other tables
DuckDB	All 20 gzipped archives	~90M on-demand	Cross-snapshot trends, historical analysis
Gzipped CSV	Source of truth on disk	—	955 MB total, no import needed for DuckDB

Why not load all snapshots into PostgreSQL?

- 20 snapshots × 4.5M = ~90M rows for a read-only archive
- 98-99% of rows are identical between consecutive months (only ~1-2% changes)
- 4 of 8 tables have zero content changes across all snapshots
- The dataset is deprecated: no new snapshots coming, no ongoing pipeline
- DuckDB reads the gzipped archives directly without any import step

Compression

All CSV files are stored gzipped. Compression ratios:

File	Raw	Gzipped	Ratio
`ps-tarifs.csv`	348 MB	22 MB	6.4%
`ps-infospratiques.csv`	179 MB	22 MB	12.3%
`baseremboursement.csv`	13 MB	1 MB	7.2%
`etb-tarifs.csv`	31 MB	4 MB	11.9%
Small files	3 MB	0.4 MB	~13%
Per snapshot	548 MB	47 MB	8.6%
All 20 snapshots	~11 GB	~900 MB	8.6%

6. DuckDB access

Installation

pip install duckdb   # Python
brew install duckdb  # CLI

Column definitions

DuckDB needs explicit column names since the CSVs have no headers. The canonical column definitions are in apps/nomos_cnam/duckdb_helper.py (the COLUMNS dict). Key column names:

ps_tarifs: civilite, nom, prenom, adresse1-4, code_postal, commune, telephone, profession, mode_exercice, nature_exercice, convention, option_cas, sesam_vitale, code_acte, famille_acte, montant_principal, borne_inf_principal, borne_sup_principal, montant_2nd, ...montant_cec (33 cols)

ps_infos: same identity/address columns + type_activite, type_consultation, heure_debut, heure_fin, jour (21 cols)

base_remboursement: code_acte_ccam, activite, profession, convention, option_cas, type_affichage, borne_inf_base_remb, borne_sup_base_remb (8 cols)

etb_tarifs: nom, adresse1-3, code_postal, commune, telephone, type_etablissement, specialite, nb_hospitalisations, nb_hospitalisations_2, indicateur_nb_hosp, nb_moyen_nuitees, indicateur_nuitees, cout_global, remboursement_cnam, reste_charge_hosp, reste_charge_depassements (18 cols, financial columns are text descriptions like "de 1860 a 4390 euros.")

ps_etablissements: nom_etablissement, adresse1-3, code_postal, commune, telephone, type_etablissement, nom_ps, prenom_ps, profession_ps (11 cols)
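To illustrate how such a column list turns into a DuckDB query, the sketch below builds a `read_csv` call with explicit column names for base_remboursement. It is not the contents of duckdb_helper.py: `read_csv_sql` is a hypothetical function, and typing every column as VARCHAR is an assumption made here to avoid type-inference surprises such as lost leading zeros. Encoding handling is left out; whether an `encoding` option is available depends on the DuckDB version.

```python
# Column names for base_remboursement, copied from the list above.
BASE_REMBOURSEMENT_COLUMNS = [
    "code_acte_ccam", "activite", "profession", "convention",
    "option_cas", "type_affichage", "borne_inf_base_remb", "borne_sup_base_remb",
]


def read_csv_sql(path, columns):
    """Return a SELECT over a headerless, semicolon-separated archive file."""
    # DuckDB's read_csv takes a struct literal mapping column name to type.
    column_spec = ", ".join(f"'{name}': 'VARCHAR'" for name in columns)
    return (
        f"SELECT * FROM read_csv('{path}', "
        f"delim=';', header=false, columns={{{column_spec}}})"
    )


print(read_csv_sql(
    "data/archives/annuaire-sante-cnam-2026-01/baseremboursement.csv.gz",
    BASE_REMBOURSEMENT_COLUMNS,
))
```

The resulting string can be passed to `duckdb.sql(...)` or wrapped in a `CREATE VIEW` statement; numeric columns then need an explicit cast in queries.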

Python helper

The module apps/nomos_cnam/duckdb_helper.py provides a ready-to-use query interface.

from apps.nomos_cnam.duckdb_helper import cnam_query, cnam_snapshot, available_snapshots

# List available snapshots
available_snapshots()
# [('2024-03', 8), ('2024-04', 8), ..., ('2026-01', 8)]

# Query latest snapshot (returns pandas DataFrame)
df = cnam_query("SELECT * FROM ps_tarifs WHERE profession = 18 AND code_postal LIKE '75%'")

# Query specific snapshot
df = cnam_query("SELECT * FROM ps_tarifs WHERE profession = 18", snapshot='2024-05')

# Query across all snapshots (adds snapshot_date column automatically)
df = cnam_query(
    "SELECT snapshot_date, count(*) as n FROM ps_tarifs GROUP BY 1 ORDER BY 1",
    snapshot='all'
)

# Interactive exploration (returns DuckDB connection with views registered)
con = cnam_snapshot('2025-09')
con.sql("SELECT profession, count(*) FROM ps_tarifs GROUP BY 1 ORDER BY 2 DESC")

Available views: ps_tarifs, ps_infos, base_remboursement, etb_tarifs, ps_etablissements, etb_cas_dentaire, etb_prado, etb_spe_externes

7. Example queries

Dentists in Paris (latest snapshot)

df = cnam_query("""
    SELECT nom, prenom, code_postal, commune, convention,
           code_acte, montant_principal
    FROM ps_tarifs
    WHERE profession = 18
      AND code_postal LIKE '75%'
    ORDER BY nom, prenom
""")

Dentist count by departement over time

df = cnam_query("""
    SELECT snapshot_date,
           SUBSTRING(code_postal, 1, 2) AS dept,  -- first 2 digits: overseas (97x) and Corsica (20) are each lumped together
           COUNT(DISTINCT nom || '|' || prenom || '|' || code_postal) AS n_dentists
    FROM ps_tarifs
    WHERE profession = 18
    GROUP BY 1, 2
    ORDER BY 1, 2
""", snapshot='all')

Average tariff for a specific CCAM act

df = cnam_query("""
    SELECT snapshot_date,
           AVG(montant_principal) AS avg_tarif,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY montant_principal) AS median_tarif,
           COUNT(*) AS n_practitioners
    FROM ps_tarifs
    WHERE code_acte = 'HBMD038'
      AND montant_principal > 0
    GROUP BY 1
    ORDER BY 1
""", snapshot='all')

Convention distribution for dentists

df = cnam_query("""
    SELECT convention,
           COUNT(DISTINCT nom || '|' || prenom || '|' || code_postal) AS n_dentists
    FROM ps_tarifs
    WHERE profession = 18
    GROUP BY 1
    ORDER BY 2 DESC
""")

CCAM reimbursement base lookup

df = cnam_query("""
    SELECT code_acte_ccam, profession, convention,
           borne_inf_base_remb, borne_sup_base_remb
    FROM base_remboursement
    WHERE code_acte_ccam = 'HBMD038'
    ORDER BY profession, convention
""")

Facility tariffs for dental specialties

df = cnam_query("""
    SELECT nom, commune, code_postal,
           nb_hospitalisations, cout_global, remboursement_cnam,
           reste_charge_hosp
    FROM etb_tarifs
    WHERE specialite IN (56, 57)  -- dental specialties
    ORDER BY nb_hospitalisations DESC
    LIMIT 20
""")

Cross-snapshot delta report (what changed between two months)

from apps.nomos_cnam.duckdb_helper import cnam_query

# Practitioners that were in Aug 2025 but not Sep 2025 (departures)
departures = cnam_query("""
    SELECT DISTINCT a.nom, a.prenom, a.code_postal, a.profession
    FROM ps_tarifs a
    WHERE a.snapshot_date = '2025-08'
      AND NOT EXISTS (
          SELECT 1 FROM ps_tarifs b
          WHERE b.snapshot_date = '2025-09'
            AND a.nom = b.nom AND a.prenom = b.prenom
            AND a.code_postal = b.code_postal AND a.profession = b.profession
      )
""", snapshot='all')

DuckDB CLI one-liner

# Row count per snapshot file
duckdb -c "
  SELECT filename, COUNT(*) as rows
  FROM read_csv('data/archives/annuaire-sante-cnam-*/ps-tarifs.csv.gz',
                delim=';', header=false, filename=true)
  GROUP BY 1 ORDER BY 1
"

8. Relationship to the Django app

The apps/nomos_cnam/ Django app was originally designed to import all snapshots into PostgreSQL tables using COPY. Given the analysis above, this approach has been superseded by DuckDB for archive queries.

The Django app components:

- duckdb_helper.py: DuckDB query interface for cross-snapshot analysis (cnam_query, cnam_snapshot)
- models/: Django ORM models for all 8 tables + CnamImportLog
- management/commands/cnam_import.py: PostgreSQL importer (decompresses .csv.gz on-the-fly, streams via COPY)
- admin.py: Admin registration for CnamImportLog

Two query paths

PostgreSQL (Django ORM) — latest snapshot loaded, for use in Django views, ORM joins, admin:

from apps.nomos_cnam.models import PsTarif

PsTarif.objects.filter(profession=18, code_postal__startswith='75').count()

DuckDB (cross-snapshot analysis) — reads all 20 gzipped archives directly, no import needed:

from apps.nomos_cnam.duckdb_helper import cnam_query

df = cnam_query("SELECT snapshot_date, count(*) FROM ps_tarifs GROUP BY 1", snapshot='all')

Importing into PostgreSQL

The latest snapshot (2026-01) is loaded into PostgreSQL (4.4M rows, ~3 min). To reload or import a different snapshot:

# Import latest (decompresses .csv.gz on-the-fly)
make shell
python manage.py cnam_import --snapshot 2026-01

# Dry-run to see row counts without writing
python manage.py cnam_import --snapshot 2026-01 --dry-run

# Import specific tables only
python manage.py cnam_import --snapshot 2026-01 --tables ps-tarifs.csv baseremboursement.csv

The command automatically deletes existing rows for that snapshot_date before inserting, so re-running is idempotent. To switch to a different snapshot, just run with a different --snapshot value — old data for the previous snapshot remains unless you truncate manually.
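The delete-then-insert behaviour described above can be illustrated in miniature. This is not the cnam_import code: the sketch swaps in SQLite from the standard library in place of PostgreSQL and COPY, and the `ps_tarif` table with three columns and the `import_snapshot` function are hypothetical stand-ins.

```python
import sqlite3


def import_snapshot(con, snapshot_date, rows):
    """Replace all rows of one snapshot inside a single transaction."""
    with con:  # commits on success, rolls back if an exception is raised
        con.execute("DELETE FROM ps_tarif WHERE snapshot_date = ?", (snapshot_date,))
        con.executemany(
            "INSERT INTO ps_tarif (snapshot_date, nom, code_acte) VALUES (?, ?, ?)",
            [(snapshot_date, nom, code_acte) for nom, code_acte in rows],
        )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ps_tarif (snapshot_date TEXT, nom TEXT, code_acte TEXT)")

rows = [("DUPONT", "HBMD038"), ("MARTIN", "HBMD038")]
import_snapshot(con, "2026-01", rows)
import_snapshot(con, "2026-01", rows)  # re-run: still 2 rows for 2026-01, not 4
import_snapshot(con, "2025-09", rows)  # other snapshot: the 2026-01 rows remain

print(con.execute(
    "SELECT snapshot_date, COUNT(*) FROM ps_tarif GROUP BY 1 ORDER BY 1"
).fetchall())
# [('2025-09', 2), ('2026-01', 2)]
```

Scoping the DELETE to one snapshot_date is what makes a re-run safe while leaving other snapshots in place, which is why switching snapshots requires a manual truncate.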

9. Files reference

Path	Purpose
`data/archives/MANIFEST.md`	Master inventory, CDN URLs, download status
`data/archives/annuaire-sante-cnam-2026-01/SCHEMA.md`	Complete column schema with code tables
`data/archives/annuaire-sante-cnam-2026-01/README.md`	Dataset provenance and file layout
`docs/NOMOS_CNAM_ARCHIVE.md`	This document
`apps/nomos_cnam/`	Django app (models, legacy importer)
`scripts/cnam_download_missing.sh`	Download script for CDN snapshots