NOMOS CNAM — Annuaire Sante Archive

Reference documentation for the CNAM health directory archive: data provenance, archive layout, schema, data stability analysis, storage strategy, and DuckDB query patterns.


1. What is this data?

The Annuaire Sante de la CNAM was a monthly open-data publication by France's national health insurance fund (Caisse Nationale de l'Assurance Maladie). It listed every conventioned healthcare professional in France with their tariffs, schedules, addresses, and facility affiliations.

The dataset was deprecated in January 2026. The successor dataset (Annuaire Sante Ameli) does not include tariff data or practitioner schedules, making this archive the only source for historical pricing.

| Field | Value |
|---|---|
| Source | https://www.data.gouv.fr/datasets/annuaire-sante-de-la-cnam-deprecie |
| Publisher | Caisse Nationale de l'Assurance Maladie |
| License | Licence Ouverte / Open Licence (fr-lo) |
| Temporal coverage | March 2024 to January 2026 (20 snapshots: 19 complete, 1 partial) |
| Total size (raw) | ~11 GB |
| Total size (gzipped) | ~900 MB |

2. Archive layout

data/archives/
  MANIFEST.md                              # Master inventory of all snapshots and CDN URLs
  annuaire-sante-cnam-{YYYY-MM}/
    ps-tarifs.csv.gz                       # Practitioner tariffs (~340 MB raw, ~22 MB gz)
    ps-infospratiques.csv.gz               # Practitioner schedules (~180 MB raw, ~22 MB gz)
    baseremboursement.csv.gz               # CCAM reimbursement bases (~13 MB raw, ~1 MB gz)
    etb-tarifs.csv.gz                      # Facility tariffs (~30 MB raw, ~4 MB gz)
    psdansetablissements.csv.gz            # Practitioners in facilities (~2.5 MB raw)
    etb-casdentaire.csv.gz                 # Dental access centers (~28 KB raw)
    etb-prado.csv.gz                       # PRADO facilities (~300 KB raw)
    etb-speexternes.csv.gz                 # Health center specialties (~350 KB raw)
  annuaire-sante-cnam-2026-01/             # Reference snapshot — also contains:
    docs/                                  # 7 PDF column documentation from CNAM
    metadata/                              # dataset_metadata.json
    README.md                              # Provenance details
    SCHEMA.md                              # Complete column documentation with code tables

Snapshot inventory

| # | Slug | Upload date | ps-tarifs rows | ps-infos rows | Files | Notes |
|---|---|---|---|---|---|---|
| 1 | 2024-03 | 2024-03-31 | 2,602,986 | 1,451,482 | 8/8 | Earliest snapshot |
| 2 | 2024-04 | 2024-04-30 | 2,585,635 | 1,450,469 | 8/8 | |
| 3 | 2024-05 | 2024-05-31 | 2,574,636 | 1,450,734 | 8/8 | |
| 4 | 2024-06 | 2024-06-30 | 2,564,073 | 1,450,959 | 8/8 | |
| 5 | 2024-07 | 2024-07-31 | 2,536,847 | 1,446,407 | 8/8 | |
| 6 | 2024-08 | 2024-08-31 | 2,525,400 | 1,445,209 | 8/8 | |
| 7 | 2024-09 | 2024-09-30 | 2,515,011 | 1,447,543 | 8/8 | |
| 8 | 2024-11 | 2024-11-01 | 2,498,171 | 1,448,428 | 8/8 | Oct 2024 not published |
| 9 | 2024-12 | 2024-12-01 | 2,487,961 | 1,451,534 | 8/8 | |
| 10 | 2025-01 | 2025-01-01 | 2,474,591 | 1,461,522 | 8/8 | |
| 11 | 2025-02 | 2025-02-01 | 2,444,991 | 1,469,076 | 8/8 | |
| 12 | 2025-03 | 2025-03-01 | 2,429,232 | 1,482,613 | 8/8 | |
| 13 | 2025-04 | 2025-03-31 | 2,414,672 | 1,486,712 | 8/8 | |
| 14 | 2025-05 | 2025-04-30 | 2,397,590 | 1,486,669 | 8/8 | |
| 15 | 2025-06 | 2025-05-31 | 2,388,556 | 1,487,646 | 8/8 | |
| 16 | 2025-07 | 2025-06-30 | 2,381,293 | 1,488,183 | 8/8 | |
| 17 | 2025-08 | 2025-07-31 | 2,364,683 | 1,484,063 | 8/8 | |
| 18 | 2025-09 | 2025-08-31 | 2,354,148 | 1,482,675 | 8/8 | |
| 19 | 2025-11 | 2025-11-01 | 2,331,899 | | 1/8 | Partial: ps-tarifs only |
| 20 | 2026-01 | 2026-02-01 | 2,291,052 | 1,510,240 | 8/8 | Final snapshot (deprecated) |

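Slug convention: directory names follow the upload-date month (YYYY-MM). An end-of-month upload normally keeps its own month (e.g. the 2024-03-31 upload is 2024-03), not the next month. Exception: when two uploads fall in the same calendar month, the second takes the next month's slug (e.g. 2025-03 for the 2025-03-01 upload and 2025-04 for the 2025-03-31 upload). As the inventory shows, the end-of-month uploads that follow (2025-04-30 through 2025-08-31) keep this one-month offset, so slugs 2025-05 to 2025-09 each name the month after their upload date.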

Lost snapshots

These snapshots are missing from the archive, either because they were never published or because the CDN purged them before we could capture them:

  • Oct 2024: never published
  • Oct-Dec 2025: CDN purged (only Nov 2025 ps-tarifs salvaged locally)
  • Pre-2024: 6 Wayback Machine references (May 2021 — Aug 2023), all 404 on CDN

3. Data schema

All CSVs are semicolon-separated (;), have no header row, and use latin-1 encoding.
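
As a minimal sketch of what these three properties mean in practice, the Python standard library alone can stream a snapshot file. `read_rows` is a hypothetical helper name (it is not part of the project), and the sketch assumes the files follow ordinary CSV quoting rules, which this document does not confirm.

```python
import csv
import gzip
from pathlib import Path
from typing import Iterator


def read_rows(path: Path) -> Iterator[list[str]]:
    """Yield each record of a gzipped archive file as a list of strings."""
    # "rt" decodes while decompressing; latin-1 maps every byte to a character,
    # so decoding itself never fails. newline="" lets csv handle line endings.
    with gzip.open(path, "rt", encoding="latin-1", newline="") as handle:
        # Semicolon-separated, no header row: the first record is already data.
        yield from csv.reader(handle, delimiter=";")
```

Calling `next(read_rows(Path("data/archives/annuaire-sante-cnam-2026-01/ps-tarifs.csv.gz")))` would return the first data record. Because there is no header, columns have to be addressed by position or mapped to the names in SCHEMA.md.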

Full column documentation with code tables is in data/archives/annuaire-sante-cnam-2026-01/SCHEMA.md.

File summary

| File | Description | Rows (latest) | Key columns |
|---|---|---|---|
| ps-tarifs.csv | Practitioner tariffs per act | 2,291,052 | nom, prenom, code_postal, profession, code_acte, montant_* |
| ps-infospratiques.csv | Practitioner schedules | 1,510,240 | nom, prenom, code_postal, profession, jour, heure_debut/fin |
| baseremboursement.csv | CCAM reimbursement bases | 400,174 | code_acte_ccam, profession, convention, borne_inf/sup |
| etb-tarifs.csv | Facility hospitalization tariffs | 208,267 | nom, code_postal, specialite, cout_global, remboursement_cnam |
| psdansetablissements.csv | Practitioners in facilities | 22,519 | nom_etablissement, nom_ps, profession_ps |
| etb-casdentaire.csv | Dental access centers | 325 | nom, code_postal, profession |
| etb-prado.csv | PRADO facilities | 2,861 | nom, code_postal, thematique_prado |
| etb-speexternes.csv | Health center specialties | 3,954 | nom, code_postal, profession |

Key code tables

Professions (used across all PS files): integer codes 1-74. Key dental codes:

  • 18 = Chirurgien-dentiste
  • 19 = Chirurgien-dentiste specialiste en orthopedie dento-faciale
  • 20 = Chirurgien-dentiste specialiste en chirurgie orale
  • 21 = Chirurgien-dentiste specialiste en medecine bucco-dentaire

Convention: nc (non conventionne), c1 (secteur 1), c2 (secteur 1 avec depassement), c3 (secteur 2)

Nature d'exercice: 1-8 (1=inactive, 3=liberal integral, etc.)

No unique practitioner ID

The CSVs contain no RPPS number or other stable unique identifier. Practitioners can only be matched on a composite key such as (nom, prenom, adresse, profession), which is fragile across snapshots: an address change or a typo makes the same practitioner look like one departure plus one arrival. This is a fundamental limitation for row-level tracking.
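
One way to make such a composite key slightly less brittle is to normalise case, accents and whitespace before comparing. The sketch below is a heuristic under stated assumptions, not a method used by the project: `practitioner_key` is a hypothetical name, and it uses code_postal in place of the full address, as the example queries in section 7 do.

```python
import unicodedata


def practitioner_key(nom: str, prenom: str, code_postal: str, profession: str) -> tuple[str, str, str, str]:
    """Build a normalised matching key; a heuristic, not a real identifier."""

    def norm(value: str) -> str:
        # Strip accents, collapse whitespace and upper-case so that
        # "Hélène  Martin" and "HELENE MARTIN" compare equal.
        decomposed = unicodedata.normalize("NFKD", value)
        no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
        return " ".join(no_accents.split()).upper()

    return (norm(nom), norm(prenom), code_postal.strip(), profession.strip())
```

Normalisation only absorbs formatting differences. Two homonyms practising the same profession in the same postcode still collide, and a practitioner who moves still produces a new key.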


4. Data stability analysis

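Analysis performed in April 2026 across the 19 complete snapshots. The transition tables below list the 17 transitions between 2024-03 and 2025-09; the series is not strictly consecutive, since October 2024 was never published.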

ps-tarifs.csv — month-to-month diff (sorted line comparison)

| Transition | Rows A | Rows B | Identical | Changed lines | Change rate |
|---|---|---|---|---|---|
| 2024-03 → 04 | 2,602,986 | 2,585,635 | 2,560,830 | 66,961 | 1.2% |
| 2024-04 → 05 | 2,585,635 | 2,574,636 | 2,559,102 | 42,067 | 0.8% |
| 2024-05 → 06 | 2,574,636 | 2,564,073 | 2,542,261 | 54,187 | 1.0% |
| 2024-06 → 07 | 2,564,073 | 2,536,847 | 2,513,171 | 74,578 | 1.4% |
| 2024-07 → 08 | 2,536,847 | 2,525,400 | 2,511,605 | 39,037 | 0.7% |
| 2024-08 → 09 | 2,525,400 | 2,515,011 | 2,490,261 | 59,889 | 1.1% |
| 2024-09 → 11 | 2,515,011 | 2,498,171 | 2,468,642 | 75,898 | 1.5% |
| 2024-11 → 12 | 2,498,171 | 2,487,961 | 2,471,128 | 43,876 | 0.8% |
| 2024-12 → 25-01 | 2,487,961 | 2,474,591 | 2,446,440 | 69,672 | 1.4% |
| 2025-01 → 02 | 2,474,591 | 2,444,991 | 2,421,638 | 76,306 | 1.5% |
| 2025-02 → 03 | 2,444,991 | 2,429,232 | 2,405,017 | 64,189 | 1.3% |
| 2025-03 → 04 | 2,429,232 | 2,414,672 | 2,392,574 | 58,756 | 1.2% |
| 2025-04 → 05 | 2,414,672 | 2,397,590 | 2,373,617 | 65,028 | 1.3% |
| 2025-05 → 06 | 2,397,590 | 2,388,556 | 2,368,870 | 48,406 | 1.0% |
| 2025-06 → 07 | 2,388,556 | 2,381,293 | 2,360,678 | 48,493 | 1.0% |
| 2025-07 → 08 | 2,381,293 | 2,364,683 | 2,333,975 | 78,026 | 1.6% |
| 2025-08 → 09 | 2,364,683 | 2,354,148 | 2,322,641 | 73,549 | 1.5% |

Average monthly change: 1.2% — 98.8% of rows are identical between consecutive months.
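
The published figures are consistent with reading "Changed lines" as the lines present in only one of the two files, and "Change rate" as that count divided by the combined row count. For 2024-03 → 04, (2,602,986 − 2,560,830) + (2,585,635 − 2,560,830) = 66,961, and 66,961 / (2,602,986 + 2,585,635) ≈ 1.29%, which the table shows as 1.2%. The sketch below reproduces that calculation on in-memory lines. It is an illustration of the technique, not the script used for the analysis, and `line_diff` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def line_diff(lines_a: Iterable[str], lines_b: Iterable[str]) -> dict[str, float]:
    """Compare two snapshots as multisets of lines, ignoring row order."""
    count_a, count_b = Counter(lines_a), Counter(lines_b)
    rows_a, rows_b = sum(count_a.values()), sum(count_b.values())
    # Multiset intersection: a line duplicated in both files counts each time.
    identical = sum((count_a & count_b).values())
    # Lines present on one side only (removed from A plus added in B).
    changed = (rows_a - identical) + (rows_b - identical)
    return {
        "rows_a": rows_a,
        "rows_b": rows_b,
        "identical": identical,
        "changed": changed,
        "change_rate": changed / (rows_a + rows_b),
    }
```

Because rows are compared as whole lines, a practitioner whose tariff changes contributes two changed lines: the old line on one side and the new line on the other.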

ps-infospratiques.csv — month-to-month diff

| Transition | Identical | Changed lines | Change rate |
|---|---|---|---|
| 2024-03 → 04 | 1,418,169 | 65,613 | 2.2% |
| 2024-04 → 05 | 1,431,408 | 38,387 | 1.3% |
| 2024-05 → 06 | 1,428,251 | 45,191 | 1.5% |
| 2024-06 → 07 | 1,418,165 | 61,036 | 2.1% |
| 2024-07 → 08 | 1,427,718 | 36,180 | 1.2% |
| 2024-08 → 09 | 1,420,201 | 52,350 | 1.8% |
| 2024-09 → 11 | 1,412,090 | 71,791 | 2.4% |
| 2024-11 → 12 | 1,426,284 | 47,394 | 1.6% |
| 2024-12 → 25-01 | 1,414,883 | 83,290 | 2.8% |
| 2025-01 → 02 | 1,416,865 | 96,868 | 3.3% |
| 2025-02 → 03 | 1,432,698 | 86,293 | 2.9% |
| 2025-03 → 04 | 1,451,063 | 67,199 | 2.2% |
| 2025-04 → 05 | 1,454,247 | 64,887 | 2.1% |
| 2025-05 → 06 | 1,443,061 | 88,730 | 2.9% |
| 2025-06 → 07 | 1,465,983 | 43,863 | 1.4% |
| 2025-07 → 08 | 1,453,813 | 64,620 | 2.1% |
| 2025-08 → 09 | 1,452,627 | 61,484 | 2.0% |

Average monthly change: 2.1% — 97.9% of rows are identical.

Static tables (no or negligible content changes across all 19 snapshots)

These five tables are effectively static. Four have identical content in every snapshot (only row ordering changes); etb-tarifs.csv differs by 2 rows, changed once in June 2024:

| File | Rows | Status |
|---|---|---|
| baseremboursement.csv | 400,174 | Identical across all 19 snapshots |
| etb-tarifs.csv | 208,267 | Identical except for 2 rows changed once in Jun 2024 |
| etb-casdentaire.csv | 325 | Identical |
| etb-prado.csv | 2,861 | Identical |
| etb-speexternes.csv | 3,954 | Identical |

Overall data trend

The ps-tarifs row count declines steadily from 2,602,986 (Mar 2024) to 2,291,052 (Jan 2026) — a 12% decline over 22 months, roughly 14,000 fewer rows per month. This reflects practitioners leaving the conventioned system.
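
The figures in that paragraph can be re-derived from the first and last rows of the snapshot inventory. This is only a worked check of the arithmetic:

```python
first_rows, last_rows = 2_602_986, 2_291_052  # ps-tarifs, Mar 2024 and Jan 2026
months = (2026 - 2024) * 12 + (1 - 3)          # 22 months between the two

lost = first_rows - last_rows
decline_pct = 100 * lost / first_rows
per_month = lost / months

print(f"{lost:,} rows lost, {decline_pct:.1f}% decline, {per_month:,.0f} per month")
# prints: 311,934 rows lost, 12.0% decline, 14,179 per month
```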

The ps-infospratiques row count is more stable (~1.45M → ~1.51M), with slight growth.


5. Storage strategy

Decision: PostgreSQL (latest snapshot) + DuckDB (historical analysis)

| Layer | What | Rows | Use case |
|---|---|---|---|
| PostgreSQL | Latest snapshot (2026-01) | 4.4M | Django ORM, admin, JOINs with other tables |
| DuckDB | All 20 gzipped archives | ~45M on-demand | Cross-snapshot trends, historical analysis |
| Gzipped CSV | Source of truth on disk | | 955 MB total, no import needed for DuckDB |

Why not load all snapshots into PostgreSQL?

  • 20 snapshots × 4.5M = ~90M rows for a read-only archive
  • 98-99% of rows are identical between consecutive months (only ~1-2% change)
  • 4 of 8 tables have zero content changes across all snapshots
  • The dataset is deprecated: no new snapshots are coming and there is no ongoing pipeline
  • DuckDB reads the gzipped archives directly, without any import step

Compression

All CSV files are stored gzipped. Compression ratios:

| File | Raw | Gzipped | Ratio |
|---|---|---|---|
| ps-tarifs.csv | 348 MB | 22 MB | 6.4% |
| ps-infospratiques.csv | 179 MB | 22 MB | 12.3% |
| baseremboursement.csv | 13 MB | 1 MB | 7.2% |
| etb-tarifs.csv | 31 MB | 4 MB | 11.9% |
| Small files | 3 MB | 0.4 MB | ~13% |
| Per snapshot | 548 MB | 47 MB | 8.6% |
| All 20 snapshots | ~11 GB | ~900 MB | 8.6% |
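
Ratios like these can be re-measured without unpacking anything to disk. The sketch below is one way to do it, with `compression_ratio` as a hypothetical helper name. It streams the decompressed bytes and compares their total with the size of the .gz file.

```python
import gzip
from pathlib import Path


def compression_ratio(path: Path, chunk_size: int = 1 << 20) -> float:
    """Return compressed size / raw size for one .gz file."""
    raw_bytes = 0
    # Stream the decompressed bytes in chunks so a ~350 MB file never
    # has to sit in memory at once.
    with gzip.open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            raw_bytes += len(chunk)
    return path.stat().st_size / raw_bytes if raw_bytes else 0.0
```

A returned value of 0.064 corresponds to the 6.4% shown for ps-tarifs.csv. Results can differ slightly from the table depending on whether sizes are counted in MB or MiB.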

6. DuckDB access

Installation

pip install duckdb   # Python
brew install duckdb  # CLI

Column definitions

DuckDB needs explicit column names since the CSVs have no headers. The canonical column definitions are in apps/nomos_cnam/duckdb_helper.py (the COLUMNS dict). Key column names:

ps_tarifs: civilite, nom, prenom, adresse1-4, code_postal, commune, telephone, profession, mode_exercice, nature_exercice, convention, option_cas, sesam_vitale, code_acte, famille_acte, montant_principal, borne_inf_principal, borne_sup_principal, montant_2nd, ...montant_cec (33 cols)

ps_infos: same identity/address columns + type_activite, type_consultation, heure_debut, heure_fin, jour (21 cols)

base_remboursement: code_acte_ccam, activite, profession, convention, option_cas, type_affichage, borne_inf_base_remb, borne_sup_base_remb (8 cols)

etb_tarifs: nom, adresse1-3, code_postal, commune, telephone, type_etablissement, specialite, nb_hospitalisations, nb_hospitalisations_2, indicateur_nb_hosp, nb_moyen_nuitees, indicateur_nuitees, cout_global, remboursement_cnam, reste_charge_hosp, reste_charge_depassements (18 cols, financial columns are text descriptions like "de 1860 a 4390 euros.")

ps_etablissements: nom_etablissement, adresse1-3, code_postal, commune, telephone, type_etablissement, nom_ps, prenom_ps, profession_ps (11 cols)
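
To illustrate how explicit names reach DuckDB, the sketch below builds a `read_csv` call for base_remboursement, the one table whose eight column names are fully listed above. It is not the project's COLUMNS dict or helper code: `read_csv_sql` is a hypothetical function, and `all_varchar=true` is an assumption made here to sidestep type inference.

```python
# Column names copied from the base_remboursement list above (8 columns).
BASE_REMBOURSEMENT_COLUMNS = [
    "code_acte_ccam", "activite", "profession", "convention",
    "option_cas", "type_affichage", "borne_inf_base_remb", "borne_sup_base_remb",
]


def read_csv_sql(path: str, columns: list[str]) -> str:
    """Build a DuckDB SELECT that reads a headerless ';'-separated file."""
    names = ", ".join(f"'{name}'" for name in columns)
    return (
        f"SELECT * FROM read_csv('{path}', delim=';', header=false, "
        f"names=[{names}], all_varchar=true)"
    )


sql = read_csv_sql(
    "data/archives/annuaire-sante-cnam-2026-01/baseremboursement.csv.gz",
    BASE_REMBOURSEMENT_COLUMNS,
)
print(sql)
```

With the duckdb package installed, `duckdb.sql(sql).df()` would run the statement and return a pandas DataFrame. Since the files are latin-1 and DuckDB expects UTF-8 by default, an `encoding` option on `read_csv` may also be needed; whether your DuckDB version supports it should be checked against its documentation.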

Python helper

The module apps/nomos_cnam/duckdb_helper.py provides a ready-to-use query interface.

from apps.nomos_cnam.duckdb_helper import cnam_query, cnam_snapshot, available_snapshots

# List available snapshots
available_snapshots()
# [('2024-03', 8), ('2024-04', 8), ..., ('2026-01', 8)]

# Query latest snapshot (returns pandas DataFrame)
df = cnam_query("SELECT * FROM ps_tarifs WHERE profession = 18 AND code_postal LIKE '75%'")

# Query specific snapshot
df = cnam_query("SELECT * FROM ps_tarifs WHERE profession = 18", snapshot='2024-05')

# Query across all snapshots (adds snapshot_date column automatically)
df = cnam_query(
    "SELECT snapshot_date, count(*) as n FROM ps_tarifs GROUP BY 1 ORDER BY 1",
    snapshot='all'
)

# Interactive exploration (returns DuckDB connection with views registered)
con = cnam_snapshot('2025-09')
con.sql("SELECT profession, count(*) FROM ps_tarifs GROUP BY 1 ORDER BY 2 DESC")

Available views: ps_tarifs, ps_infos, base_remboursement, etb_tarifs, ps_etablissements, etb_cas_dentaire, etb_prado, etb_spe_externes
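
The helper's implementation is not reproduced in this document. As a rough sketch of how a listing with the same shape as `available_snapshots()` could be derived from the directory layout in section 2, the hypothetical `list_snapshots` below scans an archive root and counts the .csv.gz files in each snapshot directory.

```python
import re
from pathlib import Path

SLUG_PATTERN = re.compile(r"^annuaire-sante-cnam-(\d{4}-\d{2})$")


def list_snapshots(archive_root: Path) -> list[tuple[str, int]]:
    """Return (slug, number of .csv.gz files) for each snapshot directory."""
    found = []
    for entry in archive_root.iterdir():
        match = SLUG_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            found.append((match.group(1), len(list(entry.glob("*.csv.gz")))))
    # YYYY-MM slugs sort chronologically as plain strings.
    return sorted(found)
```

Run against `Path("data/archives")`, this would report the partial 2025-11 snapshot with a file count of 1, which is a quick way to spot it before running a cross-snapshot query on a table it does not contain.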


7. Example queries

Dentists in Paris (latest snapshot)

df = cnam_query("""
    SELECT nom, prenom, code_postal, commune, convention,
           code_acte, montant_principal
    FROM ps_tarifs
    WHERE profession = 18
      AND code_postal LIKE '75%'
    ORDER BY nom, prenom
""")

Dentist count by departement over time

df = cnam_query("""
    SELECT snapshot_date,
           SUBSTRING(code_postal, 1, 2) AS dept,
           COUNT(DISTINCT nom || '|' || prenom || '|' || code_postal) AS n_dentists
    FROM ps_tarifs
    WHERE profession = 18
    GROUP BY 1, 2
    ORDER BY 1, 2
""", snapshot='all')

Average tariff for a specific CCAM act

df = cnam_query("""
    SELECT snapshot_date,
           AVG(montant_principal) AS avg_tarif,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY montant_principal) AS median_tarif,
           COUNT(*) AS n_practitioners
    FROM ps_tarifs
    WHERE code_acte = 'HBMD038'
      AND montant_principal > 0
    GROUP BY 1
    ORDER BY 1
""", snapshot='all')

Convention distribution for dentists

df = cnam_query("""
    SELECT convention,
           COUNT(DISTINCT nom || '|' || prenom || '|' || code_postal) AS n_dentists
    FROM ps_tarifs
    WHERE profession = 18
    GROUP BY 1
    ORDER BY 2 DESC
""")

CCAM reimbursement base lookup

df = cnam_query("""
    SELECT code_acte_ccam, profession, convention,
           borne_inf_base_remb, borne_sup_base_remb
    FROM base_remboursement
    WHERE code_acte_ccam = 'HBMD038'
    ORDER BY profession, convention
""")

Facility tariffs for dental specialties

df = cnam_query("""
    SELECT nom, commune, code_postal,
           nb_hospitalisations, cout_global, remboursement_cnam,
           reste_charge_hosp
    FROM etb_tarifs
    WHERE specialite IN (56, 57)  -- dental specialties
    ORDER BY nb_hospitalisations DESC
    LIMIT 20
""")

Cross-snapshot delta report (what changed between two months)

from apps.nomos_cnam.duckdb_helper import cnam_query

# Practitioners that were in Aug 2025 but not Sep 2025 (departures)
departures = cnam_query("""
    SELECT DISTINCT a.nom, a.prenom, a.code_postal, a.profession
    FROM ps_tarifs a
    WHERE a.snapshot_date = '2025-08'
      AND NOT EXISTS (
          SELECT 1 FROM ps_tarifs b
          WHERE b.snapshot_date = '2025-09'
            AND a.nom = b.nom AND a.prenom = b.prenom
            AND a.code_postal = b.code_postal AND a.profession = b.profession
      )
""", snapshot='all')

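The same anti-join can be expressed with plain sets once the key columns are in memory, which also makes the limitation from section 3 visible. The rows below are made-up sample data and `departures_and_arrivals` is a hypothetical helper; nothing here comes from the archive.

```python
Key = tuple[str, str, str, str]  # (nom, prenom, code_postal, profession)


def departures_and_arrivals(before: set[Key], after: set[Key]) -> tuple[set[Key], set[Key]]:
    """Anti-join in both directions between two snapshots' key sets."""
    departures = before - after   # present earlier, missing later
    arrivals = after - before     # missing earlier, present later
    return departures, arrivals


# Made-up rows: MARTIN moves from postcode 69003 to 69007 between snapshots.
aug = {("DUPONT", "RENE", "75011", "18"), ("MARTIN", "ZOE", "69003", "18")}
sep = {("MARTIN", "ZOE", "69007", "18"), ("DUPONT", "RENE", "75011", "18")}
gone, new = departures_and_arrivals(aug, sep)
print(gone)  # the MARTIN row with the old postcode
print(new)   # the same person, counted again at the new postcode
```

Without a stable identifier, a move within the conventioned system is indistinguishable from one practitioner leaving and another joining, so departure counts built this way are upper bounds.
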
DuckDB CLI one-liner

# Row count per snapshot file (path follows the archive layout in section 2)
duckdb -c "
  SELECT filename, COUNT(*) AS n_rows
  FROM read_csv('data/archives/annuaire-sante-cnam-*/ps-tarifs.csv.gz',
                delim=';', header=false, filename=true)
  GROUP BY 1 ORDER BY 1
"

8. Relationship to the Django app

The apps/nomos_cnam/ Django app was originally designed to import all snapshots into PostgreSQL tables using COPY. Given the analysis above, this approach has been superseded by DuckDB for archive queries.

The Django app components:

  • duckdb_helper.py: DuckDB query interface for cross-snapshot analysis (cnam_query, cnam_snapshot)
  • models/: Django ORM models for all 8 tables + CnamImportLog
  • management/commands/cnam_import.py: PostgreSQL importer (decompresses .csv.gz on the fly, streams via COPY)
  • admin.py: Admin registration for CnamImportLog

Two query paths

PostgreSQL (Django ORM) — latest snapshot loaded, for use in Django views, ORM joins, admin:

from apps.nomos_cnam.models import PsTarif

PsTarif.objects.filter(profession=18, code_postal__startswith='75').count()

DuckDB (cross-snapshot analysis) — reads all 20 gzipped archives directly, no import needed:

from apps.nomos_cnam.duckdb_helper import cnam_query

df = cnam_query("SELECT snapshot_date, count(*) FROM ps_tarifs GROUP BY 1", snapshot='all')

Importing into PostgreSQL

The latest snapshot (2026-01) is loaded into PostgreSQL (4.4M rows, ~3 min). To reload or import a different snapshot:

# Import latest (decompresses .csv.gz on-the-fly)
make shell
python manage.py cnam_import --snapshot 2026-01

# Dry-run to see row counts without writing
python manage.py cnam_import --snapshot 2026-01 --dry-run

# Import specific tables only
python manage.py cnam_import --snapshot 2026-01 --tables ps-tarifs.csv baseremboursement.csv

The command deletes existing rows for the given snapshot_date before inserting, so re-running it for the same snapshot is idempotent. Importing a different snapshot does not remove the previous one: rows for the old snapshot_date remain until you truncate them manually.
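
The delete-then-insert pattern can be tried out in isolation. The sketch below uses an in-memory SQLite database and a hypothetical three-column `ps_tarif` table as stand-ins; the actual command targets PostgreSQL and streams via COPY, so this only illustrates the behaviour, not the implementation.

```python
import sqlite3


def import_snapshot(con: sqlite3.Connection, snapshot_date: str, rows: list[tuple[str, str]]) -> None:
    """Replace all rows of one snapshot in a single transaction."""
    with con:  # commits on success, rolls back if anything raises
        con.execute("DELETE FROM ps_tarif WHERE snapshot_date = ?", (snapshot_date,))
        con.executemany(
            "INSERT INTO ps_tarif (snapshot_date, nom, code_acte) VALUES (?, ?, ?)",
            [(snapshot_date, nom, code_acte) for nom, code_acte in rows],
        )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ps_tarif (snapshot_date TEXT, nom TEXT, code_acte TEXT)")

rows = [("DUPONT", "HBMD038"), ("MARTIN", "HBMD038")]  # made-up sample rows
import_snapshot(con, "2026-01", rows)
import_snapshot(con, "2026-01", rows)   # re-run: still 2 rows, not 4
import_snapshot(con, "2025-09", rows)   # other snapshot: 2026-01 rows stay

counts = con.execute(
    "SELECT snapshot_date, COUNT(*) FROM ps_tarif GROUP BY 1 ORDER BY 1"
).fetchall()
print(counts)  # [('2025-09', 2), ('2026-01', 2)]
```

Running the delete and the insert in one transaction means a failed import leaves the previous rows for that snapshot_date in place rather than an empty or half-loaded table.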


9. Files reference

| Path | Purpose |
|---|---|
| data/archives/MANIFEST.md | Master inventory, CDN URLs, download status |
| data/archives/annuaire-sante-cnam-2026-01/SCHEMA.md | Complete column schema with code tables |
| data/archives/annuaire-sante-cnam-2026-01/README.md | Dataset provenance and file layout |
| docs/NOMOS_CNAM_ARCHIVE.md | This document |
| apps/nomos_cnam/ | Django app (models, legacy importer) |
| scripts/cnam_download_missing.sh | Download script for CDN snapshots |