
AbstractImporter usage

Status: Placeholder, to be developed. Last reviewed:
Reference structural sibling: guidelines/ui/forms.md (component-style sectioning and length).

Scope (when this guideline lands)

Conventions for adding a new importer: subclassing AbstractImporter, declaring the ImportRow schema, bulk-mode rules, error handling, idempotency, file-format expectations, and how the importer is wired into apps/collection/ (remote agent uploads) and into the admin.

Out of scope (cross-refs)

  • Logosw-specific format quirks (encoding, category mapping, CIVIL.txt vs ACTES_2.txt structure) → guidelines/integrations/logosw.md (placeholder).
  • Collection / remote agent upload pipeline (the file-arrival side: heartbeat, batching, retry, validation) → not yet a guideline; the agent backlog has many active items.
  • Form construction for upload UI → guidelines/ui/forms.md and guidelines/backend/forms.md (placeholder).
  • Celery task shape for the import task → guidelines/celery/task-conventions.md (placeholder).

Sources to mine when writing this

  • apps/imports/: AbstractImporter definition (importers/__init__.py or importers/base.py).
  • Existing importers: CIVIL, ACTES_2, GL trial balance (most recent: apps/imports/importers/gl_trial_balance.py).
  • apps/imports/models.py: Import and ImportRow models.
  • docs/IMPORTS_MODULE.md — existing module documentation (in French); extract the rules, leave the reference material in docs/.
  • roadmap/done/imports-bulk-performance.md — bulk-mode lessons (OOM avoidance, debug_mode flag).
  • roadmap/done/imports-encoding-fixes.md — encoding handling.
  • roadmap/done/imports-gl-payment-redesign.md — recent 5-phase migration; reference for evolving an importer.
  • roadmap/done/imports-logging-optimization.md — logging discipline.

Starter hard rules to investigate

  1. Subclass AbstractImporter, declare ImportRow schema in __init__, implement _import_row(row) — never bypass the base class.
  2. Idempotent re-run: re-importing the same file produces the same DB state. Use natural keys + update_or_create.
  3. Every row gets an ImportRow with status 'imported', 'skipped' or 'failed'; no row is ever dropped silently.
  4. Bulk mode for >1000 rows: bulk_create with ignore_conflicts=True, batch size from settings.
  5. Errors logged WITHOUT PII (per guidelines/security/pii-and-logging.md): use the row's external ID, not the patient's name.
  6. debug_mode=True disables bulk and adds verbose logging for diagnosing single-row failures.
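The six starter rules can be tried out together in plain Python before they are checked against the real base class. This is a minimal, framework-free sketch under stated assumptions: ImportRow is a dataclass stand-in for the model, an in-memory dict keyed by natural key stands in for the database table, and natural_key, DemoImporter, BULK_THRESHOLD and the batch_size default are hypothetical names, not the actual apps/imports API. The dict.update and dict.setdefault calls only emulate update_or_create and bulk_create(ignore_conflicts=True).

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from itertools import islice
from typing import Iterator

logger = logging.getLogger("imports")  # hypothetical logger name

BULK_THRESHOLD = 1000  # rule 4's ">1000 rows" cut-over (hypothetical constant)


@dataclass
class ImportRow:
    """In-memory stand-in for the ImportRow model: exactly one per source row (rule 3)."""
    line_no: int
    external_id: str
    status: str  # 'imported', 'skipped' or 'failed'
    error: str = ""


def chunked(rows: list, size: int) -> Iterator[list]:
    """Yield successive batches of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


class AbstractImporter:
    """Django-free sketch of the base-class contract; not the real apps/imports class."""

    def __init__(self, batch_size: int = 500, debug_mode: bool = False):
        self.batch_size = batch_size  # the real value would come from settings (rule 4)
        self.debug_mode = debug_mode  # rule 6: no bulk, verbose per-row logging
        self.table: dict[str, dict] = {}  # stands in for a DB table keyed by natural key
        self.import_rows: list[ImportRow] = []

    def natural_key(self, row: dict) -> str:
        """Stable external ID for the row (hypothetical hook); used for upserts and logs."""
        raise NotImplementedError

    def _import_row(self, row: dict) -> dict | None:
        """Return field values, None to skip the row, or raise to fail it."""
        raise NotImplementedError

    def run(self, rows: list[dict]) -> list[ImportRow]:
        bulk = len(rows) > BULK_THRESHOLD and not self.debug_mode
        line_no = 0
        for batch in chunked(rows, self.batch_size if bulk else 1):
            pending: dict[str, dict] = {}
            for row in batch:
                line_no += 1
                key = "?"  # placeholder if the row has no usable external ID
                try:
                    key = self.natural_key(row)
                    values = self._import_row(row)
                except Exception as exc:
                    # Log the external ID and error type only, never names or other PII (rule 5).
                    logger.warning("row %d (id=%s) failed: %s", line_no, key, type(exc).__name__)
                    self.import_rows.append(ImportRow(line_no, key, "failed", type(exc).__name__))
                    continue
                if values is None:
                    self.import_rows.append(ImportRow(line_no, key, "skipped"))
                    continue
                if self.debug_mode:
                    # Field names only, so no row content reaches the log.
                    logger.debug("row %d (id=%s) fields=%s", line_no, key, sorted(values))
                pending[key] = values
                self.import_rows.append(ImportRow(line_no, key, "imported"))
            if bulk:
                # Emulates bulk_create(ignore_conflicts=True): existing keys are left untouched.
                for key, values in pending.items():
                    self.table.setdefault(key, values)
            else:
                # Emulates update_or_create on the natural key (rule 2).
                self.table.update(pending)
        return self.import_rows


class DemoImporter(AbstractImporter):
    """Hypothetical importer for rows like {'account': '401', 'amount': '12.50'}."""

    def natural_key(self, row: dict) -> str:
        return row["account"]

    def _import_row(self, row: dict) -> dict | None:
        if not row["amount"]:
            return None  # recorded as 'skipped', not silently dropped
        return {"account": row["account"], "amount": float(row["amount"])}
```

The property worth carrying into the real Django code is that an ImportRow is appended on every path (imported, skipped, failed), so no row can vanish, and running the same rows twice leaves the table unchanged.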

Decision points to settle

  1. Error policy: skip-and-log vs fail-fast on first error vs configurable per-importer?
  2. Batch size: per-importer override vs global setting?
  3. Progress reporting: how to expose import progress to the UI (HTMX poll? AJAX progress per the imports-ajax-progress Idea?)
  4. Pre-validation step: should importers validate file structure BEFORE processing rows (fail-fast on bad header), or row-by-row?
  5. Re-import without re-uploading: management command to replay an existing Import record's file? Useful for fixing importer bugs against historical data.
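Decision points 1 and 4 interact: a header check runs before any row, while the error policy only governs row-level failures. The sketch below shows one option to evaluate, not a settled choice. It assumes a delimited text file with a header row (the semicolon default is an assumption, to be checked per format), and ErrorPolicy, HeaderError, validate_header and process are hypothetical names.

```python
from __future__ import annotations

import csv
import io
from enum import Enum
from typing import Callable


class ErrorPolicy(Enum):
    SKIP_AND_LOG = "skip_and_log"  # record the row as 'failed' and continue
    FAIL_FAST = "fail_fast"        # abort the whole import on the first bad row


class HeaderError(ValueError):
    """Raised before any row is processed when the file structure is wrong."""


def validate_header(header: list[str], required: set[str]) -> None:
    """Fail fast on a bad header instead of discovering it row by row."""
    missing = required - set(header)
    if missing:
        raise HeaderError(f"missing columns: {sorted(missing)}")


def process(
    text: str,
    required: set[str],
    parse_row: Callable[[dict], object],
    policy: ErrorPolicy,
    delimiter: str = ";",  # assumption: check each format's real separator
) -> list[tuple[int, str]]:
    """Return (line_no, status) pairs after validating the header first."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [col.strip() for col in next(reader, [])]
    validate_header(header, required)
    results: list[tuple[int, str]] = []
    for values in reader:
        line_no = reader.line_num  # source line on which this record ends
        try:
            parse_row(dict(zip(header, values)))
        except Exception:
            if policy is ErrorPolicy.FAIL_FAST:
                raise
            results.append((line_no, "failed"))
            continue
        results.append((line_no, "imported"))
    return results
```

Passing the policy as an argument keeps decision point 1 open: a global default could live in one place, with individual importers overriding it where fail-fast is safer.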

Known deviations to look for during writing

  • Importers that swallow errors with except Exception: pass and don't record an ImportRow.
  • Importers that use for row in file: model.save() without bulk mode (slow on large files).
  • Importer logs containing patient names or other PII.
  • New importers that don't subclass AbstractImporter (custom base — investigate why).
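When auditing existing importers for the first two deviations, a small static check can triage candidates before reading code by hand. This is a heuristic sketch built on Python's standard ast module; find_drift is a hypothetical helper, not an existing tool in the repo. It flags bare, Exception or BaseException handlers whose body is only pass, and .save() calls inside for loops. It misses variants such as except (Exception, ValueError), may flag legitimate single-row saves on a debug path, and does not cover the PII or custom-base-class checks.

```python
from __future__ import annotations

import ast
import sys

BROAD = {"Exception", "BaseException"}


def _is_broad(handler: ast.ExceptHandler) -> bool:
    """True for bare `except:` or `except Exception` / `except BaseException`."""
    t = handler.type
    return t is None or (isinstance(t, ast.Name) and t.id in BROAD)


def find_drift(source: str, filename: str = "<importer>") -> list[str]:
    """Flag swallowed broad exceptions and .save() calls inside for loops."""
    hits: set[tuple[int, str]] = set()
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.ExceptHandler) and _is_broad(node):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                hits.add((node.lineno, "broad except whose body is only 'pass'"))
        if isinstance(node, (ast.For, ast.AsyncFor)):
            # Walking the loop subtree also reaches nested loops; the set dedupes.
            for inner in ast.walk(node):
                if (
                    isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Attribute)
                    and inner.func.attr == "save"
                ):
                    hits.add((inner.lineno, ".save() inside a loop"))
    return [f"{filename}:{line}: {msg}" for line, msg in sorted(hits)]


if __name__ == "__main__":
    # Usage: python find_drift.py path/to/importer.py [...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for finding in find_drift(fh.read(), path):
                print(finding)
```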

If found, file as roadmap/backlog/imports-importer-drift-2026-MM.md.