
Server Separation Plan — Prod vs Dev/Staging

Date: 2026-04-10
Status: Draft
Decision: Separate production from dev/staging onto two servers


1. Why Separate

Current state: 1 server, 30 containers

Resource     Value                             Concern
CPU          6 vCPUs                           Shared across all envs
RAM          12 GB (6.5 GB used, 1.3 GB swap)  55% used at rest; spikes during celery-heavy tasks
Disk         100 GB (73% used)                 Media + backups growing
Containers   30                                All 3 envs + shared services + monitoring

Risks of the current single-server setup

  1. Blast radius — a runaway dev/staging celery task can trigger OOM kills that affect prod. OOM scores mitigate this but don't eliminate it: it's still one kernel making triage decisions under pressure.
  2. Shared database — a bad migration on staging can lock tables prod is reading. A DROP COLUMN on dev with a long lock wait blocks prod queries on that table.
  3. Resource contention — staging celery-heavy currently uses 754 MB. During RPPS imports, it can hit its 3 GB limit. That's memory prod can't use.
  4. Deployment risk — a misconfigured nginx reload, a wrong docker compose project name, or a network conflict during dev deployment can take prod down.
  5. Maintenance windows — upgrading Postgres, Redis, or Docker itself requires downtime for all environments simultaneously.

What triggers the move

Any one of these justifies it:

  • A staging incident has affected (or nearly affected) prod
  • RAM usage regularly exceeds 80% (currently at ~55%, but celery-heavy spikes can push it higher)
  • You want to do risky operations (Postgres major upgrade, kernel update) without prod risk
  • Disk is approaching capacity (currently 73%)


2. Target Architecture

VPS #1 — Production (existing server: 54.36.99.184)

VPS #1 — PROD (54.36.99.184)
├── nginx (ports 80/443)
│   ├── aletheia.groupe-suffren.com          → aletheia-prod-web:8000
│   ├── cabinet-dentaire-aubagne.fr          → helios-prod-web:3000
│   ├── le-canet.chirurgiens-dentistes.fr    → helios-prod-web:3000
│   ├── (other practice domains)             → helios-prod-web:3000
│   ├── cda.groupe-suffren.com               → helios-prod-web:3000
│   ├── monitoring.groupe-suffren.com        → grafana:3000
│   └── analytics.groupe-suffren.com         → umami:3000
├── Aletheia prod
│   ├── aletheia-prod-web (Gunicorn)
│   ├── aletheia-prod-celery
│   ├── aletheia-prod-celery-heavy
│   └── aletheia-prod-beat
├── Helios prod
│   └── helios-prod-web (Next.js)
├── Shared services (prod only)
│   ├── PostgreSQL 18 (aletheia_prod, umami)
│   └── Redis 7 (DB 0-1 only)
├── Monitoring (Prometheus, Grafana, Loki, Alloy, exporters)
├── Umami analytics
└── Backups (prod DB + media only)

Containers: ~17 (down from 30)

VPS #2 — Dev/Staging (new server)

VPS #2 — DEV/STAGING (new OVH VPS)
├── nginx (ports 80/443)
│   ├── aletheia-staging.groupe-suffren.com  → aletheia-staging-web:8000
│   ├── aletheia-dev.groupe-suffren.com      → aletheia-dev-web:8000
│   ├── cda-staging.groupe-suffren.com       → helios-staging-web:3000
│   ├── cda-dev.groupe-suffren.com           → helios-dev-web:3000
│   ├── vsm-staging.groupe-suffren.com       → helios-staging-web:3000
│   ├── vsm-dev.groupe-suffren.com           → helios-dev-web:3000
│   └── (other practice dev/staging domains)
├── Aletheia staging
│   ├── aletheia-staging-web
│   ├── aletheia-staging-celery
│   ├── aletheia-staging-celery-heavy
│   └── aletheia-staging-beat
├── Aletheia dev
│   ├── aletheia-dev-web
│   ├── aletheia-dev-celery
│   ├── aletheia-dev-celery-heavy
│   └── aletheia-dev-beat
├── Helios staging + dev
│   ├── helios-staging-web
│   └── helios-dev-web
├── Shared services (dev/staging only)
│   ├── PostgreSQL 18 (aletheia_staging, aletheia_dev)
│   └── Redis 7 (DB 2-5)
└── Lightweight monitoring (node_exporter only, scraped by prod Prometheus)

Containers: ~17


3. What Changes

3.1 DNS

Move dev/staging DNS records to point to VPS #2:

Record                                     Current       After
aletheia-staging.groupe-suffren.com        54.36.99.184  VPS #2 IP
aletheia-dev.groupe-suffren.com            54.36.99.184  VPS #2 IP
cda-staging.groupe-suffren.com             54.36.99.184  VPS #2 IP
cda-dev.groupe-suffren.com                 54.36.99.184  VPS #2 IP
vsm-staging.groupe-suffren.com             54.36.99.184  VPS #2 IP
vsm-dev.groupe-suffren.com                 54.36.99.184  VPS #2 IP
(all other *-staging / *-dev subdomains)   54.36.99.184  VPS #2 IP

Prod DNS records stay unchanged.

3.2 Database Separation

Each server gets its own PostgreSQL instance. This is the key benefit — no cross-environment table locking.

VPS #1 Postgres (prod):

  • aletheia_prod database + aletheia_prod user
  • umami database + umami user

VPS #2 Postgres (dev/staging):

  • aletheia_staging database + aletheia_staging user
  • aletheia_dev database + aletheia_dev user

The init SQL script splits into two files or is parameterized per server.
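The parameterized variant might be sketched as a small generator that emits each server's init SQL (the `init_sql` helper and its interface are hypothetical, not the real init script):

```shell
# Sketch: emit the Postgres init SQL for one server.
# init_sql is a hypothetical helper; the real init script may differ.
init_sql() {
  case "$1" in
    prod)   dbs="aletheia_prod umami" ;;
    devstg) dbs="aletheia_staging aletheia_dev" ;;
    *)      echo "unknown server: $1" >&2; return 1 ;;
  esac
  for db in $dbs; do
    # One role per database, same name, matching the layout above
    echo "CREATE USER ${db};"
    echo "CREATE DATABASE ${db} OWNER ${db};"
  done
}

init_sql devstg
```

Piping `init_sql prod` into psql on VPS #1 and `init_sql devstg` on VPS #2 keeps a single source of truth for both servers.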

3.3 Redis Separation

VPS #1 Redis: DB 0-1 (prod broker + cache)
VPS #2 Redis: DB 2-5 (staging + dev broker + cache), or renumber to 0-3 on VPS #2
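Keeping the 2-5 numbering, the env-to-DB mapping can be made explicit (which index is broker vs cache within each range is an assumption):

```shell
# Sketch: environment → Redis DB index after the split.
# The broker/cache assignment within each range is assumed, not confirmed.
redis_db() {
  case "$1" in
    prod-broker)    echo 0 ;;  # VPS #1
    prod-cache)     echo 1 ;;  # VPS #1
    staging-broker) echo 2 ;;  # VPS #2
    staging-cache)  echo 3 ;;  # VPS #2
    dev-broker)     echo 4 ;;  # VPS #2
    dev-cache)      echo 5 ;;  # VPS #2
    *) return 1 ;;
  esac
}

echo "redis://redis:6379/$(redis_db staging-broker)"   # → redis://redis:6379/2
```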

3.4 Nginx

Each server runs its own nginx instance with only its environment's vhosts.

VPS #1 nginx configs:

  • aletheia-prod.conf.full / .temp
  • helios-prod.conf.full / .temp
  • monitoring.conf.full / .temp
  • analytics.conf.full / .temp

VPS #2 nginx configs:

  • aletheia-staging.conf.full / .temp
  • aletheia-dev.conf.full / .temp
  • helios-staging.conf.full / .temp
  • helios-dev.conf.full / .temp

Remove the .htpasswd basic auth requirement from VPS #1 (prod doesn't use it). VPS #2 keeps it for staging/dev.
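On VPS #2 the dev/staging vhosts would keep the standard nginx basic-auth directives, roughly like this (the location block and upstream are shown for illustration only):

```nginx
# VPS #2 vhosts only — VPS #1 drops these directives entirely
location / {
    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass           http://aletheia-staging-web:8000;
}
```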

3.5 SSL Certificates

Each server obtains its own certificates via certbot for its domains. Prod certs stay on VPS #1. Staging/dev certs on VPS #2.

If using Cloudflare for prod domains, that's unaffected. Dev/staging subdomains under groupe-suffren.com can use Let's Encrypt on VPS #2.

3.6 Monitoring

Option A — Single Prometheus on VPS #1 (recommended to start):

  • VPS #1 Prometheus scrapes local exporters as before
  • VPS #1 Prometheus also scrapes VPS #2's node_exporter over the network (port 9100, firewalled to VPS #1's IP only)
  • Grafana dashboards cover both servers
  • Loki stays on VPS #1; Alloy runs on both servers, pushing logs to VPS #1's Loki
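In practice Option A amounts to one extra scrape job in the VPS #1 Prometheus config (the job name and the VPS #2 IP placeholder are illustrative):

```yaml
# prometheus.yml on VPS #1 — add a job for VPS #2's node_exporter
scrape_configs:
  - job_name: "vps2-node"
    static_configs:
      - targets: ["<VPS2_IP>:9100"]   # reachable only from VPS #1 (firewall rule)
        labels:
          server: "devstg"
```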

Option B — Full monitoring on both (later if needed):

  • Duplicate the monitoring stack on VPS #2
  • Use Grafana on VPS #1 with Prometheus/Loki on VPS #2 as remote data sources

3.7 Backups

VPS #1: Backs up aletheia_prod + umami + prod media. Same cron, same retention.

VPS #2: Backs up aletheia_staging only (dev is disposable). Lighter retention (3-day daily, no weekly). Or no backups at all — staging data can be re-seeded from a prod dump.
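The 3-day retention could be a one-line prune in the existing backup cron; a minimal sketch (the directory path and *.dump pattern are assumptions about the backup layout):

```shell
# Sketch: keep only 3 days of staging dumps on VPS #2.
# BACKUP_DIR and the *.dump pattern are assumed, not the real layout.
BACKUP_DIR="${BACKUP_DIR:-/opt/docker/backups/staging}"

prune_backups() {
  # Delete dump files last modified more than 3 days ago
  find "$1" -name '*.dump' -type f -mtime +3 -delete
}

if [ -d "$BACKUP_DIR" ]; then
  prune_backups "$BACKUP_DIR"
fi
```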

3.8 Aether Repo Structure

The Makefile and config files need to become server-aware. Two approaches:

Approach A — Server variable in Makefile (simpler):

make deploy SERVER=prod    # deploys prod configs to VPS #1
make deploy SERVER=devstg  # deploys dev/staging configs to VPS #2

The INFRA_FILES list splits into PROD_FILES and DEVSTG_FILES. Diff/deploy/pull targets accept a SERVER parameter.
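In GNU make terms, Approach A could look roughly like this (the file lists and the SERVER_HOST variable are guesses at the existing layout, not the real Makefile):

```makefile
# Sketch only — file lists and SERVER_HOST are illustrative
PROD_FILES   := nginx/conf.d/aletheia-prod.conf shared/ monitoring/
DEVSTG_FILES := nginx/conf.d/aletheia-staging.conf nginx/conf.d/aletheia-dev.conf shared/

ifeq ($(SERVER),prod)
  INFRA_FILES := $(PROD_FILES)
else ifeq ($(SERVER),devstg)
  INFRA_FILES := $(DEVSTG_FILES)
else
  $(error SERVER must be 'prod' or 'devstg')
endif

deploy:
	rsync -avz $(INFRA_FILES) $(SERVER_HOST):/opt/docker/
```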

Approach B — Two branches or directories (cleaner long-term):

aether/repo/
├── prod/                   # configs for VPS #1
│   ├── nginx/conf.d/       # only prod vhosts
│   ├── shared/             # prod postgres/redis compose
│   └── monitoring/         # full monitoring stack
├── devstg/                 # configs for VPS #2
│   ├── nginx/conf.d/       # only dev/staging vhosts
│   ├── shared/             # dev/staging postgres/redis compose
│   └── monitoring/         # lightweight (node_exporter + alloy)
├── common/                 # shared configs (nginx.conf base, security/)
├── Makefile
└── setup.sh                # accepts --server=prod|devstg

3.9 Secret Management

SOPS encryption stays the same — same age key on both servers. The SECRET_FILES list splits:

VPS #1 secrets: shared/.env (prod), envs/aletheia/.env.prod, envs/helios/.env.prod, monitoring/.env, umami/.env, nginx/.htpasswd

VPS #2 secrets: shared/.env (devstg), envs/aletheia/.env.staging, envs/aletheia/.env.dev, envs/helios/.env.staging, envs/helios/.env.dev, nginx/.htpasswd

3.10 Deploy Scripts

The deploy.sh scripts in the aletheia and helios repos need a server mapping:

  • deploy.sh prod → runs on VPS #1
  • deploy.sh staging / deploy.sh dev → runs on VPS #2

Since deploy scripts run locally on the server (SSH in, then run), this is mostly about knowing which server to SSH into. No code change needed in the scripts themselves — they already parameterize by environment.
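The "which server to SSH into" part can live in a tiny wrapper on the workstation; a sketch, where vps1/vps2 are placeholder SSH aliases and the repo path follows the aletheia example used elsewhere in this plan:

```shell
# Sketch: choose the target host from the environment name.
# vps1/vps2 are placeholder SSH aliases, not real hostnames.
server_for_env() {
  case "$1" in
    prod)        echo "vps1" ;;
    staging|dev) echo "vps2" ;;
    *)           echo "unknown env: $1" >&2; return 1 ;;
  esac
}

# deploy <env> — SSH to the right server, then run the unchanged deploy.sh
deploy() {
  host=$(server_for_env "$1") || return 1
  ssh "$host" "cd /opt/docker/aletheia/repo && ./deploy.sh $1"
}
```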

3.11 Helios → Aletheia API

Currently Helios calls Aletheia over the Docker backend network: http://aletheia-{env}-web:8000.

After separation, nothing changes — each Helios environment is on the same server as its matching Aletheia environment. The Docker network names and service names stay the same.

This is a major advantage of prod/devstg split over the Helios-on-its-own-server option: no cross-server API calls, no latency penalty, no vRack needed.

3.12 ISR Webhook (Aletheia → Helios)

Same — stays on the Docker web network within each server. http://helios-{env}-web:3000/api/revalidate works unchanged.


4. VPS #2 Sizing

Dev/staging is lower traffic than prod. A smaller VPS suffices:

Resource   Recommendation   Rationale
CPU        4 vCPUs          Enough for 2 envs of Aletheia + Helios
RAM        8 GB             Dev + staging Aletheia + celery-heavy (3 GB) + headroom
Disk       40-60 GB         No prod media, no long-term backups
Location   OVH France       Same region as VPS #1, GDPR compliant

Estimated cost: ~10-15 EUR/month for an OVH VPS at this spec.


5. Migration Procedure

Phase 1 — Provision VPS #2

  1. Order OVH VPS (Debian 12+, France region)
  2. Run the setup script (adapted for dev/staging):
    sudo ./setup.sh --server=devstg
    
    This sets up: Docker, security hardening, directory structure, Docker networks, Postgres (with staging + dev databases), Redis, nginx (with dev/staging vhosts), node_exporter
  3. Place the age key at /opt/docker/.age-key.txt
  4. Decrypt dev/staging secrets: make decrypt SERVER=devstg

Phase 2 — Seed Data

  1. Dump staging database from VPS #1:
    # On VPS #1
    docker exec shared_postgres pg_dump -U aletheia_staging -Fc aletheia_staging > staging.dump
    
  2. Transfer and restore on VPS #2:
    scp -P 57361 staging.dump debian@vps2:/tmp/
    # On VPS #2
    docker cp /tmp/staging.dump shared_postgres:/tmp/
    docker exec shared_postgres pg_restore -U aletheia_staging -d aletheia_staging -Fc --no-owner /tmp/staging.dump
    
  3. Repeat for dev database (or start fresh)
  4. Copy staging media if needed:
    rsync -avz -e 'ssh -p 57361' /opt/docker/aletheia/media/staging/ debian@vps2:/opt/docker/aletheia/media/staging/
    

Phase 3 — Deploy Apps on VPS #2

  1. Clone aletheia and helios repos to VPS #2
  2. Deploy staging and dev:
    cd /opt/docker/aletheia/repo && ./deploy.sh staging
    cd /opt/docker/aletheia/repo && ./deploy.sh dev
    cd /opt/docker/helios/repo && ./deploy.sh staging
    cd /opt/docker/helios/repo && ./deploy.sh dev
    
  3. Verify apps are running and accessible via VPS #2 IP

Phase 4 — DNS Cutover

  1. Update staging/dev DNS records to VPS #2 IP
  2. Wait for propagation (~5 min with low TTL, set TTL low beforehand)
  3. Obtain SSL certificates on VPS #2:
    docker exec certbot certbot certonly --webroot -w /var/www/certbot \
      -d aletheia-staging.groupe-suffren.com \
      -d aletheia-dev.groupe-suffren.com \
      -d cda-staging.groupe-suffren.com \
      -d cda-dev.groupe-suffren.com
    
  4. Activate full nginx configs (HTTPS) on VPS #2

Phase 5 — Clean Up VPS #1

  1. Stop dev/staging containers on VPS #1:
    ENV_FILE=/opt/docker/aletheia/envs/.env.staging ENVIRONMENT=staging \
      sudo -E docker compose -p aletheia-staging down
    ENV_FILE=/opt/docker/aletheia/envs/.env.dev ENVIRONMENT=dev \
      sudo -E docker compose -p aletheia-dev down
    # Same for helios-staging and helios-dev
    
  2. Remove dev/staging nginx vhosts from VPS #1
  3. Remove dev/staging databases from VPS #1 Postgres:
    DROP DATABASE aletheia_staging;
    DROP DATABASE aletheia_dev;
    DROP USER aletheia_staging;
    DROP USER aletheia_dev;
    
  4. Remove dev/staging env files and media from VPS #1
  5. Reload nginx on VPS #1
  6. Update VPS #1 monitoring to remove dev/staging targets

Phase 6 — Adjust Resource Limits on VPS #1

With only prod running, prod containers can have more generous limits:

Container                    Before   After
aletheia-prod-web            2 GB     3 GB
aletheia-prod-celery         2 GB     2 GB (unchanged)
aletheia-prod-celery-heavy   3 GB     4 GB
helios-prod-web              1 GB     1.5 GB

OOM scores become less critical since there's no dev/staging workload to compete with, but keep them for safety.
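In compose terms the change is just the memory limits (service keys are abbreviated; the real compose files may be structured differently):

```yaml
# VPS #1 compose sketch after the split — values from the table above
services:
  aletheia-prod-web:
    mem_limit: 3g      # was 2g
  aletheia-prod-celery-heavy:
    mem_limit: 4g      # was 3g
  helios-prod-web:
    mem_limit: 1536m   # 1.5 GB, was 1g
```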


6. Rollback Plan

If something goes wrong during migration:

  1. DNS records can be flipped back to VPS #1 (dev/staging containers are still running until Phase 5)
  2. VPS #1 retains all data until Phase 5 cleanup
  3. Phase 5 is the point of no return — don't execute it until VPS #2 has been stable for at least a week


7. What This Does NOT Change

  • Local development — still the same: both apps on your Mac, localhost
  • Git workflow — same branches, same repos, same CI
  • Deploy scripts — same deploy.sh per app, just run on the right server
  • Helios ↔ Aletheia communication — stays on Docker network, no cross-server calls
  • SOPS/age — same key, same workflow
  • Prod domains — no DNS changes for production

8. Future Considerations

If Helios needs its own server later

This split makes a future Helios separation easier: you'd move helios-prod-web from VPS #1 to a VPS #3, and connect it back to Aletheia via vRack. But with 9 dental practice sites, this is unlikely to be needed soon.

If you want managed Postgres

Moving prod Postgres off VPS #1 to a managed service (e.g., OVH Cloud Databases) gives you automatic backups, point-in-time recovery, and failover. But it adds network latency to every query and costs significantly more. Only consider this if database reliability becomes a concern.

Monitoring consolidation

Once VPS #2 is stable, consider whether you want centralized monitoring (Prometheus on VPS #1 scraping both) or independent stacks. Centralized is simpler but means a VPS #1 outage blinds you to VPS #2 issues too.