
Server Separation Plan — Prod vs Dev/Staging

Date: 2026-04-10
Status: Draft
Decision: Separate production from dev/staging onto two servers


1. Why Separate

Current state: 1 server, 30 containers

Resource     Value                             Concern
CPU          6 vCPUs                           Shared across all envs
RAM          12 GB (6.5 GB used, 1.3 GB swap)  55% used at rest; spikes during celery-heavy tasks
Disk         100 GB (73% used)                 Media + backups growing
Containers   30                                All 3 envs + shared services + monitoring

Risks of the current single-server setup

  1. Blast radius — a runaway dev/staging celery task can trigger OOM kills that affect prod. OOM scores mitigate this but don't eliminate it: it's still one kernel making triage decisions under pressure.
  2. Shared database — a bad migration on staging can lock tables prod is reading. A DROP COLUMN on dev with a long lock wait blocks prod queries on that table.
  3. Resource contention — staging celery-heavy currently uses 754 MB. During RPPS imports, it can hit its 3 GB limit. That's memory prod can't use.
  4. Deployment risk — a misconfigured nginx reload, a wrong docker compose project name, or a network conflict during dev deployment can take prod down.
  5. Maintenance windows — upgrading Postgres, Redis, or Docker itself requires downtime for all environments simultaneously.

What triggers the move

Any one of these justifies it:

  • A staging incident has affected (or nearly affected) prod
  • RAM usage regularly exceeds 80% (currently at ~55%, but celery-heavy spikes can push it higher)
  • You want to do risky operations (Postgres major upgrade, kernel update) without prod risk
  • Disk is approaching capacity (currently 73%)


2. Target Architecture

VPS #1 — Production (existing server: 54.36.99.184)

VPS #1 — PROD (54.36.99.184)
├── nginx (ports 80/443)
│   ├── aletheia.groupe-suffren.com          → aletheia-prod-web:8000
│   ├── cabinet-dentaire-aubagne.fr          → helios-prod-web:3000
│   ├── le-canet.chirurgiens-dentistes.fr    → helios-prod-web:3000
│   ├── (other practice domains)             → helios-prod-web:3000
│   ├── cda.groupe-suffren.com               → helios-prod-web:3000
│   ├── monitoring.groupe-suffren.com        → grafana:3000
│   └── analytics.groupe-suffren.com         → umami:3000
├── Aletheia prod
│   ├── aletheia-prod-web (Gunicorn)
│   ├── aletheia-prod-celery
│   ├── aletheia-prod-celery-heavy
│   └── aletheia-prod-beat
├── Helios prod
│   └── helios-prod-web (Next.js)
├── Shared services (prod only)
│   ├── PostgreSQL 18 (aletheia_prod, umami)
│   └── Redis 7 (DB 0-1 only)
├── Monitoring (Prometheus, Grafana, Loki, Alloy, exporters)
├── Umami analytics
└── Backups (prod DB + media only)

Containers: ~17 (down from 30)

VPS #2 — Dev/Staging (new server)

VPS #2 — DEV/STAGING (new OVH VPS)
├── nginx (ports 80/443)
│   ├── aletheia-staging.groupe-suffren.com  → aletheia-staging-web:8000
│   ├── aletheia-dev.groupe-suffren.com      → aletheia-dev-web:8000
│   ├── cda-staging.groupe-suffren.com       → helios-staging-web:3000
│   ├── cda-dev.groupe-suffren.com           → helios-dev-web:3000
│   ├── vsm-staging.groupe-suffren.com       → helios-staging-web:3000
│   ├── vsm-dev.groupe-suffren.com           → helios-dev-web:3000
│   └── (other practice dev/staging domains)
├── Aletheia staging
│   ├── aletheia-staging-web
│   ├── aletheia-staging-celery
│   ├── aletheia-staging-celery-heavy
│   └── aletheia-staging-beat
├── Aletheia dev
│   ├── aletheia-dev-web
│   ├── aletheia-dev-celery
│   ├── aletheia-dev-celery-heavy
│   └── aletheia-dev-beat
├── Helios staging + dev
│   ├── helios-staging-web
│   └── helios-dev-web
├── Shared services (dev/staging only)
│   ├── PostgreSQL 18 (aletheia_staging, aletheia_dev)
│   └── Redis 7 (DB 2-5)
└── Lightweight monitoring (node_exporter only, scraped by prod Prometheus)

Containers: ~17


3. What Changes

3.1 DNS

Move dev/staging DNS records to point to VPS #2:

Record                                     Current       After
aletheia-staging.groupe-suffren.com        54.36.99.184  VPS #2 IP
aletheia-dev.groupe-suffren.com            54.36.99.184  VPS #2 IP
cda-staging.groupe-suffren.com             54.36.99.184  VPS #2 IP
cda-dev.groupe-suffren.com                 54.36.99.184  VPS #2 IP
vsm-staging.groupe-suffren.com             54.36.99.184  VPS #2 IP
vsm-dev.groupe-suffren.com                 54.36.99.184  VPS #2 IP
(all other *-staging / *-dev subdomains)   54.36.99.184  VPS #2 IP

Prod DNS records stay unchanged.

3.2 Database Separation

Each server gets its own PostgreSQL instance. This is the key benefit — no cross-environment table locking.

VPS #1 Postgres (prod):

  • aletheia_prod database + aletheia_prod user
  • umami database + umami user

VPS #2 Postgres (dev/staging):

  • aletheia_staging database + aletheia_staging user
  • aletheia_dev database + aletheia_dev user

The init SQL script splits into two files or is parameterized per server.
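The parameterized variant might be sketched as a small generator that emits each server's init SQL (the `init_sql` helper and its interface are hypothetical, not the real init script):

```shell
# Sketch: emit the Postgres init SQL for one server.
# init_sql is a hypothetical helper; the real init script may differ.
init_sql() {
  case "$1" in
    prod)   dbs="aletheia_prod umami" ;;
    devstg) dbs="aletheia_staging aletheia_dev" ;;
    *)      echo "unknown server: $1" >&2; return 1 ;;
  esac
  for db in $dbs; do
    # One role per database, same name, matching the layout above
    echo "CREATE USER ${db};"
    echo "CREATE DATABASE ${db} OWNER ${db};"
  done
}

init_sql devstg
```

Piping `init_sql prod` into psql on VPS #1 and `init_sql devstg` on VPS #2 keeps a single source of truth for both servers.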

3.3 Redis Separation

VPS #1 Redis: DB 0-1 (prod broker + cache)
VPS #2 Redis: DB 2-5 (staging + dev broker + cache), or renumber to 0-3 on VPS #2
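Keeping the 2-5 numbering, the env-to-DB mapping can be made explicit (which index is broker vs cache within each range is an assumption):

```shell
# Sketch: environment → Redis DB index after the split.
# The broker/cache assignment within each range is assumed, not confirmed.
redis_db() {
  case "$1" in
    prod-broker)    echo 0 ;;  # VPS #1
    prod-cache)     echo 1 ;;  # VPS #1
    staging-broker) echo 2 ;;  # VPS #2
    staging-cache)  echo 3 ;;  # VPS #2
    dev-broker)     echo 4 ;;  # VPS #2
    dev-cache)      echo 5 ;;  # VPS #2
    *) return 1 ;;
  esac
}

echo "redis://redis:6379/$(redis_db staging-broker)"   # → redis://redis:6379/2
```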

3.4 Nginx

Each server runs its own nginx instance with only its environment's vhosts.

VPS #1 nginx configs:

  • aletheia-prod.conf.full / .temp
  • helios-prod.conf.full / .temp
  • monitoring.conf.full / .temp
  • analytics.conf.full / .temp

VPS #2 nginx configs:

  • aletheia-staging.conf.full / .temp
  • aletheia-dev.conf.full / .temp
  • helios-staging.conf.full / .temp
  • helios-dev.conf.full / .temp

Remove the .htpasswd basic auth requirement from VPS #1 (prod doesn't use it). VPS #2 keeps it for staging/dev.
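On VPS #2 the dev/staging vhosts would keep the standard nginx basic-auth directives, roughly like this (the location block and upstream are shown for illustration only):

```nginx
# VPS #2 vhosts only — VPS #1 drops these directives entirely
location / {
    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass           http://aletheia-staging-web:8000;
}
```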

3.5 SSL Certificates

Each server obtains its own certificates via certbot for its domains. Prod certs stay on VPS #1. Staging/dev certs on VPS #2.

If using Cloudflare for prod domains, that's unaffected. Dev/staging subdomains under groupe-suffren.com can use Let's Encrypt on VPS #2.

3.6 Monitoring

Option A — Single Prometheus on VPS #1 (recommended to start):

  • VPS #1 Prometheus scrapes local exporters as before
  • VPS #1 Prometheus also scrapes VPS #2's node_exporter over the network (port 9100, firewalled to VPS #1's IP only)
  • Grafana dashboards cover both servers
  • Loki stays on VPS #1; Alloy runs on both servers, pushing logs to VPS #1's Loki
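In practice Option A amounts to one extra scrape job in the VPS #1 Prometheus config (the job name and the VPS #2 IP placeholder are illustrative):

```yaml
# prometheus.yml on VPS #1 — add a job for VPS #2's node_exporter
scrape_configs:
  - job_name: "vps2-node"
    static_configs:
      - targets: ["<VPS2_IP>:9100"]   # reachable only from VPS #1 (firewall rule)
        labels:
          server: "devstg"
```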

Option B — Full monitoring on both (later if needed):

  • Duplicate the monitoring stack on VPS #2
  • Use Grafana on VPS #1 with Prometheus/Loki on VPS #2 as remote data sources

3.7 Backups

VPS #1: Backs up aletheia_prod + umami + prod media. Same cron, same retention.

VPS #2: Backs up aletheia_staging only (dev is disposable). Lighter retention (3-day daily, no weekly). Or no backups at all — staging data can be re-seeded from a prod dump.
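The 3-day retention could be a one-line prune in the existing backup cron; a minimal sketch (the directory path and *.dump pattern are assumptions about the backup layout):

```shell
# Sketch: keep only 3 days of staging dumps on VPS #2.
# BACKUP_DIR and the *.dump pattern are assumed, not the real layout.
BACKUP_DIR="${BACKUP_DIR:-/opt/docker/backups/staging}"

prune_backups() {
  # Delete dump files last modified more than 3 days ago
  find "$1" -name '*.dump' -type f -mtime +3 -delete
}

if [ -d "$BACKUP_DIR" ]; then
  prune_backups "$BACKUP_DIR"
fi
```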

3.8 Aether Repo Structure

The Makefile and config files need to become server-aware. Two approaches:

Approach A — Server variable in Makefile (simpler):

make deploy SERVER=prod    # deploys prod configs to VPS #1
make deploy SERVER=devstg  # deploys dev/staging configs to VPS #2

The INFRA_FILES list splits into PROD_FILES and DEVSTG_FILES. Diff/deploy/pull targets accept a SERVER parameter.
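In GNU make terms, Approach A could look roughly like this (the file lists and the SERVER_HOST variable are guesses at the existing layout, not the real Makefile):

```makefile
# Sketch only — file lists and SERVER_HOST are illustrative
PROD_FILES   := nginx/conf.d/aletheia-prod.conf shared/ monitoring/
DEVSTG_FILES := nginx/conf.d/aletheia-staging.conf nginx/conf.d/aletheia-dev.conf shared/

ifeq ($(SERVER),prod)
  INFRA_FILES := $(PROD_FILES)
else ifeq ($(SERVER),devstg)
  INFRA_FILES := $(DEVSTG_FILES)
else
  $(error SERVER must be 'prod' or 'devstg')
endif

deploy:
	rsync -avz $(INFRA_FILES) $(SERVER_HOST):/opt/docker/
```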

Approach B — Two branches or directories (cleaner long-term):

aether/repo/
├── prod/                   # configs for VPS #1
│   ├── nginx/conf.d/       # only prod vhosts
│   ├── shared/             # prod postgres/redis compose
│   └── monitoring/         # full monitoring stack
├── devstg/                 # configs for VPS #2
│   ├── nginx/conf.d/       # only dev/staging vhosts
│   ├── shared/             # dev/staging postgres/redis compose
│   └── monitoring/         # lightweight (node_exporter + alloy)
├── common/                 # shared configs (nginx.conf base, security/)
├── Makefile
└── setup.sh                # accepts --server=prod|devstg

3.9 Secret Management

SOPS encryption stays the same — same age key on both servers. The SECRET_FILES list splits:

VPS #1 secrets: shared/.env (prod), envs/aletheia/.env.prod, envs/helios/.env.prod, monitoring/.env, umami/.env, nginx/.htpasswd

VPS #2 secrets: shared/.env (devstg), envs/aletheia/.env.staging, envs/aletheia/.env.dev, envs/helios/.env.staging, envs/helios/.env.dev, nginx/.htpasswd

3.10 Deploy Scripts

The deploy.sh scripts in the aletheia and helios repos need a server mapping:

  • deploy.sh prod → runs on VPS #1
  • deploy.sh staging / deploy.sh dev → runs on VPS #2

Since deploy scripts run locally on the server (SSH in, then run), this is mostly about knowing which server to SSH into. No code change needed in the scripts themselves — they already parameterize by environment.
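The "which server to SSH into" part can live in a tiny wrapper on the workstation; a sketch, where vps1/vps2 are placeholder SSH aliases and the repo path follows the aletheia example used elsewhere in this plan:

```shell
# Sketch: choose the target host from the environment name.
# vps1/vps2 are placeholder SSH aliases, not real hostnames.
server_for_env() {
  case "$1" in
    prod)        echo "vps1" ;;
    staging|dev) echo "vps2" ;;
    *)           echo "unknown env: $1" >&2; return 1 ;;
  esac
}

# deploy <env> — SSH to the right server, then run the unchanged deploy.sh
deploy() {
  host=$(server_for_env "$1") || return 1
  ssh "$host" "cd /opt/docker/aletheia/repo && ./deploy.sh $1"
}
```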

3.11 Helios → Aletheia API

Currently Helios calls Aletheia over the Docker backend network: http://aletheia-{env}-web:8000.

After separation, nothing changes — each Helios environment is on the same server as its matching Aletheia environment. The Docker network names and service names stay the same.

This is a major advantage of prod/devstg split over the Helios-on-its-own-server option: no cross-server API calls, no latency penalty, no vRack needed.

3.12 ISR Webhook (Aletheia → Helios)

Same — stays on the Docker web network within each server. http://helios-{env}-web:3000/api/revalidate works unchanged.


4. VPS #2 Sizing

Dev/staging is lower traffic than prod. A smaller VPS suffices:

Resource   Recommendation   Rationale
CPU        4 vCPUs          Enough for 2 envs of Aletheia + Helios
RAM        8 GB             Dev + staging Aletheia + celery-heavy (3 GB) + headroom
Disk       40-60 GB         No prod media, no long-term backups
Location   OVH France       Same region as VPS #1, GDPR compliant

Estimated cost: ~10-15 EUR/month for an OVH VPS at this spec.


5. Migration Procedure

Phase 1 — Provision VPS #2

  1. Order OVH VPS (Debian 12+, France region)
  2. Run the setup script (adapted for dev/staging):
    sudo ./setup.sh --server=devstg
    
    This sets up: Docker, security hardening, directory structure, Docker networks, Postgres (with staging + dev databases), Redis, nginx (with dev/staging vhosts), node_exporter
  3. Place the age key at /opt/docker/.age-key.txt
  4. Decrypt dev/staging secrets: make decrypt SERVER=devstg

Phase 2 — Seed Data

  1. Dump staging database from VPS #1:
    # On VPS #1
    docker exec shared_postgres pg_dump -U aletheia_staging -Fc aletheia_staging > staging.dump
    
  2. Transfer and restore on VPS #2:
    scp -P 57361 staging.dump debian@vps2:/tmp/
    # On VPS #2
    docker cp /tmp/staging.dump shared_postgres:/tmp/
    docker exec shared_postgres pg_restore -U aletheia_staging -d aletheia_staging -Fc --no-owner /tmp/staging.dump
    
  3. Repeat for dev database (or start fresh)
  4. Copy staging media if needed:
    rsync -avz -e 'ssh -p 57361' /opt/docker/aletheia/media/staging/ debian@vps2:/opt/docker/aletheia/media/staging/
    

Phase 3 — Deploy Apps on VPS #2

  1. Clone aletheia and helios repos to VPS #2
  2. Deploy staging and dev:
    cd /opt/docker/aletheia/repo && ./deploy.sh staging
    cd /opt/docker/aletheia/repo && ./deploy.sh dev
    cd /opt/docker/helios/repo && ./deploy.sh staging
    cd /opt/docker/helios/repo && ./deploy.sh dev
    
  3. Verify apps are running and accessible via VPS #2 IP

Phase 4 — DNS Cutover

  1. Update staging/dev DNS records to VPS #2 IP
  2. Wait for propagation (~5 min with low TTL, set TTL low beforehand)
  3. Obtain SSL certificates on VPS #2:
    docker exec certbot certbot certonly --webroot -w /var/www/certbot \
      -d aletheia-staging.groupe-suffren.com \
      -d aletheia-dev.groupe-suffren.com \
      -d cda-staging.groupe-suffren.com \
      -d cda-dev.groupe-suffren.com
    
  4. Activate full nginx configs (HTTPS) on VPS #2

Phase 5 — Clean Up VPS #1

  1. Stop dev/staging containers on VPS #1:
    ENV_FILE=/opt/docker/aletheia/envs/.env.staging ENVIRONMENT=staging \
      sudo -E docker compose -p aletheia-staging down
    ENV_FILE=/opt/docker/aletheia/envs/.env.dev ENVIRONMENT=dev \
      sudo -E docker compose -p aletheia-dev down
    # Same for helios-staging and helios-dev
    
  2. Remove dev/staging nginx vhosts from VPS #1
  3. Remove dev/staging databases from VPS #1 Postgres:
    DROP DATABASE aletheia_staging;
    DROP DATABASE aletheia_dev;
    DROP USER aletheia_staging;
    DROP USER aletheia_dev;
    
  4. Remove dev/staging env files and media from VPS #1
  5. Reload nginx on VPS #1
  6. Update VPS #1 monitoring to remove dev/staging targets

Phase 6 — Adjust Resource Limits on VPS #1

With only prod running, prod containers can have more generous limits:

Container                    Before   After
aletheia-prod-web            2 GB     3 GB
aletheia-prod-celery         2 GB     2 GB (unchanged)
aletheia-prod-celery-heavy   3 GB     4 GB
helios-prod-web              1 GB     1.5 GB

OOM scores become less critical since there's no dev/staging workload to compete with, but keep them for safety.
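In compose terms the change is just the memory limits (service keys are abbreviated; the real compose files may be structured differently):

```yaml
# VPS #1 compose sketch after the split — values from the table above
services:
  aletheia-prod-web:
    mem_limit: 3g      # was 2g
  aletheia-prod-celery-heavy:
    mem_limit: 4g      # was 3g
  helios-prod-web:
    mem_limit: 1536m   # 1.5 GB, was 1g
```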


6. Rollback Plan

If something goes wrong during migration:

  1. DNS records can be flipped back to VPS #1 (dev/staging containers are still running until Phase 5)
  2. VPS #1 retains all data until Phase 5 cleanup
  3. Phase 5 is the point of no return — don't execute it until VPS #2 has been stable for at least a week


7. What This Does NOT Change

  • Local development — still the same: both apps on your Mac, localhost
  • Git workflow — same branches, same repos, same CI
  • Deploy scripts — same deploy.sh per app, just run on the right server
  • Helios ↔ Aletheia communication — stays on Docker network, no cross-server calls
  • SOPS/age — same key, same workflow
  • Prod domains — no DNS changes for production

8. Future Considerations

If Helios needs its own server later

This split makes a future Helios separation easier: you'd move helios-prod-web from VPS #1 to a VPS #3, and connect it back to Aletheia via vRack. But with 9 dental practice sites, this is unlikely to be needed soon.

If you want managed Postgres

Moving prod Postgres off VPS #1 to a managed service (e.g., OVH Cloud Databases) gives you automatic backups, point-in-time recovery, and failover. But it adds network latency to every query and costs significantly more. Only consider this if database reliability becomes a concern.

Monitoring consolidation

Once VPS #2 is stable, consider whether you want centralized monitoring (Prometheus on VPS #1 scraping both) or independent stacks. Centralized is simpler but means a VPS #1 outage blinds you to VPS #2 issues too.