# Server Separation Plan — Prod vs Dev/Staging

Date: 2026-04-10
Status: Draft
Decision: Separate production from dev/staging onto two servers
## 1. Why Separate

### Current state: 1 server, 30 containers
| Resource | Value | Concern |
|---|---|---|
| CPU | 6 vCPUs | Shared across all envs |
| RAM | 12 GB (6.5 GB used, 1.3 GB swap) | 55% used at rest — spikes during celery-heavy tasks |
| Disk | 100 GB (73% used) | Media + backups growing |
| Containers | 30 | All 3 envs + shared services + monitoring |
### Risks of the current single-server setup
- Blast radius — a runaway dev/staging celery task can trigger OOM kills that affect prod. OOM scores mitigate this but don't eliminate it: it's still one kernel making triage decisions under pressure.
- Shared database — a bad migration on staging can lock tables prod is reading. A `DROP COLUMN` on dev with a long lock wait blocks prod queries on that table.
- Resource contention — staging celery-heavy currently uses 754 MB. During RPPS imports, it can hit its 3 GB limit. That's memory prod can't use.
- Deployment risk — a misconfigured nginx reload, a wrong docker compose project name, or a network conflict during dev deployment can take prod down.
- Maintenance windows — upgrading Postgres, Redis, or Docker itself requires downtime for all environments simultaneously.
### What triggers the move

Any one of these justifies it:

- A staging incident has affected (or nearly affected) prod
- RAM usage regularly exceeds 80% (currently at ~55%, but celery-heavy spikes can push it higher)
- You want to do risky operations (Postgres major upgrade, kernel update) without prod risk
- Disk is approaching capacity (currently 73%)
## 2. Target Architecture

### VPS #1 — Production (existing server: 54.36.99.184)
```
VPS #1 — PROD (54.36.99.184)
├── nginx (ports 80/443)
│   ├── aletheia.groupe-suffren.com → aletheia-prod-web:8000
│   ├── cabinet-dentaire-aubagne.fr → helios-prod-web:3000
│   ├── le-canet.chirurgiens-dentistes.fr → helios-prod-web:3000
│   ├── (other practice domains) → helios-prod-web:3000
│   ├── cda.groupe-suffren.com → helios-prod-web:3000
│   ├── monitoring.groupe-suffren.com → grafana:3000
│   └── analytics.groupe-suffren.com → umami:3000
│
├── Aletheia prod
│   ├── aletheia-prod-web (Gunicorn)
│   ├── aletheia-prod-celery
│   ├── aletheia-prod-celery-heavy
│   └── aletheia-prod-beat
│
├── Helios prod
│   └── helios-prod-web (Next.js)
│
├── Shared services (prod only)
│   ├── PostgreSQL 18 (aletheia_prod, umami)
│   └── Redis 7 (DB 0-1 only)
│
├── Monitoring (Prometheus, Grafana, Loki, Alloy, exporters)
├── Umami analytics
└── Backups (prod DB + media only)
```

Containers: ~17 (down from 30)
### VPS #2 — Dev/Staging (new server)
```
VPS #2 — DEV/STAGING (new OVH VPS)
├── nginx (ports 80/443)
│   ├── aletheia-staging.groupe-suffren.com → aletheia-staging-web:8000
│   ├── aletheia-dev.groupe-suffren.com → aletheia-dev-web:8000
│   ├── cda-staging.groupe-suffren.com → helios-staging-web:3000
│   ├── cda-dev.groupe-suffren.com → helios-dev-web:3000
│   ├── vsm-staging.groupe-suffren.com → helios-staging-web:3000
│   ├── vsm-dev.groupe-suffren.com → helios-dev-web:3000
│   └── (other practice dev/staging domains)
│
├── Aletheia staging
│   ├── aletheia-staging-web
│   ├── aletheia-staging-celery
│   ├── aletheia-staging-celery-heavy
│   └── aletheia-staging-beat
│
├── Aletheia dev
│   ├── aletheia-dev-web
│   ├── aletheia-dev-celery
│   ├── aletheia-dev-celery-heavy
│   └── aletheia-dev-beat
│
├── Helios staging + dev
│   ├── helios-staging-web
│   └── helios-dev-web
│
├── Shared services (dev/staging only)
│   ├── PostgreSQL 18 (aletheia_staging, aletheia_dev)
│   └── Redis 7 (DB 2-5)
│
└── Lightweight monitoring (node_exporter only, scraped by prod Prometheus)
```

Containers: ~17
## 3. What Changes

### 3.1 DNS
Move dev/staging DNS records to point to VPS #2:
| Record | Current | After |
|---|---|---|
| aletheia-staging.groupe-suffren.com | 54.36.99.184 | VPS #2 IP |
| aletheia-dev.groupe-suffren.com | 54.36.99.184 | VPS #2 IP |
| cda-staging.groupe-suffren.com | 54.36.99.184 | VPS #2 IP |
| cda-dev.groupe-suffren.com | 54.36.99.184 | VPS #2 IP |
| vsm-staging.groupe-suffren.com | 54.36.99.184 | VPS #2 IP |
| vsm-dev.groupe-suffren.com | 54.36.99.184 | VPS #2 IP |
| (all other *-staging / *-dev subdomains) | 54.36.99.184 | VPS #2 IP |
Prod DNS records stay unchanged.
### 3.2 Database Separation
Each server gets its own PostgreSQL instance. This is the key benefit — no cross-environment table locking.
VPS #1 Postgres (prod):
- aletheia_prod database + aletheia_prod user
- umami database + umami user
VPS #2 Postgres (dev/staging):
- aletheia_staging database + aletheia_staging user
- aletheia_dev database + aletheia_dev user
The init SQL script splits into two files or is parameterized per server.
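Under this split, the VPS #2 init file might look like the following sketch. The database and role names come from the list above; the file name and placeholder passwords are assumptions (real values would come from the SOPS-managed env files):

```sql
-- init-devstg.sql (hypothetical name) — runs once on first Postgres start,
-- e.g. from /docker-entrypoint-initdb.d/ in the official Postgres image.
CREATE USER aletheia_staging WITH PASSWORD 'CHANGE_ME';
CREATE DATABASE aletheia_staging OWNER aletheia_staging;

CREATE USER aletheia_dev WITH PASSWORD 'CHANGE_ME';
CREATE DATABASE aletheia_dev OWNER aletheia_dev;
```

The prod counterpart would create only `aletheia_prod` and `umami`, so neither server ever holds credentials for the other's databases.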
### 3.3 Redis Separation

- VPS #1 Redis: DB 0-1 (prod broker + cache)
- VPS #2 Redis: DB 2-5 (staging + dev broker + cache), or renumber to 0-3 on VPS #2
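Keeping the existing 2-5 numbering on VPS #2 means no env file changes at all; renumbering to 0-3 is tidier but touches every staging/dev env file. A sketch of what the env fragments might look like (the variable names are assumptions, not necessarily the repos' actual keys):

```shell
# VPS #1 (prod) — unchanged
CELERY_BROKER_URL=redis://redis:6379/0
CACHE_URL=redis://redis:6379/1

# VPS #2 (staging shown; dev would use the next pair)
CELERY_BROKER_URL=redis://redis:6379/2
CACHE_URL=redis://redis:6379/3
```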
### 3.4 Nginx
Each server runs its own nginx instance with only its environment's vhosts.
VPS #1 nginx configs:
- aletheia-prod.conf.full / .temp
- helios-prod.conf.full / .temp
- monitoring.conf.full / .temp
- analytics.conf.full / .temp
VPS #2 nginx configs:
- aletheia-staging.conf.full / .temp
- aletheia-dev.conf.full / .temp
- helios-staging.conf.full / .temp
- helios-dev.conf.full / .temp
Remove the .htpasswd basic auth requirement from VPS #1 (prod doesn't use it). VPS #2 keeps it for staging/dev.
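For reference, the auth difference is a couple of directives per vhost. A minimal sketch of a VPS #2 staging server block, assuming the `.htpasswd` lives at the standard nginx path (the realm string and paths are assumptions):

```nginx
server {
    server_name aletheia-staging.groupe-suffren.com;

    # Basic auth stays on dev/staging only; prod vhosts omit these two lines
    auth_basic           "Staging";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://aletheia-staging-web:8000;
    }
}
```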
### 3.5 SSL Certificates
Each server obtains its own certificates via certbot for its domains. Prod certs stay on VPS #1. Staging/dev certs on VPS #2.
If using Cloudflare for prod domains, that's unaffected. Dev/staging subdomains under groupe-suffren.com can use Let's Encrypt on VPS #2.
### 3.6 Monitoring
Option A — Single Prometheus on VPS #1 (recommended to start):
- VPS #1 Prometheus scrapes local exporters as before
- VPS #1 Prometheus also scrapes VPS #2's node_exporter over the network (port 9100, firewalled to VPS #1 IP only)
- Grafana dashboards cover both servers
- Loki on VPS #1, Alloy on both servers pushing logs to VPS #1 Loki
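The extra scrape job for Option A is a few lines in the VPS #1 Prometheus config. A sketch (the job name, label, and the `<VPS2_IP>` placeholder are assumptions):

```yaml
scrape_configs:
  - job_name: "vps2-node"
    static_configs:
      - targets: ["<VPS2_IP>:9100"]   # node_exporter, firewalled to VPS #1
        labels:
          server: "devstg"
```

The `server` label lets existing Grafana dashboards filter or split panels per machine without duplicating them.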
Option B — Full monitoring on both (later if needed):

- Duplicate the monitoring stack on VPS #2
- Use Grafana on VPS #1 with Prometheus/Loki on VPS #2 as remote data sources

### 3.7 Backups
VPS #1: Backs up aletheia_prod + umami + prod media. Same cron, same retention.
VPS #2: Backs up aletheia_staging only (dev is disposable). Lighter retention (3-day daily, no weekly). Or no backups at all — staging data can be re-seeded from a prod dump.
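If VPS #2 does keep a staging backup, the lighter retention could be a single cron entry. A sketch, assuming the Postgres container is named `postgres` and backups land in `/opt/backups` (both assumptions):

```shell
# Nightly staging dump at 03:00, keep 3 days, no weekly tier
0 3 * * * docker exec postgres pg_dump -U aletheia_staging -Fc aletheia_staging \
  > /opt/backups/aletheia_staging_$(date +\%F).dump \
  && find /opt/backups -name 'aletheia_staging_*.dump' -mtime +3 -delete
```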
### 3.8 Aether Repo Structure
The Makefile and config files need to become server-aware. Two approaches:
Approach A — Server variable in Makefile (simpler):
```shell
make deploy SERVER=prod     # deploys prod configs to VPS #1
make deploy SERVER=devstg   # deploys dev/staging configs to VPS #2
```
The INFRA_FILES list splits into PROD_FILES and DEVSTG_FILES. Diff/deploy/pull targets accept a SERVER parameter.
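A minimal sketch of what the Makefile change might look like (the file lists are abbreviated placeholders, not the repo's real `INFRA_FILES` contents):

```makefile
SERVER ?= prod

PROD_FILES   := nginx/conf.d/aletheia-prod.conf shared/compose.prod.yml
DEVSTG_FILES := nginx/conf.d/aletheia-staging.conf shared/compose.devstg.yml

ifeq ($(SERVER),prod)
  INFRA_FILES := $(PROD_FILES)
else
  INFRA_FILES := $(DEVSTG_FILES)
endif

deploy:
	@echo "deploying $(INFRA_FILES) for SERVER=$(SERVER)"
```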
Approach B — Two branches or directories (cleaner long-term):
```
aether/repo/
├── prod/                  # configs for VPS #1
│   ├── nginx/conf.d/      # only prod vhosts
│   ├── shared/            # prod postgres/redis compose
│   └── monitoring/        # full monitoring stack
├── devstg/                # configs for VPS #2
│   ├── nginx/conf.d/      # only dev/staging vhosts
│   ├── shared/            # dev/staging postgres/redis compose
│   └── monitoring/        # lightweight (node_exporter + alloy)
├── common/                # shared configs (nginx.conf base, security/)
├── Makefile
└── setup.sh               # accepts --server=prod|devstg
```
### 3.9 Secret Management
SOPS encryption stays the same — same age key on both servers. The SECRET_FILES list splits:
- VPS #1 secrets: `shared/.env` (prod), `envs/aletheia/.env.prod`, `envs/helios/.env.prod`, `monitoring/.env`, `umami/.env`, `nginx/.htpasswd`
- VPS #2 secrets: `shared/.env` (devstg), `envs/aletheia/.env.staging`, `envs/aletheia/.env.dev`, `envs/helios/.env.staging`, `envs/helios/.env.dev`, `nginx/.htpasswd`
### 3.10 Deploy Scripts

The `deploy.sh` scripts in the aletheia and helios repos need a server mapping:

- `deploy.sh prod` → runs on VPS #1
- `deploy.sh staging` / `deploy.sh dev` → runs on VPS #2
Since deploy scripts run locally on the server (SSH in, then run), this is mostly about knowing which server to SSH into. No code change needed in the scripts themselves — they already parameterize by environment.
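The "know which server to SSH into" part could live in a small local wrapper. A sketch, where the VPS #2 address and the remote deploy path are placeholders (the scripts themselves stay untouched, as noted above):

```shell
#!/bin/sh
# Map an environment name to the server that hosts it.
host_for_env() {
  case "$1" in
    prod)        echo "54.36.99.184" ;;   # VPS #1
    staging|dev) echo "VPS2_IP_HERE" ;;   # VPS #2 (placeholder)
    *)           return 1 ;;
  esac
}

env_name="${1:-prod}"
if ! host="$(host_for_env "$env_name")"; then
  echo "usage: $0 prod|staging|dev" >&2
  exit 1
fi
# Remote path is an assumption; the per-env logic stays inside deploy.sh.
echo "would run: ssh deploy@$host 'cd /opt/apps/aletheia && ./deploy.sh $env_name'"
```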
### 3.11 Helios → Aletheia API

Currently Helios calls Aletheia over the Docker backend network: `http://aletheia-{env}-web:8000`.
After separation, nothing changes — each Helios environment is on the same server as its matching Aletheia environment. The Docker network names and service names stay the same.
This is a major advantage of prod/devstg split over the Helios-on-its-own-server option: no cross-server API calls, no latency penalty, no vRack needed.
### 3.12 ISR Webhook (Aletheia → Helios)

Same — stays on the Docker web network within each server. `http://helios-{env}-web:3000/api/revalidate` works unchanged.
## 4. VPS #2 Sizing
Dev/staging is lower traffic than prod. A smaller VPS suffices:
| Resource | Recommendation | Rationale |
|---|---|---|
| CPU | 4 vCPUs | Enough for 2 envs of Aletheia + Helios |
| RAM | 8 GB | Dev + staging Aletheia + celery-heavy (3G) + headroom |
| Disk | 40-60 GB | No prod media, no long-term backups |
| Location | OVH France | Same region as VPS #1, GDPR compliant |
Estimated cost: ~10-15 EUR/month for an OVH VPS at this spec.
## 5. Migration Procedure

### Phase 1 — Provision VPS #2

- Order an OVH VPS (Debian 12+, France region)
- Run the setup script (adapted for dev/staging). This sets up Docker, security hardening, the directory structure, Docker networks, Postgres (with staging + dev databases), Redis, nginx (with dev/staging vhosts), and node_exporter
- Place the age key at `/opt/docker/.age-key.txt`
- Decrypt dev/staging secrets: `make decrypt SERVER=devstg`
### Phase 2 — Seed Data

- Dump the staging database from VPS #1
- Transfer and restore it on VPS #2
- Repeat for the dev database (or start fresh)
- Copy staging media if needed
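The seeding steps above might look like the following. Container names, the `deploy` user, media paths, and the `VPS2_IP` placeholder are all assumptions based on the naming used elsewhere in this plan:

```shell
# 1. Dump staging from VPS #1 (custom format for pg_restore)
ssh deploy@54.36.99.184 \
  "docker exec postgres pg_dump -U aletheia_staging -Fc aletheia_staging" \
  > aletheia_staging.dump

# 2. Transfer and restore on VPS #2
scp aletheia_staging.dump deploy@VPS2_IP:/tmp/
ssh deploy@VPS2_IP \
  "docker exec -i postgres pg_restore -U aletheia_staging -d aletheia_staging --clean \
   < /tmp/aletheia_staging.dump"

# 3. Copy staging media (run this on VPS #2; rsync can't do remote-to-remote)
rsync -az deploy@54.36.99.184:/opt/docker/media/staging/ /opt/docker/media/staging/
```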
### Phase 3 — Deploy Apps on VPS #2

- Clone the aletheia and helios repos to VPS #2
- Deploy staging and dev
- Verify the apps are running and accessible via the VPS #2 IP
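Since `deploy.sh` already parameterizes by environment (section 3.10), the deploy step is four invocations plus a sanity check. Repo paths are assumptions:

```shell
cd /opt/apps/aletheia && ./deploy.sh staging && ./deploy.sh dev
cd /opt/apps/helios   && ./deploy.sh staging && ./deploy.sh dev

# Quick check that the expected containers came up
docker ps --format '{{.Names}}' | grep -E 'staging|dev'
```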
### Phase 4 — DNS Cutover

- Update staging/dev DNS records to the VPS #2 IP
- Wait for propagation (~5 min with a low TTL; lower the TTL beforehand)
- Obtain SSL certificates on VPS #2
- Activate the full nginx configs (HTTPS) on VPS #2
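Certificate issuance must wait until DNS points at VPS #2, since Let's Encrypt validates against the live record. A sketch of the certbot call, assuming webroot validation (the webroot path is an assumption; additional practice domains would be appended as extra `-d` flags):

```shell
certbot certonly --webroot -w /var/www/certbot \
  -d aletheia-staging.groupe-suffren.com \
  -d aletheia-dev.groupe-suffren.com \
  -d cda-staging.groupe-suffren.com \
  -d cda-dev.groupe-suffren.com \
  -d vsm-staging.groupe-suffren.com \
  -d vsm-dev.groupe-suffren.com
```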
### Phase 5 — Clean Up VPS #1

- Stop the dev/staging containers on VPS #1
- Remove the dev/staging nginx vhosts from VPS #1
- Remove the dev/staging databases from VPS #1 Postgres
- Remove dev/staging env files and media from VPS #1
- Reload nginx on VPS #1
- Update VPS #1 monitoring to remove dev/staging targets
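A sketch of the cleanup commands. The compose project names, vhost file names, and Postgres superuser are assumptions based on the container naming in this plan; double-check each against the actual setup before running, since this is the irreversible step:

```shell
# Stop dev/staging stacks (project names assumed)
docker compose -p aletheia-staging down
docker compose -p aletheia-dev down
docker compose -p helios-staging down
docker compose -p helios-dev down

# Drop their vhosts and reload nginx
rm /etc/nginx/conf.d/aletheia-staging.conf /etc/nginx/conf.d/aletheia-dev.conf
docker exec nginx nginx -s reload

# Drop the databases (only after VPS #2 has been stable — see section 6)
docker exec postgres psql -U postgres \
  -c 'DROP DATABASE aletheia_staging;' \
  -c 'DROP DATABASE aletheia_dev;'
```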
### Phase 6 — Adjust Resource Limits on VPS #1
With only prod running, prod containers can have more generous limits:
| Container | Before | After |
|---|---|---|
| aletheia-prod-web | 2 GB | 3 GB |
| aletheia-prod-celery | 2 GB | 2 GB (unchanged) |
| aletheia-prod-celery-heavy | 3 GB | 4 GB |
| helios-prod-web | 1 GB | 1.5 GB |
OOM score adjustments become less critical since there is no dev/staging workload to compete with, but keep them for safety.
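In compose terms the change is one key per service. A sketch using `mem_limit` (the key layout is an assumption; depending on the compose file version the limit may instead live under `deploy.resources.limits.memory`):

```yaml
services:
  aletheia-prod-web:
    mem_limit: 3g          # was 2g
  aletheia-prod-celery-heavy:
    mem_limit: 4g          # was 3g
  helios-prod-web:
    mem_limit: 1536m       # was 1g
```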
## 6. Rollback Plan

If something goes wrong during migration:

1. DNS records can be flipped back to VPS #1 (dev/staging containers are still running until Phase 5)
2. VPS #1 retains all data until the Phase 5 cleanup
3. Phase 5 is the point of no return — don't execute it until VPS #2 has been stable for at least a week
## 7. What This Does NOT Change

- Local development — still the same: both apps on your Mac, localhost
- Git workflow — same branches, same repos, same CI
- Deploy scripts — same `deploy.sh` per app, just run on the right server
- Helios ↔ Aletheia communication — stays on the Docker network, no cross-server calls
- SOPS/age — same key, same workflow
- Prod domains — no DNS changes for production
## 8. Future Considerations

### If Helios needs its own server later
This split makes a future Helios separation easier: you'd move helios-prod-web from VPS #1 to a VPS #3, and connect it back to Aletheia via vRack. But with 9 dental practice sites, this is unlikely to be needed soon.
### If you want managed Postgres
Moving prod Postgres off VPS #1 to a managed service (e.g., OVH Cloud Databases) gives you automatic backups, point-in-time recovery, and failover. But it adds network latency to every query and costs significantly more. Only consider this if database reliability becomes a concern.
### Monitoring consolidation
Once VPS #2 is stable, consider whether you want centralized monitoring (Prometheus on VPS #1 scraping both) or independent stacks. Centralized is simpler but means a VPS #1 outage blinds you to VPS #2 issues too.