
Disaster Recovery

Overview

Full server rebuild procedure — from a blank Debian 12+ VPS to a running production environment with restored data.

Prerequisites

  • Access to the Aether repo (GitHub: baudry-suffren/aether)
  • The age private key (stored in the password manager — starts with AGE-SECRET-KEY-)
  • A recent database backup (from /opt/docker/backups/daily/ on the old server if it is still reachable, or from an off-site copy)
  • Media backups if applicable (prod/staging tar.gz archives)

Recovery Steps

1. Provision a new VPS

  • Debian 12+ (Bookworm or later)
  • Minimum: 4 vCPU, 8 GB RAM, 80 GB SSD
  • Add your SSH public key during provisioning
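Before running setup, you may want to confirm that the new VPS actually meets these minimums. The sketch below is hypothetical: the function names are illustrative, and the RAM and disk floors (about 7 GiB and 70 GiB) are assumptions. Those floors exist because `MemTotal` in `/proc/meminfo` and `df` both report less than the advertised plan size. It reads only standard Linux sources (`nproc`, `/proc/meminfo`, `df -Pk`).

```shell
#!/bin/sh
# Hypothetical pre-flight check against the documented minimums (4 vCPU, 8 GB RAM, 80 GB SSD).
# The RAM and disk floors are assumptions: MemTotal and df report less than the advertised size.
MIN_CPUS=4
MIN_MEM_KIB=$((7 * 1024 * 1024))    # ~7 GiB floor for an "8 GB" plan
MIN_DISK_KIB=$((70 * 1024 * 1024))  # ~70 GiB floor for an "80 GB" disk

# meets_minimum <cpus> <mem_kib> <disk_kib>: prints each shortfall, returns 1 if any
meets_minimum() {
  local ok
  ok=0
  if [ "$1" -lt "$MIN_CPUS" ]; then echo "CPU: $1 < $MIN_CPUS"; ok=1; fi
  if [ "$2" -lt "$MIN_MEM_KIB" ]; then echo "RAM: $2 KiB < $MIN_MEM_KIB KiB"; ok=1; fi
  if [ "$3" -lt "$MIN_DISK_KIB" ]; then echo "Disk: $3 KiB < $MIN_DISK_KIB KiB"; ok=1; fi
  return "$ok"
}

# Measure this host: CPU count, MemTotal (KiB), and total size of the root filesystem (KiB)
check_this_host() {
  meets_minimum "$(nproc)" \
    "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" \
    "$(df -Pk / | awk 'NR == 2 {print $2}')"
}

# Usage on the new server:
# check_this_host && echo "VPS meets the documented minimums"
```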

2. Initial server setup

# SSH in with a key that can read the GitHub repos (e.g. agent forwarding with ssh -A), then clone Aether
git clone git@github.com:baudry-suffren/aether.git /opt/docker/aether/repo
cd /opt/docker/aether/repo

3. Run setup in recovery mode

The setup script has two modes — choose option 2 (Server migration / recovery). It will prompt for the age private key, then automatically decrypt all secrets.

# Interactive — prompts for mode selection and age key
sudo ./setup.sh

# Or non-interactive: pass the key directly (it then appears in shell history and the process list)
sudo ./setup.sh --recover AGE-SECRET-KEY-...

This handles everything: Docker install, security hardening, directory structure, secret decryption, Docker networks, shared services (PostgreSQL + Redis), nginx (in maintenance mode), monitoring stack, Umami, and the documentation site.

Where to find the age key

The age private key is stored in the team password manager. It starts with AGE-SECRET-KEY- and is a single line. The setup script validates the format before proceeding.
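The exact check that setup.sh performs is not documented here. The following is a sketch of a comparable format check, assuming the standard age X25519 identity encoding. Under that encoding a key is Bech32, uppercase, and consists of `AGE-SECRET-KEY-1` followed by 58 characters from the Bech32 alphabet, 74 characters in total. The function name `is_valid_age_key` is hypothetical.

```shell
#!/bin/sh
# Hypothetical format check for an age secret key (assumes the standard Bech32 X25519 identity).
is_valid_age_key() {
  local key rest
  key=$1
  # Prefix: human-readable part "AGE-SECRET-KEY-" plus the Bech32 separator "1"
  case $key in
    AGE-SECRET-KEY-1*) ;;
    *) return 1 ;;
  esac
  rest=${key#AGE-SECRET-KEY-1}
  # 32-byte key -> 52 data characters + 6 checksum characters
  [ "${#rest}" -eq 58 ] || return 1
  # Only the uppercase Bech32 alphabet is allowed (this also rejects newlines and spaces)
  case $rest in
    *[!QPZRY9X8GF2TVDW0S3JN54KHCE6MUA7L]*) return 1 ;;
  esac
  return 0
}

# Usage: read the key without echoing it, then validate before handing it on
# printf 'age key: '; stty -echo; IFS= read -r key; stty echo; echo
# is_valid_age_key "$key" || { echo "not a valid age secret key" >&2; exit 1; }
```

Reading the key with echo disabled also avoids leaving it in shell history, which passing it as a command-line argument does not.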

4. Verify secrets were restored

# Check that decrypted files exist
ls -la /opt/docker/shared/.env
ls -la /opt/docker/aletheia/envs/.env.prod
ls -la /opt/docker/monitoring/.env
ls -la /opt/docker/nginx/.htpasswd

# Spot-check a decrypted file (should contain real values, not placeholders)
head -2 /opt/docker/shared/.env
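`ls` and `head` confirm that the files exist, but a file that was copied without being decrypted would also pass `ls`. The following is a minimal sketch, with the hypothetical function `check_decrypted`, that flags missing or empty files and files whose first line is still an age header. age ciphertext begins with the line `age-encryption.org/v1`, or with `-----BEGIN AGE ENCRYPTED FILE-----` when ASCII-armored. It does not detect application-specific placeholder values.

```shell
#!/bin/sh
# Hypothetical helper: report secret files that are missing, empty, or still age-encrypted.
check_decrypted() {
  local f first failed
  failed=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "MISSING or EMPTY: $f"
      failed=1
      continue
    fi
    first=
    IFS= read -r first < "$f"
    # age ciphertext starts with this header line (binary) or with the ASCII-armor marker
    if [ "$first" = "age-encryption.org/v1" ] || [ "$first" = "-----BEGIN AGE ENCRYPTED FILE-----" ]; then
      echo "STILL ENCRYPTED: $f"
      failed=1
    fi
  done
  return "$failed"
}

# Usage with the files listed above:
# check_decrypted /opt/docker/shared/.env /opt/docker/aletheia/envs/.env.prod \
#   /opt/docker/monitoring/.env /opt/docker/nginx/.htpasswd
```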

5. Verify shared services

PostgreSQL and Redis are started by setup.sh. Confirm they're healthy:

docker exec shared_postgres pg_isready -U admin
docker exec shared_redis redis-cli ping
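Right after setup.sh finishes, the containers may still be starting, so a single check can fail spuriously. The following is a small retry loop. `wait_for` and its argument order are hypothetical. The loop relies on `pg_isready` exiting 0 once the server accepts connections. For Redis it compares the reply to `PONG` rather than relying on `redis-cli`'s exit status.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it exits 0 or the attempts run out.
# wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}

# Usage (about one minute of retries each):
# wait_for 30 2 docker exec shared_postgres pg_isready -U admin
# wait_for 30 2 sh -c '[ "$(docker exec shared_redis redis-cli ping)" = PONG ]'
```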

6. Restore database

# From the directory holding the backup files, copy them into the container
docker cp backup_aletheia_prod_YYYYMMDD.dump shared_postgres:/tmp/
docker cp backup_aletheia_staging_YYYYMMDD.dump shared_postgres:/tmp/

# Restore each database (use pg_restore for .dump format)
docker exec shared_postgres pg_restore -U aletheia_prod -d aletheia_prod \
  -Fc --no-owner /tmp/backup_aletheia_prod_YYYYMMDD.dump

docker exec shared_postgres pg_restore -U aletheia_staging -d aletheia_staging \
  -Fc --no-owner /tmp/backup_aletheia_staging_YYYYMMDD.dump
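When several daily dumps are available, picking the newest one by hand is error-prone. The following is a sketch that assumes the `backup_<database>_YYYYMMDD.dump` naming used in the commands above. The function `latest_backup` is hypothetical. It compares the embedded dates numerically instead of relying on glob ordering.

```shell
#!/bin/sh
# Hypothetical helper: print the newest backup_<db>_YYYYMMDD.dump in a directory.
latest_backup() {
  local dir db f d best latest
  dir=$1
  db=$2
  best=0
  latest=
  for f in "$dir"/backup_"$db"_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].dump; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal
    d=${f##*_}                # e.g. "20240315.dump"
    d=${d%.dump}              # e.g. "20240315"
    if [ "$d" -gt "$best" ]; then
      best=$d
      latest=$f
    fi
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Usage:
# dump=$(latest_backup /opt/docker/backups/daily aletheia_prod) || exit 1
# docker cp "$dump" shared_postgres:/tmp/
# pg_restore --list only reads the archive's table of contents: a cheap integrity check
# docker exec shared_postgres pg_restore --list "/tmp/${dump##*/}" >/dev/null
```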

7. Restore media files

tar xzf media_aletheia_prod_YYYYMMDD.tar.gz -C /opt/docker/aletheia/media/prod/
tar xzf media_aletheia_staging_YYYYMMDD.tar.gz -C /opt/docker/aletheia/media/staging/
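`tar -C` fails if the target directory does not exist, and a truncated archive can leave a partial extraction behind. The following is a sketch of the hypothetical helper `restore_archive`. It reads the whole archive first with `tar tzf`, which decompresses it end to end, and only then creates the target and extracts.

```shell
#!/bin/sh
# Hypothetical helper: check that a .tar.gz reads cleanly, then extract it into a target dir.
restore_archive() {
  local archive dest
  archive=$1
  dest=$2
  # Listing the archive decompresses it fully, so truncated or corrupt files fail here
  if ! tar tzf "$archive" >/dev/null; then
    echo "corrupt or unreadable archive: $archive" >&2
    return 1
  fi
  mkdir -p "$dest"   # tar -C needs an existing directory
  tar xzf "$archive" -C "$dest"
}

# Usage:
# restore_archive media_aletheia_prod_YYYYMMDD.tar.gz /opt/docker/aletheia/media/prod/
# restore_archive media_aletheia_staging_YYYYMMDD.tar.gz /opt/docker/aletheia/media/staging/
```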

8. Clone and deploy applications

# Aletheia
git clone git@github.com:baudry-suffren/aletheia_v2.git /opt/docker/aletheia/repo
cd /opt/docker/aletheia/repo && make deploy ENV=prod

# Helios
git clone git@github.com:baudry-suffren/helios.git /opt/docker/helios/repo
cd /opt/docker/helios/repo && make deploy ENV=prod

9. SSL certificates

setup.sh obtains certificates automatically. If some requests failed (e.g. DNS not yet pointing at the new server), retry them manually:

docker exec certbot certbot certonly --webroot -w /var/www/certbot \
  --non-interactive --agree-tos --email admin@groupe-suffren.com \
  -d aletheia.groupe-suffren.com

# After obtaining certs, activate the HTTPS config, test it, then reload
cp /opt/docker/nginx/conf.d/aletheia-prod.conf.full /opt/docker/nginx/conf.d/aletheia-prod.conf
docker exec nginx-proxy nginx -t && docker exec nginx-proxy nginx -s reload
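If nginx is pointed at a `.conf.full` whose certificate does not exist yet, nginx cannot load the certificate and rejects the configuration. The following sketch guards the copy step per site. The function `activate_https` is hypothetical. The host-side certificate path in the usage comment is also an assumption: point it at wherever the certbot volume is mounted on the host.

```shell
#!/bin/sh
# Hypothetical helper: switch one site to its HTTPS config only once its certificate exists.
# activate_https <path/to/site.conf> <path/to/fullchain.pem>
activate_https() {
  local conf cert
  conf=$1
  cert=$2
  if [ ! -s "$cert" ]; then
    echo "no certificate yet, keeping HTTP-only config: $conf"
    return 1
  fi
  if [ ! -f "$conf.full" ]; then
    echo "missing $conf.full" >&2
    return 1
  fi
  cp "$conf.full" "$conf"
  echo "activated HTTPS: $conf"
}

# Usage (certificate path below is hypothetical):
# activate_https /opt/docker/nginx/conf.d/aletheia-prod.conf \
#   /opt/docker/certbot/conf/live/aletheia.groupe-suffren.com/fullchain.pem
# docker exec nginx-proxy nginx -t && docker exec nginx-proxy nginx -s reload
```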

10. Verify

  • [ ] DNS pointing to new server IP
  • [ ] All containers running: docker ps
  • [ ] Secrets decrypted: ls /opt/docker/shared/.env /opt/docker/aletheia/envs/.env.prod
  • [ ] Aletheia accessible: curl -sI https://aletheia.groupe-suffren.com/health/
  • [ ] Practice websites loading: check each Helios domain
  • [ ] Monitoring operational: https://monitoring.groupe-suffren.com
  • [ ] Backups scheduled: crontab -l
  • [ ] No config drift: cd /opt/docker/aether/repo && make diff
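Part of this checklist can be scripted so that a rebuild ends with a single pass/fail summary. The following is a minimal sketch with a hypothetical `check` function and a global `failures` counter. `curl -f` makes HTTP error statuses count as failures. DNS, the Helios domains, the monitoring UI, and `make diff` are left as manual checks because their expected results are not specified here.

```shell
#!/bin/sh
# Hypothetical checklist runner: run each check quietly, print PASS/FAIL, count failures.
failures=0
check() {
  local desc
  desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    failures=$((failures + 1))
  fi
}

# Usage:
# check "docker daemon answering" docker ps
# check "shared secrets decrypted" test -s /opt/docker/shared/.env
# check "aletheia prod env decrypted" test -s /opt/docker/aletheia/envs/.env.prod
# check "Aletheia health endpoint" curl -fsS -o /dev/null https://aletheia.groupe-suffren.com/health/
# check "crontab present" crontab -l
# echo "$failures check(s) failed"
```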

Testing DR

Use test-dr.sh in the Aether repo to validate the DR procedure non-destructively:

cd /opt/docker/aether/repo && ./test-dr.sh

Recovery Time Objective (RTO)

Estimated recovery time: 1-2 hours from a blank VPS with all backups available.

Backup Locations

Data                    Location                               Frequency
Database (PostgreSQL)   /opt/docker/backups/daily/             Daily
Encrypted secrets       Git (.enc files in Aether repo)        On change
Application code        GitHub repos                           On push
Media files             /opt/docker/aletheia/repo/media/       Not currently backed up off-site

Known Issues

  • Certbot timeouts: If DNS hasn't propagated yet, certbot will fail. Sites stay in HTTP-only maintenance mode until certs are obtained. Retry after DNS is confirmed with dig +short <domain>.
  • Helios SSL: Practice domains (e.g. cabinet-dentaire-aubagne.fr) need separate certbot runs. setup.sh only handles *.groupe-suffren.com domains automatically.
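Before retrying certbot, you can confirm that a domain's DNS answers include the new server's IP. The following sketch uses the hypothetical function `resolves_to`, which compares whole lines of resolver output. This handles `dig +short` printing a CNAME chain before the addresses, and it avoids `203.0.113.1` matching `203.0.113.10`. The IP in the usage comment is a documentation placeholder.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the expected IP is one of the resolver's answer lines.
# resolves_to <expected_ip> <answers, one per line (e.g. output of dig +short)>
resolves_to() {
  # -F fixed string, -x whole-line match, -q quiet
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# Usage (needs network access and dig):
# server_ip=203.0.113.10   # placeholder: the new VPS's public IPv4
# resolves_to "$server_ip" "$(dig +short A aletheia.groupe-suffren.com)" \
#   && echo "DNS ready, retry the certbot command from step 9"
```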

Gap: Off-site backups

Database backups currently stay on the same server. If the server is lost, backups are lost too. Until off-site backups are implemented, periodically copy /opt/docker/backups/daily/ to external storage.
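A copy is only useful if it is intact. The following sketch writes a SHA256SUMS manifest of the originals next to the copies and then verifies the copies against it. Because the manifest travels with the backups, the off-site copy can be re-checked later with `sha256sum -c`. The function `copy_and_verify` is hypothetical. It assumes a flat directory of backup files and a local destination, such as a mounted external disk. For a remote target, `rsync -a` would take the place of `cp`.

```shell
#!/bin/sh
# Hypothetical helper: copy a flat backup directory and verify the copies via a SHA-256 manifest.
copy_and_verify() {
  local src dst
  src=$1
  dst=$2
  mkdir -p "$dst" || return 1
  # Checksum the originals from inside src so the manifest uses relative names
  (cd "$src" && sha256sum -- *) > "$dst/SHA256SUMS" || return 1
  cp -p "$src"/* "$dst"/ || return 1
  # Verify the copies; prints only failures, exits non-zero on any mismatch
  (cd "$dst" && sha256sum --quiet -c SHA256SUMS)
}

# Usage (destination path is hypothetical):
# copy_and_verify /opt/docker/backups/daily /mnt/external/aether-backups/daily
```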