Monitoring Overview
Stack
The monitoring stack runs as Docker containers in the monitoring network on VPS #1.
| Component | Version | Role | Port |
|---|---|---|---|
| Prometheus | v3.10 | Metrics aggregation, alerting rules | 9090 |
| Grafana | v12.4 | Dashboards, unified alerting | 3000 (proxied via nginx) |
| Loki | v3.6 | Log aggregation | 3100 |
| Alloy | v1.14 | Log collector (Docker auto-discovery) | n/a |
| node-exporter | v1.10 | Host system metrics (CPU, memory, disk, network) | 9100 |
| cAdvisor | v0.55 | Container metrics (CPU, memory, I/O) | 8080 |
| postgres-exporter | v0.19 | PostgreSQL metrics | 9187 |
| redis-exporter | v1.82 | Redis metrics | 9121 |
| nginx-exporter | v1.4 | Nginx reverse proxy metrics | 9113 |
| celery-exporter | v0.10 | Celery task queue metrics | 9808 |
| blackbox-exporter | v0.26 | External probes (SSL, HTTP health) | 9115 |
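As a quick sanity check of the ports listed above, the following is a minimal sketch that builds one `/metrics` URL per exporter and tests whether it answers. It assumes the exporters serve the conventional `/metrics` path and that the ports are reachable from wherever the script runs (inside the monitoring network, or published on the host); the `host` argument and the function names are hypothetical and not part of the repo.

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen

# Ports copied from the Stack table. The "/metrics" path is the usual
# Prometheus exporter convention and is an assumption for this stack.
EXPORTER_PORTS = {
    "node-exporter": 9100,
    "cAdvisor": 8080,
    "postgres-exporter": 9187,
    "redis-exporter": 9121,
    "nginx-exporter": 9113,
    "celery-exporter": 9808,
    "blackbox-exporter": 9115,
}


def metrics_urls(host: str = "localhost") -> dict[str, str]:
    """Build one /metrics URL per exporter for the given (placeholder) host."""
    return {
        name: f"http://{host}:{port}/metrics"
        for name, port in EXPORTER_PORTS.items()
    }


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling `is_reachable(url)` for each entry of `metrics_urls()` gives a rough up/down view of the exporters; Prometheus's own target status remains the authoritative source.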
Architecture
                               ┌──────────────┐
                               │   Grafana    │  ← dashboards + alerts
                               │    :3000     │
                               └──┬───────┬───┘
                                  │       │
                     ┌────────────┘       └────────────┐
                     ▼                                 ▼
              ┌──────────────┐                  ┌──────────────┐
              │  Prometheus  │                  │     Loki     │
              │    :9090     │                  │    :3100     │
              └──────┬───────┘                  └──────┬───────┘
                     │                                 │
         ┌───────────┼───────────┐                     │
         ▼           ▼           ▼                     ▼
     exporters   blackbox    cAdvisor           ┌──────────────┐
     (node,      (SSL +      (container         │    Alloy     │
     pg, redis,  HTTP        metrics)           │  (log shim)  │
     nginx,      probes)                        └──────────────┘
     celery)                                           │
                                                 Docker socket
                                                (auto-discover)
Access
Configuration
All configuration lives in the Aether repo under monitoring/ and is deployed to /opt/docker/monitoring/:
    monitoring/
    ├── docker-compose.yml
    ├── prometheus/
    │   ├── prometheus.yml       # scrape targets
    │   └── alerts/              # 6 rule files
    ├── loki/loki-config.yml
    ├── alloy/config.alloy
    ├── blackbox/blackbox.yml
    └── grafana/provisioning/
        ├── datasources/
        ├── dashboards/json/     # 5 pre-built dashboards
        └── alerting/            # contact points, policies, rules
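To confirm that a deploy produced the layout shown in the tree, something like the sketch below could be run against the target directory. It is illustrative only: the `EXPECTED` list is transcribed from the tree, and `missing_paths` is a hypothetical helper, not a script that exists in the repo.

```python
from pathlib import Path

# Relative paths transcribed from the tree; entries ending in "/" are
# expected to be directories, everything else a regular file.
EXPECTED = [
    "docker-compose.yml",
    "prometheus/prometheus.yml",
    "prometheus/alerts/",
    "loki/loki-config.yml",
    "alloy/config.alloy",
    "blackbox/blackbox.yml",
    "grafana/provisioning/datasources/",
    "grafana/provisioning/dashboards/json/",
    "grafana/provisioning/alerting/",
]


def missing_paths(root: str) -> list[str]:
    """Return the expected entries that are absent under root."""
    base = Path(root)
    missing = []
    for rel in EXPECTED:
        path = base / rel
        present = path.is_dir() if rel.endswith("/") else path.is_file()
        if not present:
            missing.append(rel)
    return missing
```

For example, `missing_paths("/opt/docker/monitoring")` would return an empty list when every file and directory from the tree is in place.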
Resource Limits
| Container | Image | Memory Limit |
|---|---|---|
| monitoring_prometheus | prom/prometheus:v3.10.0 | 384m |
| monitoring_grafana | grafana/grafana:12.4 | 384m |
| monitoring_node_exporter | prom/node-exporter:v1.10.2 | 64m |
| monitoring_cadvisor | gcr.io/cadvisor/cadvisor:v0.55.1 | 128m |
| monitoring_postgres_exporter | prometheuscommunity/postgres-exporter:v0.19.1 | 48m |
| monitoring_redis_exporter | oliver006/redis_exporter:v1.82.0 | 48m |
| monitoring_blackbox_exporter | prom/blackbox-exporter:v0.26.0 | 64m |
| monitoring_loki | grafana/loki:3.6 | 384m |
| monitoring_alloy | grafana/alloy:v1.14.1 | 384m |
| monitoring_nginx_exporter | nginx/nginx-prometheus-exporter:1.4 | 32m |
| monitoring_celery_exporter | danihodovic/celery-exporter:0.10.10 | 128m |
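The per-container limits add up to a fixed memory budget for the whole stack. The short sketch below works out that total from the values in the table, assuming the `m` suffix follows Docker's convention of binary megabytes (MiB); `parse_mib` and `total_mib` are illustrative helper names.

```python
# Memory limits copied from the Resource Limits table (Docker notation).
MEMORY_LIMITS = {
    "monitoring_prometheus": "384m",
    "monitoring_grafana": "384m",
    "monitoring_node_exporter": "64m",
    "monitoring_cadvisor": "128m",
    "monitoring_postgres_exporter": "48m",
    "monitoring_redis_exporter": "48m",
    "monitoring_blackbox_exporter": "64m",
    "monitoring_loki": "384m",
    "monitoring_alloy": "384m",
    "monitoring_nginx_exporter": "32m",
    "monitoring_celery_exporter": "128m",
}

# Multipliers to MiB; only the suffixes needed here plus "g" are handled.
_UNIT_TO_MIB = {"m": 1, "g": 1024}


def parse_mib(limit: str) -> int:
    """Convert a Docker-style limit such as '384m' or '1g' to MiB."""
    value, unit = limit[:-1], limit[-1].lower()
    if unit not in _UNIT_TO_MIB:
        raise ValueError(f"unsupported memory unit in {limit!r}")
    return int(value) * _UNIT_TO_MIB[unit]


def total_mib(limits: dict[str, str]) -> int:
    """Sum all limits, in MiB."""
    return sum(parse_mib(value) for value in limits.values())


print(total_mib(MEMORY_LIMITS))  # 2048
```

The eleven limits sum to 2048 MiB, so the monitoring stack is capped at 2 GiB in total under these settings.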
Retention
| Data | Retention | Source |
|---|---|---|
| Prometheus metrics | 15d | monitoring/docker-compose.yml |
| Loki logs | 168h (7 days) | monitoring/loki/loki-config.yml |
| Backup daily | 7 days | backups/scripts/backup.sh |
| Backup weekly | 28 days | backups/scripts/backup.sh |
| Backup archive (RPPS) | Indefinite | backups/scripts/backup.sh |
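Because the retention values use different units (`15d` for Prometheus, `168h` for Loki), a small helper makes them comparable. This is a sketch only: it handles just the `h`, `d`, and `w` suffixes, whereas Prometheus and Loki themselves accept more duration units, and `parse_retention` is a hypothetical name.

```python
import re
from datetime import timedelta

# Map the compact suffixes used in the table to timedelta keyword arguments.
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_retention(value: str) -> timedelta:
    """Parse a compact duration such as '15d' or '168h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdw])", value.strip())
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


metrics = parse_retention("15d")   # Prometheus metrics
logs = parse_retention("168h")     # Loki logs

print(logs == timedelta(days=7))   # True
print((metrics - logs).days)       # 8
```

With the values from the table, metrics are kept 8 days longer than logs, so an incident older than a week can still be examined in Prometheus but no longer has matching log lines in Loki.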