Monitoring Overview

Stack

The monitoring stack runs as Docker containers in the monitoring network on VPS #1.

| Component | Version | Role | Port |
| --- | --- | --- | --- |
| Prometheus | v3.10 | Metrics aggregation, alerting rules | 9090 |
| Grafana | v12.4 | Dashboards, unified alerting | 3000 (proxied via nginx) |
| Loki | v3.6 | Log aggregation | 3100 |
| Alloy | v1.14 | Log collector (Docker auto-discovery) | |
| node-exporter | v1.10 | Host system metrics (CPU, memory, disk, network) | 9100 |
| cAdvisor | v0.55 | Container metrics (CPU, memory, I/O) | 8080 |
| postgres-exporter | v0.19 | PostgreSQL metrics | 9187 |
| redis-exporter | v1.82 | Redis metrics | 9121 |
| nginx-exporter | v1.4 | Nginx reverse proxy metrics | 9113 |
| celery-exporter | v0.10 | Celery task queue metrics | 9808 |
| blackbox-exporter | v0.26 | External probes (SSL, HTTP health) | 9115 |
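
As a rough illustration of how the ports above could map to Prometheus scrape targets, the sketch below builds a structure shaped like the `scrape_configs` section of prometheus.yml. The host names are an assumption: they reuse the container names listed under Resource Limits and assume those names resolve on the monitoring network. The job names are hypothetical, and the real prometheus.yml may differ. blackbox-exporter is left out because probes are usually configured through relabeling rather than a plain static target.

```python
# Exporter ports copied from the stack table; host names are ASSUMED to be
# the container names from the Resource Limits section. Job names are
# hypothetical and chosen only for this sketch.
EXPORTERS = {
    "node": ("monitoring_node_exporter", 9100),
    "cadvisor": ("monitoring_cadvisor", 8080),
    "postgres": ("monitoring_postgres_exporter", 9187),
    "redis": ("monitoring_redis_exporter", 9121),
    "nginx": ("monitoring_nginx_exporter", 9113),
    "celery": ("monitoring_celery_exporter", 9808),
}


def scrape_configs(exporters):
    """Build a list shaped like Prometheus `scrape_configs` entries."""
    return [
        {"job_name": job, "static_configs": [{"targets": [f"{host}:{port}"]}]}
        for job, (host, port) in sorted(exporters.items())
    ]


if __name__ == "__main__":
    # Print one "job target" pair per exporter.
    for entry in scrape_configs(EXPORTERS):
        print(entry["job_name"], entry["static_configs"][0]["targets"][0])
```

The resulting list can be serialized to YAML and compared against the actual scrape targets to spot an exporter that is deployed but not scraped.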

Architecture

                              ┌──────────────┐
                              │   Grafana    │ ← dashboards + alerts
                              │  :3000       │
                              └──┬───────┬───┘
                                 │       │
                    ┌────────────┘       └────────────┐
                    ▼                                  ▼
             ┌──────────────┐                  ┌──────────────┐
             │  Prometheus  │                  │     Loki     │
             │  :9090       │                  │  :3100       │
             └──────┬───────┘                  └──────┬───────┘
                    │                                  │
        ┌───────────┼───────────┐                      │
        ▼           ▼           ▼                      ▼
   exporters   blackbox    cAdvisor              ┌──────────────┐
   (node,      (SSL +      (container            │    Alloy     │
    pg, redis,  HTTP        metrics)              │  (log shim)  │
    nginx,      probes)                           └──────────────┘
    celery)                                             │
                                                   Docker socket
                                                   (auto-discover)

Access

Configuration

All configs live in the Aether repo at monitoring/ and deploy to /opt/docker/monitoring/:

monitoring/
├── docker-compose.yml
├── prometheus/
│   ├── prometheus.yml          # scrape targets
│   └── alerts/                 # 6 rule files
├── loki/loki-config.yml
├── alloy/config.alloy
├── blackbox/blackbox.yml
└── grafana/provisioning/
    ├── datasources/
    ├── dashboards/json/        # 5 pre-built dashboards
    └── alerting/               # contact points, policies, rules
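
A quick way to confirm that a deploy matches the layout above is to check that each expected path exists under the deploy root. The following is a minimal sketch, not part of the Aether repo: the helper name `missing_paths` is hypothetical, and the path list is copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the monitoring/ tree above.
EXPECTED = [
    "docker-compose.yml",
    "prometheus/prometheus.yml",
    "prometheus/alerts",
    "loki/loki-config.yml",
    "alloy/config.alloy",
    "blackbox/blackbox.yml",
    "grafana/provisioning/datasources",
    "grafana/provisioning/dashboards/json",
    "grafana/provisioning/alerting",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    # /opt/docker/monitoring is the deploy target named in the text.
    for rel in missing_paths("/opt/docker/monitoring"):
        print("missing:", rel)
```

An empty result means every file and directory from the tree is present; it does not validate the contents of the configs.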

Resource Limits

| Container | Image | Memory Limit |
| --- | --- | --- |
| monitoring_prometheus | prom/prometheus:v3.10.0 | 384m |
| monitoring_grafana | grafana/grafana:12.4 | 384m |
| monitoring_node_exporter | prom/node-exporter:v1.10.2 | 64m |
| monitoring_cadvisor | gcr.io/cadvisor/cadvisor:v0.55.1 | 128m |
| monitoring_postgres_exporter | prometheuscommunity/postgres-exporter:v0.19.1 | 48m |
| monitoring_redis_exporter | oliver006/redis_exporter:v1.82.0 | 48m |
| monitoring_blackbox_exporter | prom/blackbox-exporter:v0.26.0 | 64m |
| monitoring_loki | grafana/loki:3.6 | 384m |
| monitoring_alloy | grafana/alloy:v1.14.1 | 384m |
| monitoring_nginx_exporter | nginx/nginx-prometheus-exporter:1.4 | 32m |
| monitoring_celery_exporter | danihodovic/celery-exporter:0.10.10 | 128m |
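
To see how much memory the stack can claim in the worst case, the limits above can be summed. The sketch below assumes the `m` suffix follows Docker's convention of binary units (1m = 1024 * 1024 bytes). The helper names are hypothetical.

```python
# Memory limits copied from the Resource Limits table.
LIMITS = {
    "monitoring_prometheus": "384m",
    "monitoring_grafana": "384m",
    "monitoring_node_exporter": "64m",
    "monitoring_cadvisor": "128m",
    "monitoring_postgres_exporter": "48m",
    "monitoring_redis_exporter": "48m",
    "monitoring_blackbox_exporter": "64m",
    "monitoring_loki": "384m",
    "monitoring_alloy": "384m",
    "monitoring_nginx_exporter": "32m",
    "monitoring_celery_exporter": "128m",
}

# ASSUMPTION: Docker-style binary multiples for the size suffixes.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def to_bytes(limit):
    """Parse a Docker-style size such as '384m' into bytes."""
    text = limit.strip().lower()
    if text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is already in bytes


def total_mib(limits):
    """Sum all limits and return the total in MiB."""
    return sum(to_bytes(value) for value in limits.values()) // 1024**2


if __name__ == "__main__":
    print(total_mib(LIMITS), "MiB")
```

With the values listed above, the limits add up to 2048 MiB (2 GiB), which is the ceiling the monitoring stack can reach on VPS #1 if every container hits its limit at once.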

Retention

| Data | Retention | Source |
| --- | --- | --- |
| Prometheus metrics | 15d | monitoring/docker-compose.yml |
| Loki logs | 168h (7 days) | monitoring/loki/loki-config.yml |
| Backup daily | 7 days | backups/scripts/backup.sh |
| Backup weekly | 28 days | backups/scripts/backup.sh |
| Backup archive (RPPS) | Indefinite | backups/scripts/backup.sh |
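
The retention values mix units (`15d`, `168h`), so it can help to normalize them before comparing them or working out how far back data is still available. This is a minimal sketch with hypothetical helper names; it only handles the `h` and `d` suffixes that appear in the table, and actual deletion in Prometheus or Loki may lag behind the computed cutoff.

```python
from datetime import datetime, timedelta, timezone

# Only the suffixes used in the Retention table are supported here.
_UNITS = {"h": "hours", "d": "days"}


def parse_retention(value):
    """Turn a retention string such as '15d' or '168h' into a timedelta."""
    number, unit = int(value[:-1]), value[-1]
    return timedelta(**{_UNITS[unit]: number})


def oldest_kept(value, now=None):
    """Return the timestamp before which data is eligible for deletion."""
    now = now or datetime.now(timezone.utc)
    return now - parse_retention(value)


if __name__ == "__main__":
    for name, value in [("Prometheus metrics", "15d"), ("Loki logs", "168h")]:
        cutoff = oldest_kept(value).date()
        print(name, parse_retention(value).days, "days, cutoff", cutoff)
```

For example, `parse_retention("168h")` equals `timedelta(days=7)`, which matches the "7 days" noted for Loki.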