Dashboards
Grafana is set up with five pre-provisioned dashboards, loaded from JSON files and marked read-only.
Docker Containers
File: dashboards/json/docker.json
Panels:
- Container CPU Usage (%)
- Container Memory Usage & Memory Limit
- Container Swap Usage
- Container Network I/O (bytes in/out)
When to use: Quick overview of container resource usage. First place to check when an app is slow or an OOM kill is suspected.
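
As a rough illustration of how the memory panel might be read when an OOM kill is suspected, the sketch below expresses usage as a fraction of the limit. The function name and the 0.9 warning threshold are hypothetical and are not taken from the dashboard.

```python
def memory_headroom(usage_bytes, limit_bytes, warn_ratio=0.9):
    """Summarise container memory usage against its limit.

    warn_ratio is an illustrative threshold, not a dashboard setting.
    """
    if limit_bytes <= 0:
        # Assumption: a non-positive limit means no limit is configured,
        # so there is nothing meaningful to compare against.
        return {"ratio": None, "near_limit": False}
    ratio = usage_bytes / limit_bytes
    return {"ratio": ratio, "near_limit": ratio >= warn_ratio}


if __name__ == "__main__":
    # 950 MiB used out of a 1 GiB limit
    print(memory_headroom(950 * 1024**2, 1024**3))
```

A container that sits close to its limit for long stretches is a more likely OOM candidate than one with a brief spike, so the ratio is best read over time rather than at a single point.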
Node / System
File: dashboards/json/node-exporter.json (463 KB, the most comprehensive of the five)
Sections:
- CPU: Cores, frequency, busy %, scheduling, context switches, IRQ, load average
- Memory: Full breakdown (anonymous, DirectMap, HugePages, LRU, slab, swap, writeback)
- Disk: Throughput, IOPS, wait time, I/O utilization
- Network: Netstat, saturation, sockstat, traffic errors
- System: Processes, threads, uptime, file descriptors, timesync
When to use: Deep system-level investigation. Disk I/O bottlenecks, memory pressure, network saturation.
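
To make the "busy %" figure concrete, here is a minimal sketch of how a busy percentage can be derived from two samples of cumulative CPU time counters. It assumes idle and total time are available as monotonically increasing values; the function and parameter names are hypothetical, and the dashboard's own query may differ.

```python
def cpu_busy_percent(idle_before, total_before, idle_after, total_after):
    """Return CPU busy % between two samples of cumulative CPU time.

    All four arguments are cumulative seconds (or ticks) in the same unit.
    """
    total_delta = total_after - total_before
    if total_delta <= 0:
        raise ValueError("total CPU time must increase between samples")
    idle_delta = idle_after - idle_before
    # Busy share is whatever part of the interval was not spent idle.
    return 100.0 * (1.0 - idle_delta / total_delta)


if __name__ == "__main__":
    print(cpu_busy_percent(100, 200, 175, 300))
```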
PostgreSQL
File: dashboards/json/postgresql.json
Panels:
- Database stats: transactions/s, fetch/insert/update/delete rates
- Connection pooling: current connections, max connections, max parallel workers
- Query performance: cache hit rate, random/seq page cost
- Maintenance: shared buffers, work_mem, checkpoint stats
- WAL/Durability: max WAL size, conflicts, deadlocks
- Lock tables and vacuum info
When to use: Slow queries, connection exhaustion, cache hit rate drops, deadlock investigation.
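
For readers unfamiliar with the cache hit rate metric, the sketch below shows one common way to compute it from the blks_hit and blks_read counters in PostgreSQL's pg_stat_database view. Whether the dashboard uses exactly this formula is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def cache_hit_rate(blks_hit: int, blks_read: int) -> Optional[float]:
    """Fraction of block requests served from shared buffers.

    blks_hit:  blocks found in the buffer cache
    blks_read: blocks that had to be read from outside the buffer cache
    """
    total = blks_hit + blks_read
    if total == 0:
        # No block activity yet, so the rate is undefined.
        return None
    return blks_hit / total


if __name__ == "__main__":
    print(cache_hit_rate(990, 10))
```

A sustained drop in this ratio suggests the working set no longer fits in shared buffers, which is why it is worth reading alongside the shared buffers panel.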
Redis
File: dashboards/json/redis.json
Panels:
- Connected clients and operations/s
- Hit/miss rates, key expiration
- Memory usage by eviction policy
- Command statistics
When to use: Cache performance, memory growth, eviction rate monitoring.
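
The hit/miss panel can be cross-checked by hand from the output of the Redis INFO command. The sketch below parses INFO-style "key:value" text and derives a hit rate from keyspace_hits and keyspace_misses. The sample text is made up for illustration, and the helper names are hypothetical.

```python
def parse_info(text):
    """Parse INFO-style output ("key:value" lines, "#" section headers)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_rate(fields):
    """Return hits / (hits + misses), or None when there were no lookups."""
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    # Illustrative sample, not real server output.
    sample = "# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\nevicted_keys:5\r\n"
    print(hit_rate(parse_info(sample)))
```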
Celery
File: dashboards/json/celery.json
Panels:
- Workers: count, uptime
- Queue length: current and over time
- Active tasks and active tasks per worker
- Task runtime: p50 / p95 / p99
- Task throughput by task type
- Failed/retried tasks (last 1 hour)
- Tasks completed (stacked by type)
When to use: Task backlogs, slow tasks, worker health, failure spikes.
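
To illustrate what the p50 / p95 / p99 runtime figures mean, here is a small nearest-rank percentile sketch over a list of task runtimes. This is only an illustration of the concept: the dashboard most likely derives its percentiles from aggregated metrics rather than raw samples, and the function name is hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up runtimes in seconds, for illustration only.
    runtimes = [0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.6, 0.8, 1.2, 9.0]
    for p in (50, 95, 99):
        print(f"p{p}: {percentile(runtimes, p)}s")
```

A large gap between p50 and p99 points to a few slow outliers rather than uniformly slow tasks, which usually calls for looking at individual task types in the throughput panel.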
Adding a New Dashboard
- Create the dashboard in the Grafana UI (use the provisioned datasources)
- Export as JSON (Share → Export → Save to file)
- Save to monitoring/grafana/provisioning/dashboards/json/<name>.json in the Aether repo
- Run make deploy to copy to server
- Grafana auto-reloads dashboards every 60 seconds
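
Before running the deploy step, it can help to sanity-check the exported files. The sketch below is one possible pre-deploy check, not part of the Aether repo: it confirms that each JSON file in a directory parses and contains a few top-level keys. The list of required keys is an assumption about what a usable dashboard export contains, and the function names are hypothetical.

```python
import json
from pathlib import Path

# Assumed minimum set of top-level keys; adjust to your own conventions.
REQUIRED_KEYS = ("title", "uid", "panels")


def check_dashboard(path):
    """Return a list of problems found in one dashboard JSON file."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{Path(path).name}: invalid JSON ({exc.msg} at line {exc.lineno})"]
    if not isinstance(data, dict):
        return [f"{Path(path).name}: top level must be a JSON object"]
    return [
        f"{Path(path).name}: missing '{key}'"
        for key in REQUIRED_KEYS
        if key not in data
    ]


def check_directory(directory):
    """Check every *.json file in a directory and collect all problems."""
    problems = []
    for path in sorted(Path(directory).glob("*.json")):
        problems.extend(check_dashboard(path))
    return problems


if __name__ == "__main__":
    # Path taken from the steps above, relative to the repo root.
    target = "monitoring/grafana/provisioning/dashboards/json"
    for problem in check_directory(target):
        print(problem)
```

Running this from the repo root prints nothing when every file passes, so it could be wired in ahead of the deploy step if that fits the workflow.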