Dashboards

Five dashboards are pre-provisioned in Grafana. They are loaded from JSON files and marked read-only.

Docker Containers

File: dashboards/json/docker.json

Panels:

  • Container CPU Usage (%)
  • Container Memory Usage & Memory Limit
  • Container Swap Usage
  • Container Network I/O (bytes in/out)

When to use: Quick overview of container resource usage. First place to check when an app is slow or an OOM kill is suspected.

Node / System

File: dashboards/json/node-exporter.json (463 KB, the most comprehensive of the five)

Sections:

  • CPU: Cores, frequency, busy %, scheduling, context switches, IRQ, load average
  • Memory: Full breakdown (anonymous, DirectMap, HugePages, LRU, slab, swap, writeback)
  • Disk: Throughput, IOPS, wait time, I/O utilization
  • Network: Netstat, saturation, sockstat, traffic errors
  • System: Processes, threads, uptime, file descriptors, timesync

When to use: Deep system-level investigation. Disk I/O bottlenecks, memory pressure, network saturation.

PostgreSQL

File: dashboards/json/postgresql.json

Panels:

  • Database stats: transactions/s, fetch/insert/update/delete rates
  • Connection pooling: current connections, max connections, max parallel workers
  • Query performance: cache hit rate, random/seq page cost
  • Maintenance: shared buffers, work_mem, checkpoint stats
  • WAL/Durability: max WAL size, conflicts, deadlocks
  • Lock tables and vacuum info

When to use: Slow queries, connection exhaustion, cache hit rate drops, deadlock investigation.
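As a reading aid for the cache hit rate panel, here is a minimal sketch of how such a ratio is conventionally derived. The exact query behind the panel is not shown here, so this assumes the usual definition based on the blks_hit and blks_read counters from PostgreSQL's pg_stat_database view. The function name cache_hit_ratio is illustrative only.

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared buffers.

    Assumes the conventional definition: hits / (hits + reads), using the
    blks_hit and blks_read counters from pg_stat_database. The dashboard's
    own query may differ.
    """
    total = blks_hit + blks_read
    if total == 0:
        # No block activity recorded yet; avoid dividing by zero.
        return 0.0
    return blks_hit / total


# Example: 9,900 buffer hits and 100 disk reads.
print(f"{cache_hit_ratio(9_900, 100):.2%}")  # prints 99.00%
```

A sustained drop in this ratio usually means the working set no longer fits in shared buffers, which is why the panel is worth checking alongside the shared buffers setting.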

Redis

File: dashboards/json/redis.json

Panels:

  • Connected clients and operations/s
  • Hit/miss rates, key expiration
  • Memory usage by eviction policy
  • Command statistics

When to use: Cache performance, memory growth, eviction rate monitoring.

Celery

File: dashboards/json/celery.json

Panels:

  • Workers: count, uptime
  • Queue length: current and over time
  • Active tasks and active tasks per worker
  • Task runtime: p50 / p95 / p99
  • Task throughput by task type
  • Failed/retried tasks (last 1 hour)
  • Tasks completed (stacked by type)

When to use: Task backlogs, slow tasks, worker health, failure spikes.
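To make the p50 / p95 / p99 runtime panel easier to interpret, the sketch below computes the same three percentiles from a plain list of task runtimes. It only illustrates what the numbers mean; the dashboard most likely derives them from exporter metrics rather than raw samples, and the function name runtime_percentiles is hypothetical.

```python
from statistics import quantiles


def runtime_percentiles(runtimes_s: list[float]) -> dict[str, float]:
    """Return the p50, p95 and p99 of a list of task runtimes in seconds.

    Needs at least two samples. Uses the 'inclusive' method, which treats
    the samples as the whole population rather than a sample of it.
    """
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = quantiles(runtimes_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Example: most tasks are fast, a few are very slow.
samples = [0.2] * 95 + [5.0] * 5
for name, value in runtime_percentiles(samples).items():
    print(f"{name}: {value:.2f}s")
```

In a distribution like this one the median stays low while p99 jumps to the slow-task runtime, which is the pattern to look for when a minority of tasks is stalling.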

Adding a New Dashboard

  1. Create the dashboard in the Grafana UI, using the provisioned datasources
  2. Export it as JSON (Share → Export → Save to file)
  3. Save the file to monitoring/grafana/provisioning/dashboards/json/<name>.json in the Aether repo
  4. Run make deploy to copy it to the server
  5. Wait for Grafana to pick it up: provisioned dashboards are reloaded automatically every 60 seconds
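Before committing an exported file, a quick sanity check can catch a truncated or malformed export. The script below is a minimal sketch, not part of the repo: the function name check_dashboard and the specific checks (a non-empty title, a non-empty uid, a panels list) are assumptions about what is worth verifying, based on the common top-level fields of Grafana's dashboard JSON.

```python
import json
import sys
from pathlib import Path


def check_dashboard(text: str) -> list[str]:
    """Return a list of problems found in an exported dashboard JSON string."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(dashboard, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    # These checks are illustrative assumptions, not Grafana requirements.
    if not dashboard.get("title"):
        problems.append("missing or empty 'title'")
    if not dashboard.get("uid"):
        problems.append("missing or empty 'uid'")
    if not isinstance(dashboard.get("panels"), list):
        problems.append("'panels' is missing or not a list")
    return problems


if __name__ == "__main__":
    # Usage: python check_dashboard.py path/to/<name>.json [more files...]
    for name in sys.argv[1:]:
        issues = check_dashboard(Path(name).read_text(encoding="utf-8"))
        print(f"{name}: {'ok' if not issues else '; '.join(issues)}")
```

Running it against the file saved in step 3 before make deploy gives an early signal if the export was incomplete.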