Log Aggregation

Pipeline

Docker containers → Alloy (auto-discovery) → Loki → Grafana (Explore)

Alloy connects to the Docker socket, discovers all running containers, and ships their stdout/stderr logs to Loki. No per-container configuration needed — new containers are picked up automatically.

How It Works

Alloy (Collector)

  • Config: monitoring/alloy/config.alloy
  • Discovers containers via /var/run/docker.sock
  • Extracts labels: Docker Compose project name, service name
  • Forwards to Loki at http://loki:3100/loki/api/v1/push
  • Internal state stored at /var/lib/alloy/data/
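
To make the last hop more concrete, here is a rough Python sketch of the JSON body that Loki's push endpoint (the URL above) accepts. It is not how Alloy is implemented, since Alloy batches and ships logs on its own. The helper name `build_push_payload` and the sample log lines are hypothetical.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serialisable body for POST /loki/api/v1/push.

    labels: dict of stream labels (e.g. compose_project, compose_service).
    lines:  list of raw log lines, shipped as-is.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [<Unix epoch in nanoseconds, as a string>, <log line>].
    # Adding i keeps timestamps distinct and ordered within this batch.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


if __name__ == "__main__":
    payload = build_push_payload(
        {"compose_project": "aletheia-prod", "compose_service": "aletheia-prod-web"},
        ["GET /health 200", "ERROR database timeout"],  # made-up sample lines
    )
    print(json.dumps(payload, indent=2))
```

The labels in `stream` are what the LogQL selectors further down match on, which is why the Compose project and service names are the main handles for querying.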

Loki (Storage & Query)

  • Config: monitoring/loki/loki-config.yml
  • Storage: filesystem-based (chunks in /loki/chunks/)
  • Retention: 7 days (compactor deletes after 2h grace period)
  • Schema: v13 (TSDB backend), 24-hour index periods
  • Single node (no clustering)
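
With 24-hour index periods, each UTC day of logs lands in its own index bucket. The sketch below illustrates that bucketing under the assumption that the period number is the Unix timestamp divided by the period length; the function name is hypothetical and nothing here reads the actual Loki config.

```python
from datetime import datetime, timezone

INDEX_PERIOD_SECONDS = 24 * 60 * 60  # the 24-hour index period listed above


def index_period_number(ts):
    """Return the 24-hour bucket, counted from the Unix epoch, that ts falls in.

    ts should be a timezone-aware datetime; naive datetimes are
    interpreted as local time by datetime.timestamp().
    """
    return int(ts.timestamp()) // INDEX_PERIOD_SECONDS


if __name__ == "__main__":
    print(index_period_number(datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

Two log lines share a bucket only if they fall on the same UTC day, so a query spanning midnight UTC touches two buckets.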

Querying Logs

In Grafana, open Explore and select the Loki data source.

Common LogQL queries

# All logs from a specific Compose service
{compose_service="aletheia-prod-web"}

# Lines containing "ERROR" (case-sensitive) across all Aletheia production containers
{compose_project="aletheia-prod"} |= "ERROR"

# Celery task failures
{compose_service=~"aletheia-.*-celery"} |= "Task" |= "raised"

# Nginx access logs with 5xx status
{compose_service="nginx-proxy"} |~ "HTTP/[0-9.]+\" 5[0-9]{2}"

# Slow requests (> 1 second; assumes JSON logs with response_time in milliseconds)
{compose_service="aletheia-prod-web"} | json | response_time > 1000

Useful filters

Operator   Purpose            Example
|=         Contains string    |= "ERROR"
|~         Matches regex      |~ "5[0-9]{2}"
!=         Does not contain   != "healthcheck"
| json     Parse JSON logs    | json | level="error"
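
The operators chain left to right, each stage narrowing the set of lines. As a rough mental model only (this is not how Loki evaluates queries, and Loki uses Go's RE2 regex syntax rather than Python's), the same semantics can be mimicked in Python. The function names and sample lines are made up for illustration.

```python
import json
import re


def contains(lines, needle):
    """|= "needle": keep lines that contain the string (case-sensitive)."""
    return [line for line in lines if needle in line]


def not_contains(lines, needle):
    """!= "needle": drop lines that contain the string."""
    return [line for line in lines if needle not in line]


def matches(lines, pattern):
    """|~ "pattern": keep lines where the regex matches anywhere in the line."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


def parse_json(lines):
    """| json: turn JSON object lines into dicts of extracted fields.

    Loki keeps unparseable lines and tags them with an __error__ label;
    this sketch simply drops them, which gives the same result once a
    field filter such as level="error" follows.
    """
    parsed = []
    for line in lines:
        try:
            obj = json.loads(line)
        except ValueError:
            continue
        if isinstance(obj, dict):
            parsed.append(obj)
    return parsed


sample = [
    '{"level": "error", "msg": "db timeout"}',
    '{"level": "info", "msg": "healthcheck ok"}',
    'GET /api HTTP/1.1" 502',
]

# Equivalent of: {...} != "healthcheck" | json | level="error"
errors = [
    fields
    for fields in parse_json(not_contains(sample, "healthcheck"))
    if fields.get("level") == "error"
]
```

Putting cheap string filters (`|=`, `!=`) before `| json` mirrors the usual LogQL advice: discard as many lines as possible before parsing.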

Retention & Storage

  • Logs are kept for 7 days based on their timestamp (not when they were ingested)
  • After 7 days, the Loki compactor permanently deletes expired chunks (runs every 2 hours)
  • Storage is filesystem-based in /loki/chunks/ — check disk usage if the volume grows unexpectedly
  • No log parsing/structuring is applied by Alloy (logs are shipped as-is)
  • No remote write — logs are lost if the Loki container is destroyed

Logs older than 7 days are unrecoverable

There is no long-term archival. If you need logs from more than 7 days ago, they are gone. For investigations that may span longer periods, export relevant logs before they expire (see below).
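
When planning an export, the cutoff can be worked out from a log line's own timestamp. The sketch below only encodes the 7-day window described above; the function names are hypothetical, and the moment of actual deletion depends on the compactor's schedule.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention period stated in this document


def expires_at(log_timestamp):
    """Earliest moment a log line with this timestamp falls out of retention."""
    return log_timestamp + RETENTION


def is_expired(log_timestamp, now=None):
    """True once the retention window for this log line has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at(log_timestamp)


if __name__ == "__main__":
    incident = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # example timestamp
    print("Export before:", expires_at(incident).isoformat())
```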

Exporting logs before expiry

To save logs for a longer period (e.g. for an incident investigation or audit):

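# Export logs for a Compose project to a file via the Loki HTTP API
# (or use Grafana Explore → Inspector → Data → Download CSV).
# The query parameter is percent-encoded: {compose_project="aletheia-prod"}
# limit=5000 caps the number of lines returned; narrow the time range for larger exports.
docker exec monitoring_grafana wget -qO- \
  'http://loki:3100/loki/api/v1/query_range?query=%7Bcompose_project%3D%22aletheia-prod%22%7D&start=2025-01-01T00:00:00Z&end=2025-01-02T00:00:00Z&limit=5000' \
  > /tmp/exported-logs.json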

Or run the query in Grafana Explore and use Inspector → Data → Download CSV.
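
For exports larger than a single request returns, a script can page through `query_range`. The following is a minimal sketch using only the Python standard library. It assumes Loki is reachable at `http://loki:3100` from wherever the script runs (for example, a container on the same Docker network), and the function names are illustrative rather than part of any tool.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

LOKI_URL = "http://loki:3100"  # assumption: same address Alloy pushes to


def to_ns(ts):
    """Convert a timezone-aware datetime to Unix epoch nanoseconds."""
    return int(ts.timestamp()) * 1_000_000_000 + ts.microsecond * 1000


def build_url(query, start_ns, end_ns, limit=5000):
    """Build a query_range URL; urlencode percent-encodes { } " = in the query."""
    params = urllib.parse.urlencode({
        "query": query,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
        "direction": "forward",  # oldest first, so paging can follow timestamps
    })
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


def extract_entries(response):
    """Flatten a 'streams' response into (timestamp_ns, line) pairs, oldest first."""
    entries = []
    for stream in response["data"]["result"]:
        for ts, line in stream["values"]:
            entries.append((int(ts), line))
    return sorted(entries)


def export(query, start, end, out_path, limit=5000):
    """Write all matching lines between start and end to out_path, page by page."""
    cursor, end_ns = to_ns(start), to_ns(end)
    with open(out_path, "w", encoding="utf-8") as out:
        while cursor < end_ns:
            url = build_url(query, cursor, end_ns, limit)
            with urllib.request.urlopen(url) as resp:
                entries = extract_entries(json.load(resp))
            for ts, line in entries:
                out.write(f"{ts}\t{line}\n")
            if len(entries) < limit:
                break  # last page
            # Resume just after the last line seen. Lines sharing that exact
            # nanosecond timestamp could be skipped; acceptable for a sketch.
            cursor = entries[-1][0] + 1
```

A call such as `export('{compose_project="aletheia-prod"}', datetime(2025, 1, 1, tzinfo=timezone.utc), datetime(2025, 1, 2, tzinfo=timezone.utc), "/tmp/exported-logs.tsv")` would cover the same window as the command above, without stopping at the first 5000 lines.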

Limitations

Limitation               Impact                           Workaround
7-day retention          Can't investigate old incidents  Export logs before they expire
No off-site replication  Loki data lost if disk fails     Rely on daily backups for database state; logs are ephemeral
Single node              No HA; Loki downtime = log gap   Monitor Loki container health (see runbooks)
No structured parsing    Raw log lines only               Use | json or |~ regex in LogQL queries