Log Aggregation

Pipeline

Docker containers → Alloy (auto-discovery) → Loki → Grafana (Explore)

Alloy connects to the Docker socket, discovers all running containers, and ships their stdout/stderr logs to Loki. No per-container configuration needed — new containers are picked up automatically.

How It Works

Alloy (Collector)

Config: monitoring/alloy/config.alloy
Discovers containers via /var/run/docker.sock
Extracts labels: Docker Compose project name, service name
Forwards to Loki at http://loki:3100/loki/api/v1/push
Internal state stored at /var/lib/alloy/data/
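The label extraction and push steps can be pictured with a short Python sketch. It is an illustration only, not Alloy's implementation. It assumes the Loki labels compose_project and compose_service (the names used in the queries in this document) are derived from the standard Docker Compose container labels com.docker.compose.project and com.docker.compose.service, and it builds a JSON body in the shape Loki's push endpoint accepts. The function names and sample log lines are hypothetical.

```python
import json
import time

# Standard labels that Docker Compose attaches to the containers it creates,
# mapped to the Loki label names assumed in this sketch.
COMPOSE_LABELS = {
    "com.docker.compose.project": "compose_project",
    "com.docker.compose.service": "compose_service",
}


def to_loki_labels(container_labels):
    """Map Docker container labels to Loki stream labels (assumed names)."""
    return {
        loki_name: container_labels[docker_name]
        for docker_name, loki_name in COMPOSE_LABELS.items()
        if docker_name in container_labels
    }


def build_push_payload(container_labels, lines, now_ns=None):
    """Build a JSON body for POST /loki/api/v1/push from raw log lines."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [<Unix epoch in nanoseconds, as a string>, <log line>].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {
        "streams": [
            {"stream": to_loki_labels(container_labels), "values": values}
        ]
    }


payload = build_push_payload(
    {
        "com.docker.compose.project": "aletheia-prod",
        "com.docker.compose.service": "aletheia-prod-web",
    },
    ["GET /health 200", "ERROR something failed"],  # hypothetical log lines
    now_ns=1735689600000000000,  # 2025-01-01T00:00:00Z
)
print(json.dumps(payload, indent=2))
```

Because every container in a Compose project carries these labels, a mapping like this needs no per-container configuration.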

Loki (Storage & Query)

Config: monitoring/loki/loki-config.yml
Storage: filesystem-based (chunks in /loki/chunks/)
Retention: 7 days (compactor deletes after 2h grace period)
Schema: v13 (TSDB backend), 24-hour index periods
Single node (no clustering)

Querying Logs

In Grafana, open Explore and select the Loki datasource.

Common LogQL queries

# All logs from a specific Compose service
{compose_service="aletheia-prod-web"}

# Lines containing "ERROR" across all Aletheia production containers
{compose_project="aletheia-prod"} |= "ERROR"

# Celery task failures
{compose_service=~"aletheia-.*-celery"} |= "Task" |= "raised"

# Nginx access logs with 5xx status (backticks avoid escaping the double quote)
{compose_service="nginx-proxy"} |~ `HTTP/[0-9.]+" 5[0-9]{2}`

# Slow requests (> 1 second; assumes JSON logs with response_time in milliseconds)
{compose_service="aletheia-prod-web"} | json | response_time > 1000

Useful filters

Operator	Purpose	Example
`|=`	Contains string	`|= "ERROR"`
`|~`	Matches regex	`|~ "5[0-9]{2}"`
`!=`	Does not contain	`!= "healthcheck"`
`| json`	Parse JSON logs	`| json | level="error"`
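To make the operators in the table concrete, the following Python sketch approximates what each one does to a single log line. It is an approximation under stated assumptions: LogQL uses Go's RE2 regex syntax, which Python's re module only matches for simple patterns like these, and Loki's handling of lines that fail JSON parsing is richer than returning None. The function names and sample lines are hypothetical.

```python
import json
import re


def contains(line, text):
    """Approximates the LogQL line filter: |= "text"."""
    return text in line


def not_contains(line, text):
    """Approximates the LogQL line filter: != "text"."""
    return text not in line


def matches(line, pattern):
    """Approximates |~ "pattern": the regex may match anywhere in the line."""
    return re.search(pattern, line) is not None


def json_field(line, key):
    """Approximates | json followed by reading one extracted field."""
    try:
        parsed = json.loads(line)
    except ValueError:
        return None  # the line is not valid JSON
    return parsed.get(key) if isinstance(parsed, dict) else None


# Hypothetical log lines, one per filter.
sample = [
    "GET /healthcheck 200",
    "ERROR upstream timed out",
    '"GET /api HTTP/1.1" 502 157',
    '{"level": "error", "msg": "db down"}',
]

errors = [line for line in sample if contains(line, "ERROR")]
server_errors = [line for line in sample if matches(line, r"5[0-9]{2}")]
without_health = [line for line in sample if not_contains(line, "healthcheck")]
json_errors = [line for line in sample if json_field(line, "level") == "error"]

print(errors, server_errors, without_health, json_errors, sep="\n")
```

Line filters such as |= and != are plain substring checks, so they are cheaper than | json parsing; placing them first in a query narrows the lines that need parsing.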

Retention & Storage

Logs are kept for 7 days based on their timestamp (not when they were ingested)
After 7 days, the Loki compactor permanently deletes expired chunks (runs every 2 hours)
Storage is filesystem-based in /loki/chunks/ — check disk usage if the volume grows unexpectedly
No log parsing/structuring is applied by Alloy (logs are shipped as-is)
No remote write — logs are lost if the Loki container is destroyed

Logs older than 7 days are unrecoverable

There is no long-term archival. If you need logs from more than 7 days ago, they are gone. For investigations that may span longer periods, export relevant logs before they expire (see below).
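As a worked example of the retention rule, the sketch below computes when a log entry becomes eligible for deletion, based on the entry's own timestamp rather than its ingestion time. It only models the 7-day period stated above; the compactor's actual deletion happens some time after this point. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention period stated in this document


def expires_at(log_timestamp):
    """Return the moment a log entry passes the retention period."""
    return log_timestamp + RETENTION


def is_expired(log_timestamp, now):
    """True once the entry is older than the retention period."""
    return now >= expires_at(log_timestamp)


def time_left(log_timestamp, now):
    """Time remaining to export the entry (zero if already expired)."""
    return max(expires_at(log_timestamp) - now, timedelta(0))


incident = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 6, 12, 0, tzinfo=timezone.utc)

print(expires_at(incident).isoformat())  # 2025-01-08T12:00:00+00:00
print(time_left(incident, now))          # 2 days, 0:00:00
```

For an incident that started on 1 January at 12:00 UTC, the earliest affected logs should be exported before 8 January at 12:00 UTC.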

Exporting logs before expiry

To save logs for a longer period (e.g. for an incident investigation or audit):

# Export logs for a Compose project to a file via the Loki HTTP API
# (the LogQL selector {compose_project="aletheia-prod"} is URL-encoded; results are capped at `limit` entries)
docker exec monitoring_grafana wget -qO- \
  'http://loki:3100/loki/api/v1/query_range?query=%7Bcompose_project%3D%22aletheia-prod%22%7D&start=2025-01-01T00:00:00Z&end=2025-01-02T00:00:00Z&limit=5000' \
  > /tmp/exported-logs.json

Or query directly from Grafana Explore and use the Inspector tab → Data → Download CSV.
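If the export needs to be scripted, the URL construction and the handling of the exported JSON could look roughly like the Python sketch below. It assumes the response has the streams shape returned by Loki's query_range endpoint (data.result entries with stream labels and [timestamp, line] pairs). The helper names and the sample response are hypothetical, and the sketch does no networking itself.

```python
from urllib.parse import urlencode

LOKI_URL = "http://loki:3100"  # Loki address used elsewhere in this document


def query_range_url(logql, start, end, limit=5000, base=LOKI_URL):
    """Return a query_range URL with the LogQL expression percent-encoded."""
    params = {"query": logql, "start": start, "end": end, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


def flatten(response):
    """Turn a streams response into (timestamp_ns, labels, line) tuples, oldest first."""
    rows = []
    for stream in response["data"]["result"]:
        for ts, line in stream["values"]:
            rows.append((int(ts), stream["stream"], line))
    return sorted(rows, key=lambda row: row[0])


url = query_range_url(
    '{compose_project="aletheia-prod"}',
    "2025-01-01T00:00:00Z",
    "2025-01-02T00:00:00Z",
)
print(url)

# Hypothetical response, shaped like the contents of /tmp/exported-logs.json.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "streams",
        "result": [
            {
                "stream": {"compose_service": "aletheia-prod-web"},
                "values": [
                    ["1735689601000000000", "second line"],
                    ["1735689600000000000", "first line"],
                ],
            }
        ],
    },
}
for ts, labels, line in flatten(sample_response):
    print(ts, labels["compose_service"], line)
```

Each request returns at most limit entries, so a busy day may need several requests over shorter time windows to capture everything.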

Limitations

Limitation	Impact	Workaround
7-day retention	Can't investigate old incidents	Export logs before they expire
No off-site replication	Loki data lost if disk fails	Rely on daily backups for database state; logs are ephemeral
Single node	No HA — Loki downtime = log gap	Monitor Loki container health (see runbooks)
No structured parsing	Raw log lines only	Use `| json` or `|~` regex in LogQL queries