Log Aggregation
Pipeline
Alloy connects to the Docker socket, discovers all running containers, and ships their stdout/stderr logs to Loki. No per-container configuration is needed: new containers are picked up automatically.
How It Works
Alloy (Collector)
- Config: monitoring/alloy/config.alloy
- Discovers containers via /var/run/docker.sock
- Extracts labels: Docker Compose project name, service name
- Forwards to Loki at http://loki:3100/loki/api/v1/push
- Internal state stored at /var/lib/alloy/data/
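To check which label values will be available to query on a given host, the Compose labels can be read straight from Docker. This is a minimal sketch, assuming Alloy maps Docker's `com.docker.compose.project` and `com.docker.compose.service` container labels to the `compose_project` and `compose_service` Loki labels used in the queries on this page. The actual mapping lives in config.alloy and is not shown here. The function names are illustrative.

```shell
# List each running container with its Compose project and service.
# These Docker labels are set by Docker Compose; mapping them to the Loki
# labels compose_project / compose_service is an assumption about config.alloy.
list_compose_labels() {
  docker ps --format '{{.Names}}\t{{.Label "com.docker.compose.project"}}\t{{.Label "com.docker.compose.service"}}'
}

# Convert that tab-separated listing into LogQL stream selectors,
# skipping containers that were not started by Compose.
to_logql_selectors() {
  awk -F '\t' '$2 != "" && $3 != "" {
    printf "{compose_project=\"%s\", compose_service=\"%s\"}\n", $2, $3
  }'
}
```

Running `list_compose_labels | to_logql_selectors` prints one selector per Compose-managed container, ready to paste into Grafana Explore.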
Loki (Storage & Query)
- Config: monitoring/loki/loki-config.yml
- Storage: filesystem-based (chunks in /loki/chunks/)
- Retention: 7 days (compactor deletes after 2h grace period)
- Schema: v13 (TSDB backend), 24-hour index periods
- Single node (no clustering)
Querying Logs
In Grafana, open Explore and select the Loki datasource.
Common LogQL queries
# All logs from a specific service
{compose_service="aletheia-prod-web"}
# Lines containing "ERROR" across all aletheia-prod containers
{compose_project="aletheia-prod"} |= "ERROR"
# Celery task failures
{compose_service=~"aletheia-.*-celery"} |= "Task" |= "raised"
# Nginx access logs with 5xx status (assumes the default combined log format)
{compose_service="nginx-proxy"} |~ "HTTP/[12][.][01]\" 5[0-9]{2}"
# Slow requests (> 1 second; assumes JSON logs with response_time in milliseconds)
{compose_service="aletheia-prod-web"} | json | response_time > 1000
Useful filters
| Operator | Purpose | Example |
|---|---|---|
| \|= | Contains string | \|= "ERROR" |
| \|~ | Matches regex | \|~ "5[0-9]{2}" |
| != | Does not contain | != "healthcheck" |
| \| json | Parse JSON logs | \| json \| level="error" |
Retention & Storage
- Logs are kept for 7 days based on their timestamp (not when they were ingested)
- After 7 days, the Loki compactor permanently deletes expired chunks (runs every 2 hours)
- Storage is filesystem-based in /loki/chunks/; check disk usage if the volume grows unexpectedly
- No log parsing/structuring is applied by Alloy (logs are shipped as-is)
- No remote write: logs are lost if the Loki container is destroyed
Logs older than 7 days are unrecoverable
There is no long-term archival. If you need logs from more than 7 days ago, they are gone. For investigations that may span longer periods, export relevant logs before they expire (see below).
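Because retention is keyed on log timestamps, it can help to compute the oldest timestamp that should still be queryable before starting an investigation. This is a minimal sketch, assuming the 7-day retention described above and a `date` that accepts `-d @<epoch>` (GNU coreutils or BusyBox; BSD/macOS `date` uses `-r` instead). The function name is illustrative and the result should be treated as approximate.

```shell
RETENTION_DAYS=7

# Print the oldest log timestamp (RFC 3339, UTC) still inside the retention
# window. Takes "now" as Unix epoch seconds; defaults to the current time.
retention_cutoff() {
  now="${1:-$(date +%s)}"
  cutoff=$((now - RETENTION_DAYS * 86400))
  date -u -d "@${cutoff}" +%Y-%m-%dT%H:%M:%SZ
}
```

Called without arguments, `retention_cutoff` prints the cutoff for the current moment. An incident window that starts before that value can no longer be exported in full.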
Exporting logs before expiry
To save logs for a longer period (e.g. for an incident investigation or audit):
# Export logs from a specific project to a file via the Loki HTTP API
# (limit=5000 caps the number of lines returned per request)
docker exec monitoring_grafana wget -qO- \
'http://loki:3100/loki/api/v1/query_range?query={compose_project="aletheia-prod"}&start=2025-01-01T00:00:00Z&end=2025-01-02T00:00:00Z&limit=5000' \
> /tmp/exported-logs.json
Or query directly from Grafana Explore and use the Inspector tab → Data → Download CSV.
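The query string in the command above contains characters such as `{`, `"` and `=` that are safer to percent-encode, especially once selectors include regex matchers. The helpers below are a sketch in POSIX shell that build an encoded `query_range` URL and reuse the same `docker exec monitoring_grafana wget` call. The function names are illustrative, and the encoder only handles ASCII input.

```shell
# Percent-encode a string for use in a URL query parameter (ASCII input only).
urlencode() {
  printf '%s' "$1" | LC_ALL=C awk '
    BEGIN { for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/) out = out c
        else out = out sprintf("%%%02X", ord[c])
      }
      print out
    }'
}

# Build a Loki query_range URL. Arguments: LogQL query, start, end (RFC 3339),
# and an optional line limit (default 5000, as in the command above).
loki_range_url() {
  printf 'http://loki:3100/loki/api/v1/query_range?query=%s&start=%s&end=%s&limit=%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$(urlencode "$3")" "${4:-5000}"
}

# Run the export through the Grafana container, as in the command above.
export_loki_logs() {
  docker exec monitoring_grafana wget -qO- "$(loki_range_url "$@")"
}

# Example:
#   export_loki_logs '{compose_project="aletheia-prod"}' \
#     2025-01-01T00:00:00Z 2025-01-02T00:00:00Z > /tmp/exported-logs.json
```

A single call returns at most `limit` lines, so for busy services it is usually better to narrow the time window and run several exports. Loki also enforces a server-side cap (`max_entries_limit_per_query` in `limits_config`), so raising `limit` alone may not be enough.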
Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| 7-day retention | Can't investigate old incidents | Export logs before they expire |
| No off-site replication | Loki data lost if disk fails | Rely on daily backups for database state; logs are ephemeral |
| Single node | No HA — Loki downtime = log gap | Monitor Loki container health (see runbooks) |
| No structured parsing | Raw log lines only | Use \| json or \|~ regex in LogQL queries |
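For the single-node limitation, a basic readiness probe can complement whatever the runbooks prescribe. The sketch below calls Loki's `/ready` endpoint through the Grafana container, following the same `docker exec monitoring_grafana wget` pattern used for exports. The function name is illustrative, and this check is an assumption rather than the runbook procedure.

```shell
# Succeed (exit 0) only if Loki answers "ready" on its readiness endpoint.
# The request goes through the Grafana container, as in the export command.
loki_is_ready() {
  docker exec monitoring_grafana wget -qO- http://loki:3100/ready 2>/dev/null \
    | grep -qx 'ready'
}
```

It can be wired into a cron job or a manual check, for example `loki_is_ready || echo "Loki is not ready" >&2`.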