Observability 002: Alerts And Dashboard Hardening

Goal

Make the preproduction observability stack actionable by adding alert rules, better operator dashboards, pinned image versions, and operational counters for services that commonly fail silently.

Feature Spec

  • docs/FEATURES/observability.md

Scope

  • Pin Grafana, Prometheus, Loki, Tempo, and Alloy image tags in the observability compose overlay.
  • Add Prometheus alert rules for API health, error rate, latency, usage silence, feedback bugs, email failures, blob failures, and background job failures.
  • Expand the Grafana dashboard with health, usage, operational failure, alert, log, and trace-oriented panels.
  • Add backend counters for email delivery, blob storage operations, and background job runs.
  • Document alerting expectations and how to expose Grafana safely.

Out Of Scope

  • Notification delivery integration for alerts.
  • Client-facing status page.
  • Cloud observability backends.
  • Full product analytics or session tracking.

Validation

dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
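
The `config` command above only confirms that the merged compose files parse. It does not confirm that the image tags are actually pinned. The following is a minimal sketch of an extra check, not part of the task as written: `find_unpinned_images` is a hypothetical helper that reads rendered compose YAML on stdin and prints any image reference that has no tag, uses `:latest`, and is not pinned by digest. It assumes each image appears on its own `image: <ref>` line, as in the rendered output.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): list image references that
# are not pinned. Reads rendered compose YAML on stdin and prints one
# offending reference per line. Prints nothing when every image is pinned.
find_unpinned_images() {
  awk '
    $1 == "image:" {
      ref = $2
      gsub(/"/, "", ref)               # drop surrounding double quotes, if any
      if (ref ~ /@sha256:/) next       # digest-pinned references are fine
      n = split(ref, parts, "/")       # only the last path segment carries the tag,
      last = parts[n]                  # so a registry port (host:5000/...) is ignored
      if (last !~ /:/ || last ~ /:latest$/) print ref
    }
  '
}

# Example usage, piping the merged configuration from the Validation step:
#   docker compose -f deploy/compose.yml \
#     -f deploy/observability/compose.observability.yml config | find_unpinned_images
```

Because the check runs against the merged configuration, it reports unpinned images from the base compose file as well as from the observability overlay. If only Grafana, Prometheus, Loki, Tempo, and Alloy are in scope, the output would need to be filtered to those services.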