# Observability 002: Alerts And Dashboard Hardening ## Goal Make the preproduction observability stack actionable by adding alert rules, better operator dashboards, pinned image versions, and operational counters for services that commonly fail silently. ## Feature Spec - `docs/FEATURES/observability.md` ## Scope - Pin Grafana, Prometheus, Loki, Tempo, and Alloy image tags in the observability compose overlay. - Add Prometheus alert rules for API health, error rate, latency, usage silence, feedback bugs, email failures, blob failures, and background job failures. - Expand the Grafana dashboard with health, usage, operational failure, alert, log, and trace-oriented panels. - Add backend counters for email delivery, blob storage operations, and background job runs. - Document alerting and safe Grafana exposure expectations. ## Out Of Scope - Notification delivery integration for alerts. - Client-facing status page. - Cloud observability backends. - Full product analytics or session tracking. ## Validation ```bash dotnet build backend/Socialize.slnx dotnet test backend/Socialize.slnx docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config ```