# Observability 002: Alerts And Dashboard Hardening

## Goal

Make the preproduction observability stack actionable by adding alert rules, better operator dashboards, pinned image versions, and operational counters for services that commonly fail silently.

## Feature Spec

- `docs/FEATURES/observability.md`

## Scope

- Pin Grafana, Prometheus, Loki, Tempo, and Alloy image tags in the observability compose overlay.
- Add Prometheus alert rules for API health, error rate, latency, usage silence, feedback bugs, email failures, blob failures, and background job failures.
- Expand the Grafana dashboard with health, usage, operational failure, alert, log, and trace-oriented panels.
- Add backend counters for email delivery, blob storage operations, and background job runs.
- Document alerting and safe Grafana exposure expectations.
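
The image-pinning item above can be spot-checked mechanically. The following is a minimal sketch, not part of the project: `check_pinned_images` is a hypothetical helper name, and the usage comment assumes Docker Compose v2, whose `config --images` option prints one image reference per line.

```bash
#!/usr/bin/env bash
# Sketch: flag container images that are not pinned to an explicit version.
# Reads one image reference per line on stdin, prints each offender, and
# returns non-zero if any offender is found.
check_pinned_images() {
  local image last status=0
  # The "|| [ -n ... ]" clause keeps a final line that lacks a trailing newline.
  while IFS= read -r image || [ -n "$image" ]; do
    [ -z "$image" ] && continue
    # A digest reference (name@sha256:...) is always pinned.
    case "$image" in *@sha256:*) continue ;; esac
    # Inspect only the last path segment so that a registry port such as
    # localhost:5000/app is not mistaken for a tag.
    last="${image##*/}"
    case "$last" in
      *:latest) echo "unpinned (latest): $image"; status=1 ;;
      *:*) ;;  # explicit tag present
      *) echo "unpinned (no tag): $image"; status=1 ;;
    esac
  done
  return "$status"
}

# Example usage (assumption: Compose v2 with `config --images`):
#   docker compose -f deploy/compose.yml \
#     -f deploy/observability/compose.observability.yml config --images \
#     | check_pinned_images
```

This only catches a missing tag or `latest`; it cannot tell whether an explicit tag is itself a floating alias, so a manual review of the chosen tags is still worthwhile.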

## Out Of Scope

- Notification delivery integration for alerts.
- Client-facing status page.
- Cloud observability backends.
- Full product analytics or session tracking.

## Validation

```bash
dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
```