feat: close preprod observability loop

2026-05-08 15:48:56 -04:00
parent 8bcff96821
commit 986c7efea6
14 changed files with 618 additions and 2 deletions

@@ -0,0 +1,34 @@
# Observability 003: Preprod Operations Loop
## Goal
Close the preproduction operations loop by adding alert delivery scaffolding, uptime probes, workflow health gauges, guidance for exposing Grafana securely, and an operator runbook.
## Feature Spec
- `docs/FEATURES/observability.md`
## Scope
- Add Alertmanager to the optional observability compose overlay.
- Add Blackbox Exporter uptime probes for the web container and API readiness endpoint.
- Add backend workflow health gauges derived from the database.
- Add Prometheus alerts for uptime probes and workflow health.
- Add an optional Caddy snippet for protected Grafana exposure.
- Add an operator runbook for bring-up, alert triage, and security defaults.
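
As a rough illustration of the uptime probes listed above, the sketch below queries Blackbox Exporter's `/probe` endpoint by hand and checks the `probe_success` metric it returns. The exporter address, the `http_2xx` module name, and the target URL are assumptions for illustration; the actual values live in the compose overlay and are not shown here. `probe_succeeded` and `check_probe` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, exporter address, module, and target are assumptions.

probe_succeeded() {
  # Reads Prometheus text-format output on stdin.
  # Succeeds only if a "probe_success" sample is present and equals 1.
  awk '$1 == "probe_success" { found = 1; ok = ($2 == 1) }
       END { exit !(found && ok) }'
}

check_probe() {
  # Usage: check_probe <exporter-base-url> <target-url> [module]
  local exporter="$1"
  local target="$2"
  local module="${3:-http_2xx}"   # assumed module name
  curl -fsS --get "$exporter/probe" \
    --data-urlencode "target=$target" \
    --data-urlencode "module=$module" | probe_succeeded
}
```

For example, `check_probe http://localhost:9115 http://web:80/` would exit non-zero if the probe fails or the exporter is unreachable, assuming the exporter listens on its default port 9115 and `web` is the hostname of the web container.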
## Out Of Scope
- Operating the remote preproduction host.
- Choosing the final alert destination.
- Building a client-facing status page.
- Setting up external third-party uptime monitoring.
## Validation
```bash
dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
jq empty deploy/observability/grafana/dashboards/socialize-overview.json
```
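
The validation commands above can also be run as a batch that continues past the first failure and reports a summary. This is a minimal sketch under that assumption; `run_check`, `validate_all`, and `VALIDATION_FAILURES` are hypothetical names and not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: function and variable names are illustrative assumptions.

VALIDATION_FAILURES=0

run_check() {
  # Usage: run_check <label> <command> [args...]
  # Runs the command, discards its stdout, keeps stderr visible for triage.
  local label="$1"
  shift
  if "$@" >/dev/null; then
    echo "ok   $label"
  else
    echo "FAIL $label"
    VALIDATION_FAILURES=$((VALIDATION_FAILURES + 1))
  fi
}

validate_all() {
  # The same four commands as the Validation section, run from the repository root.
  run_check "backend build" dotnet build backend/Socialize.slnx
  run_check "backend tests" dotnet test backend/Socialize.slnx
  run_check "compose config" docker compose -f deploy/compose.yml \
    -f deploy/observability/compose.observability.yml config
  run_check "dashboard JSON" jq empty \
    deploy/observability/grafana/dashboards/socialize-overview.json
  # Succeed only if every check passed.
  [ "$VALIDATION_FAILURES" -eq 0 ]
}
```

Calling `validate_all` prints one `ok` or `FAIL` line per check and returns non-zero if any check failed.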