Observability 003: Preprod Operations Loop
Goal
Close the preproduction operations loop by adding alert delivery scaffolding, uptime probes, workflow health gauges, secured Grafana guidance, and an operator runbook.
Feature Spec
docs/FEATURES/observability.md
Scope
- Add Alertmanager to the optional observability compose overlay.
- Add Blackbox Exporter uptime probes for the web container and API readiness endpoint.
- Add backend workflow health gauges derived from database state.
- Add Prometheus alerts for uptime probes and workflow health.
- Add an optional Caddy snippet for protected Grafana exposure.
- Add an operator runbook for bring-up, alert triage, and security defaults.
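During bring-up it can help to check by hand the same thing an HTTP uptime probe checks: whether a target answers with a 2xx status. The sketch below is a rough manual stand-in under stated assumptions, not the Blackbox Exporter configuration itself. The function names `is_2xx` and `probe_http` are hypothetical, and the target URLs are left to the operator because this document does not specify them.

```shell
#!/usr/bin/env bash
# Manual stand-in for an HTTP uptime probe: succeed only on a 2xx status.
# Function names are illustrative; they are not part of the repository.

# is_2xx STATUS: return 0 when STATUS is a three-digit code from 200 to 299.
is_2xx() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# probe_http URL: request URL with a 5 second timeout and report UP or DOWN.
probe_http() {
  local url="$1" status
  # -s silences progress, -o /dev/null discards the body,
  # -w prints only the HTTP status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || status="000"
  if is_2xx "$status"; then
    echo "UP   $url ($status)"
  else
    echo "DOWN $url ($status)"
    return 1
  fi
}
```

After sourcing the file, call `probe_http "$WEB_URL"` and `probe_http "$API_READY_URL"`, where both variables are placeholders the operator sets to the web container address and the API readiness endpoint.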
Out Of Scope
- Operating the remote preproduction host.
- Choosing the final alert destination.
- Client-facing status page.
- External third-party uptime monitoring.
Validation
dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
jq empty deploy/observability/grafana/dashboards/socialize-overview.json
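The four validation commands can be run as one sequence that stops at the first failure. The wrapper below is a minimal sketch, assuming it is run from the repository root with `dotnet`, `docker`, and `jq` on the PATH. The function names `run_step` and `run_validation` and the step labels are hypothetical; the commands themselves are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the validation commands listed in this document.

# run_step LABEL CMD...: run one command and print PASS or FAIL for it.
run_step() {
  local label="$1"
  shift
  echo "==> $label"
  if "$@"; then
    echo "PASS $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# run_validation: run the checks in order, stopping at the first failure.
run_validation() {
  run_step "backend build" dotnet build backend/Socialize.slnx &&
  run_step "backend tests" dotnet test backend/Socialize.slnx &&
  run_step "compose config" docker compose \
    -f deploy/compose.yml \
    -f deploy/observability/compose.observability.yml config &&
  run_step "dashboard JSON" jq empty \
    deploy/observability/grafana/dashboards/socialize-overview.json
}
```

After sourcing the file, `run_validation` returns a non-zero status as soon as any step fails, so it can gate a later step in a script.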