# Observability 003: Preprod Operations Loop

## Goal

Close the preproduction operations loop by adding alert delivery scaffolding, uptime probes, workflow health gauges, secured Grafana guidance, and an operator runbook.

## Feature Spec

- `docs/FEATURES/observability.md`

## Scope

- Add Alertmanager to the optional observability compose overlay.
- Add Blackbox Exporter uptime probes for the web container and API readiness endpoint.
- Add backend database-derived workflow health gauges.
- Add Prometheus alerts for uptime probes and workflow health.
- Add an optional Caddy snippet for protected Grafana exposure.
- Add an operator runbook for bring-up, alert triage, and security defaults.
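While the uptime probes are being wired up, it can help to spot-check the two probe targets by hand. The following is a minimal sketch of that idea, not the Blackbox Exporter configuration itself. The `probe` function and both default URLs are assumptions for illustration; the real hosts, ports, and readiness path are not stated in this document.

```bash
#!/usr/bin/env bash
# Hypothetical targets: replace with the real web and API readiness URLs.
WEB_URL="${WEB_URL:-http://localhost:8080/}"
API_READY_URL="${API_READY_URL:-http://localhost:5000/health/ready}"

# Return 0 when the URL answers with a 2xx status within 5 seconds, 1 otherwise.
probe() {
  local url="$1" status
  # --location follows redirects; --write-out prints only the final HTTP status code.
  status="$(curl --silent --location --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$url")" || status="000"
  case "$status" in
    2??) printf 'UP    %s (%s)\n' "$url" "$status"; return 0 ;;
    *)   printf 'DOWN  %s (%s)\n' "$url" "$status"; return 1 ;;
  esac
}
```

Calling `probe "$WEB_URL"` and `probe "$API_READY_URL"` after bring-up gives a quick manual baseline to compare against what the probes later report.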

## Out Of Scope

- Operating the remote preproduction host.
- Choosing the final alert destination.
- Client-facing status page.
- External third-party uptime monitoring.

## Validation

```bash
dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
jq empty deploy/observability/grafana/dashboards/socialize-overview.json
```
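The four commands above can also be run as a single pass that reports every result instead of stopping at the first failure. This is a sketch under assumptions: the `run_step` and `validate_all` helpers are illustrative names, not scripts that exist in the repository, and the commands are assumed to run from the repository root.

```bash
#!/usr/bin/env bash
# Illustrative wrapper around the validation commands; helper names are hypothetical.

failures=0

# Run one named step; record a failure instead of aborting so later steps still run.
run_step() {
  local name="$1"
  shift
  if "$@"; then
    printf 'PASS  %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    failures=$((failures + 1))
  fi
}

# Commands are copied unchanged from the Validation section.
validate_all() {
  run_step "backend build" dotnet build backend/Socialize.slnx
  run_step "backend tests" dotnet test backend/Socialize.slnx
  run_step "compose config" docker compose \
    -f deploy/compose.yml \
    -f deploy/observability/compose.observability.yml config
  run_step "dashboard JSON" jq empty \
    deploy/observability/grafana/dashboards/socialize-overview.json
  # Succeed only when no step failed.
  [ "$failures" -eq 0 ]
}
```

Running `validate_all` prints one `PASS` or `FAIL` line per step alongside each tool's own output, and returns non-zero if any step failed.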