feat: close preprod observability loop
34
docs/TASKS/observability/003-preprod-operations-loop.md
Normal file
@@ -0,0 +1,34 @@
# Observability 003: Preprod Operations Loop
## Goal
Close the preproduction operations loop by adding alert delivery scaffolding, uptime probes, workflow health gauges, secured Grafana guidance, and an operator runbook.
## Feature Spec
- `docs/FEATURES/observability.md`
## Scope
- Add Alertmanager to the optional observability compose overlay.
- Add Blackbox Exporter uptime probes for the web container and API readiness endpoint.
- Add backend database-derived workflow health gauges.
- Add Prometheus alerts for uptime probes and workflow health.
- Add an optional Caddy snippet for protected Grafana exposure.
- Add an operator runbook for bring-up, alert triage, and security defaults.
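
As an illustration of the uptime-probe alerting item above, the following is a minimal sketch of a Prometheus rule file, not the rules shipped in this commit. It relies on the `probe_success` metric that Blackbox Exporter exposes for each probe. The group name, alert name, `for` duration, and severity label are assumptions chosen for the example.

```yaml
# Hypothetical rule file; names and thresholds are illustrative assumptions.
groups:
  - name: uptime-probes            # assumed group name
    rules:
      - alert: UptimeProbeFailed   # assumed alert name
        # probe_success is 1 when a Blackbox Exporter probe succeeds, 0 otherwise.
        expr: probe_success == 0
        # Assumed grace period so a single failed scrape does not fire the alert.
        for: 2m
        labels:
          severity: critical       # assumed label used for Alertmanager routing
        annotations:
          summary: "Uptime probe failing for {{ $labels.instance }}"
```

If the Prometheus image is available locally, a file like this can be checked with `promtool check rules <file>` before it is mounted into the observability overlay.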
## Out Of Scope
- Operating the remote preproduction host.
- Choosing the final alert destination.
- Building a client-facing status page.
- Integrating third-party uptime monitoring.
## Validation
```bash
dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
jq empty deploy/observability/grafana/dashboards/socialize-overview.json
```