Observability 003: Preprod Operations Loop

Goal

Close the preproduction operations loop by adding alert delivery scaffolding, uptime probes, workflow health gauges, secured Grafana guidance, and an operator runbook.

Feature Spec

  • docs/FEATURES/observability.md

Scope

  • Add Alertmanager to the optional observability compose overlay.
  • Add Blackbox Exporter uptime probes for the web container and API readiness endpoint.
  • Add backend database-derived workflow health gauges.
  • Add Prometheus alerts for uptime probes and workflow health.
  • Add an optional Caddy snippet for protected Grafana exposure.
  • Add an operator runbook for bring-up, alert triage, and security defaults.
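As a rough illustration of how an operator might spot-check the uptime probes by hand, the sketch below queries a Blackbox Exporter's standard /probe endpoint and reads the probe_success gauge from the response. The helper names, the exporter address, the module name, and the target URLs are all assumptions for illustration; none of them come from this repository.

```shell
#!/usr/bin/env bash
# Sketch only: function names and every address passed to them are
# hypothetical and not taken from this repository.

# Reads Blackbox Exporter /probe output on stdin and succeeds only if
# the probe_success gauge is present and equal to 1.
probe_succeeded() {
  awk '$1 == "probe_success" { found = 1; ok = ($2 == 1) }
       END { exit (found && ok) ? 0 : 1 }'
}

# Asks a Blackbox Exporter to probe one target and prints UP or DOWN.
# Arguments: exporter base URL, module name, target URL.
check_target() {
  local exporter="$1" module="$2" target="$3"
  if curl -fsS --get "${exporter}/probe" \
       --data-urlencode "module=${module}" \
       --data-urlencode "target=${target}" | probe_succeeded; then
    echo "UP   ${target}"
  else
    echo "DOWN ${target}"
    return 1
  fi
}
```

A call might look like `check_target http://localhost:9115 http_2xx http://web/`, where the port is the exporter's usual default and the module and target are placeholders to replace with whatever the overlay actually configures.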

Out Of Scope

  • Operating the remote preproduction host.
  • Choosing the final alert destination.
  • Client-facing status page.
  • External third-party uptime monitoring.

Validation

dotnet build backend/Socialize.slnx
dotnet test backend/Socialize.slnx
docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config
jq empty deploy/observability/grafana/dashboards/socialize-overview.json
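
The commands above could be wrapped in a small fail-fast helper, sketched below. It assumes it is run from the repository root (the paths are relative) and that dotnet, docker, and jq are on the PATH; the `validate` function and the `DRY_RUN` switch are hypothetical additions, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs the validation commands listed above in
# order and stops at the first failure. Run from the repository root.

validate() {
  local -a steps=(
    "dotnet build backend/Socialize.slnx"
    "dotnet test backend/Socialize.slnx"
    "docker compose -f deploy/compose.yml -f deploy/observability/compose.observability.yml config"
    "jq empty deploy/observability/grafana/dashboards/socialize-overview.json"
  )
  local step
  for step in "${steps[@]}"; do
    echo "==> ${step}"
    # DRY_RUN=1 (an assumption for illustration) prints without running.
    if [ "${DRY_RUN:-0}" = "1" ]; then
      continue
    fi
    # Word splitting is intentional: each step is a plain command line.
    ${step} || { echo "FAILED: ${step}" >&2; return 1; }
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "Dry run only; no commands were executed."
  else
    echo "All validation steps passed."
  fi
}
```

Calling `validate` runs all four steps; `DRY_RUN=1 validate` only lists them, which can be handy for checking the paths before a long build.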