Operations (Network, Observability & Cost)

Secrets & Network

  • Secrets: store in AWS Secrets Manager or SSM Parameter Store; inject at runtime via the ECS Task Definition (never baked into the image).
  • Network: the task SG allows inbound only from the ALB SG on the app port; egress is limited to DB/S3/etc.
  • OAuth: use a central callback origin (e.g., auth.qa.domain); pass the PR host in the OAuth state, then redirect to web-pr-<iid>.qa.domain/auth/callback-handler.
  • Consider attaching AWS WAF to the ALB (rate limiting, IP blocks).
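
The OAuth bullet above can be sketched as follows. This is a minimal illustration, not the project's actual handler: the signing key, the allowed-host pattern, the nonce, and the choice to forward `code` and `state` as query parameters are all assumptions. The point it shows is that the central callback should verify the state and check the host against an allowlist before redirecting, otherwise the callback becomes an open redirect.

```python
import base64
import hashlib
import hmac
import json
import re
from urllib.parse import urlencode

# Assumptions: a signing key loaded from the secret store, and a strict
# pattern for preview hosts of the form web-pr-<iid>.qa.domain.
STATE_SECRET = b"replace-with-a-key-from-the-secret-store"
PREVIEW_HOST_RE = re.compile(r"web-pr-\d+\.qa\.domain")


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_state(host: str, nonce: str) -> str:
    """Pack the preview host and a CSRF nonce into a signed state value."""
    payload = json.dumps({"host": host, "nonce": nonce}, sort_keys=True).encode()
    sig = hmac.new(STATE_SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(sig)}"


def callback_redirect(state: str, code: str) -> str:
    """Verify the state, then build the redirect to the preview's handler."""
    payload_b64, _, sig_b64 = state.partition(".")
    payload = _unb64(payload_b64)
    expected = hmac.new(STATE_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64(sig_b64)):
        raise ValueError("state signature mismatch")
    host = json.loads(payload)["host"]
    # Only redirect to hosts that look like a preview environment.
    if not PREVIEW_HOST_RE.fullmatch(host):
        raise ValueError(f"host not allowed: {host}")
    query = urlencode({"code": code, "state": state})
    return f"https://{host}/auth/callback-handler?{query}"
```

The preview app would call `encode_state("web-pr-42.qa.domain", nonce)` when starting the login, and the central origin would call `callback_redirect(state, code)` when the provider returns.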

Observability

  • CloudWatch Logs per family: /ecs/cazvid-web, /ecs/cazvid-api.
  • ALB access logs (optional) to S3.
  • Alarms
    • ALB 5xx spike
    • Target health < 100%
    • ECS task restart loop
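
As a rough sketch of the first two alarms, the helpers below build parameter dictionaries in the shape CloudWatch's PutMetricAlarm call accepts (for example via boto3's `put_metric_alarm(**params)`). The alarm names, thresholds, periods, and the dimension values are placeholders chosen for illustration. The restart-loop alarm is left out because ECS does not publish a restart-count metric; it would typically need task state change events instead.

```python
def alb_5xx_alarm(name: str, load_balancer: str, threshold: int = 10) -> dict:
    """Alarm when the ALB returns more than `threshold` 5xx per minute."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/ApplicationELB",
        # Use HTTPCode_Target_5XX_Count instead to track errors from the app.
        "MetricName": "HTTPCode_ELB_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # No traffic means no datapoints; do not treat that as a failure.
        "TreatMissingData": "notBreaching",
    }


def unhealthy_target_alarm(name: str, load_balancer: str, target_group: str) -> dict:
    """Alarm when any target in the group is unhealthy (health < 100%)."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "LoadBalancer", "Value": load_balancer},
            {"Name": "TargetGroup", "Value": target_group},
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
```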

Cost Controls

  • Fargate tiny tasks: 0.25 vCPU / 0.5 GB by default.
  • TTL cleanup: auto-destroy previews after N hours.
  • Manual Sleep/Wake pipeline buttons (desiredCount 0/1).
  • (Advanced) Scale-to-zero on idle via Application Auto Scaling + ALB metrics.
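
The TTL cleanup and Sleep/Wake controls above come down to two small decisions, sketched here under assumptions: the `Preview` record, its `created_at` field, and the action names `sleep` and `wake` are hypothetical, and the actual destroy or update call is left to the pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Preview:
    iid: int              # merge/pull request number of the preview
    created_at: datetime  # timezone-aware creation time


def expired_previews(
    previews: Iterable[Preview], ttl_hours: float, now: Optional[datetime] = None
) -> List[Preview]:
    """Return previews that are at least `ttl_hours` old."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=ttl_hours)
    return [p for p in previews if p.created_at <= cutoff]


def desired_count(action: str) -> int:
    """Map the Sleep/Wake pipeline buttons to an ECS desiredCount."""
    counts = {"sleep": 0, "wake": 1}
    if action not in counts:
        raise ValueError(f"unknown action: {action}")
    return counts[action]
```

A scheduled job could call `expired_previews(all_previews, ttl_hours=N)` and tear down each result, while the manual buttons would pass `desired_count("sleep")` or `desired_count("wake")` to the service update.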

Rollbacks & Runbook

Rollback

  • ECS keeps task definition revisions → aws ecs update-service --cluster <cluster> --service <service> --task-definition <prev>.
  • If a preview deploy fails: delete its ALB listener rule and target group, then re-run the job.
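
One way to pick `<prev>` for the rollback step above is sketched below. It assumes the pipeline already has the family's active revisions listed newest first (for example from a task definition listing sorted in descending order); the function names are hypothetical. Taking the next entry in the list, rather than subtracting one from the revision number, avoids landing on a revision that was deregistered.

```python
from typing import List


def previous_revision(current: str, revisions_desc: List[str]) -> str:
    """Return the revision that follows `current` in a newest-first list."""
    idx = revisions_desc.index(current)  # raises ValueError if not listed
    if idx + 1 >= len(revisions_desc):
        raise LookupError("no earlier revision to roll back to")
    return revisions_desc[idx + 1]


def rollback_command(cluster: str, service: str, task_definition: str) -> List[str]:
    """Build the AWS CLI argument list for the rollback."""
    return [
        "aws", "ecs", "update-service",
        "--cluster", cluster,
        "--service", service,
        "--task-definition", task_definition,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` in a pipeline job.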

Common Issues

  • Region drift: keep ECR/ECS/ALB in the same region.
  • Docker-in-Docker (without TLS): DOCKER_HOST=tcp://docker:2375, DOCKER_TLS_CERTDIR="".
  • ALB timeouts: raise the idle timeout from the 60s default to 120s for SSR streaming.
  • Health checks: confirm route paths and container port mapping.