Operations (Network, Observability & Cost)
Secrets & Network
- Secrets: AWS Secrets Manager/SSM; inject via ECS Task Definition (not baked into image).
- Network: task SG allows inbound only from ALB SG on app port; egress limited to DB/S3/etc.
- OAuth: central callback origin (e.g., auth.qa.domain); pass PR host in OAuth state, then redirect to web-pr-<iid>.qa.domain/auth/callback-handler.
- Consider WAF on ALB (rate limiting, IP blocks).
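The OAuth bullet carries the PR host through the state parameter, so the central callback should only redirect to hosts it can trust. A minimal sketch of one way to do that, assuming an HMAC-signed state and a numeric <iid>; the key name, helper names, and JSON payload layout are hypothetical and not taken from the project:

```python
import base64
import hashlib
import hmac
import json
import re

# Hypothetical signing key; in practice it would be injected from Secrets Manager/SSM.
STATE_KEY = b"replace-with-secret-from-ssm"
# Assumption: preview hosts look like web-pr-<iid>.qa.domain with a numeric iid.
PREVIEW_HOST = re.compile(r"web-pr-\d+\.qa\.domain")


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, safe to place in a query string."""
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _unb64(text: str) -> bytes:
    """Reverse of _b64: restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str, key: bytes) -> str:
    return _b64(hmac.new(key, payload.encode(), hashlib.sha256).digest())


def encode_state(pr_host: str, nonce: str, key: bytes = STATE_KEY) -> str:
    """Pack the PR host and a CSRF nonce into a signed OAuth state value."""
    payload = _b64(json.dumps({"host": pr_host, "nonce": nonce}).encode())
    return f"{payload}.{_sign(payload, key)}"


def redirect_target(state: str, key: bytes = STATE_KEY) -> str:
    """Verify the state and return the preview URL to redirect to."""
    payload, _, sig = state.partition(".")
    if not hmac.compare_digest(sig.encode(), _sign(payload, key).encode()):
        raise ValueError("state signature mismatch")
    host = json.loads(_unb64(payload))["host"]
    # Even with a valid signature, refuse anything that is not a preview host.
    if not PREVIEW_HOST.fullmatch(host):
        raise ValueError(f"unexpected redirect host: {host!r}")
    return f"https://{host}/auth/callback-handler"
```

The host allow-list is checked in addition to the signature so that a leaked key alone does not turn the callback into an open redirect.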
Observability
- CloudWatch Logs per family: /ecs/cazvid-web, /ecs/cazvid-api.
- ALB access logs (optional) to S3.
- Alarms:
  - ALB 5xx spike
  - Target health < 100%
  - ECS task restart loop
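The first two alarms map onto standard AWS/ApplicationELB metrics. The sketch below builds the parameter dictionaries only; the thresholds, periods, alarm names, and function names are assumptions, not project settings. The restart-loop alarm is left out because it would usually be derived from ECS task state change events rather than from a single metric.

```python
def alb_5xx_alarm(name: str, load_balancer: str, threshold: int = 10) -> dict:
    """Alarm when targets return more than `threshold` 5xx responses per minute."""
    return {
        "AlarmName": f"{name}-alb-5xx",
        "Namespace": "AWS/ApplicationELB",
        # HTTPCode_ELB_5XX_Count is the alternative for errors generated by the ALB itself.
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,  # assumed: three consecutive bad minutes count as a spike
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # idle previews emit no datapoints
    }


def unhealthy_target_alarm(name: str, load_balancer: str, target_group: str) -> dict:
    """Alarm as soon as any target is unhealthy, i.e. target health < 100%."""
    return {
        "AlarmName": f"{name}-unhealthy-targets",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group},
            {"Name": "LoadBalancer", "Value": load_balancer},
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
```

Each dictionary is shaped so it could be passed to boto3 as cloudwatch.put_metric_alarm(**alarm); the dimension values are the suffixes of the ALB and target group ARNs (for example app/<name>/<id>).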
Cost Controls
- Fargate tiny tasks: 0.25 vCPU / 0.5 GB by default.
- TTL cleanup: auto-destroy previews after N hours.
- Manual Sleep/Wake pipeline buttons (desiredCount 0/1).
- (Advanced) Scale-to-zero on idle via Application Auto Scaling + ALB metrics.
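The TTL cleanup and Sleep/Wake rules reduce to two small decisions that a scheduled pipeline job could make. A sketch under stated assumptions: the Preview record, its fields, and the function names are hypothetical, and the creation time is assumed to be stored somewhere the job can read (for example a resource tag).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Preview:
    iid: int                # merge/pull request number of the preview
    created_at: datetime    # timezone-aware creation time


def expired_previews(
    previews: Iterable[Preview],
    ttl_hours: float,
    now: Optional[datetime] = None,
) -> List[Preview]:
    """Return the previews whose age has reached the TTL (the "N hours" above)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=ttl_hours)
    return [p for p in previews if p.created_at <= cutoff]


def desired_count(action: str) -> int:
    """Map the manual Sleep/Wake buttons to the ECS service desiredCount."""
    counts = {"sleep": 0, "wake": 1}
    if action not in counts:
        raise ValueError(f"unknown action: {action!r}")
    return counts[action]
```

Keeping the decision separate from the AWS calls makes the cleanup job easy to dry-run: it can log which previews would be destroyed before anything is deleted.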
Rollbacks & Runbook
Rollback
- ECS keeps TaskDef revisions → aws ecs update-service --cluster <cluster> --service <service> --task-definition <prev>.
- If preview deploy fails: delete ALB rule/TG and re-run job.
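Choosing <prev> for the rollback can be scripted. The sketch below assumes the task definition ARNs of one family have already been listed (for example with aws ecs list-task-definitions) and picks the newest revision older than the one currently deployed; the function names and the example ARNs are hypothetical.

```python
from typing import Iterable, List


def revision(task_def_arn: str) -> int:
    """Extract the revision number from ...:task-definition/<family>:<revision>."""
    return int(task_def_arn.rsplit(":", 1)[1])


def previous_task_definition(current_arn: str, arns: Iterable[str]) -> str:
    """Return the newest revision that is older than the current one."""
    current = revision(current_arn)
    older = [arn for arn in arns if revision(arn) < current]
    if not older:
        raise LookupError("no earlier task definition revision to roll back to")
    return max(older, key=revision)


def rollback_command(cluster: str, service: str, task_def_arn: str) -> List[str]:
    """Build the argv for the rollback; run it with subprocess.run(cmd, check=True)."""
    return [
        "aws", "ecs", "update-service",
        "--cluster", cluster,
        "--service", service,
        "--task-definition", task_def_arn,
    ]
```

The command is returned as an argument list rather than a shell string, so names containing unusual characters do not need quoting.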
Common Issues
- Region drift: keep ECR/ECS/ALB in the same region.
- Docker-in-Docker: DOCKER_HOST=tcp://docker:2375, DOCKER_TLS_CERTDIR="".
- ALB timeouts: set idle timeout 120s for SSR streaming.
- Health checks: confirm route paths and container port mapping.
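Region drift can be caught before deploying by comparing the region embedded in the ECR image URI with the region field of the ECS and ALB ARNs. A small sketch, assuming the standard <account>.dkr.ecr.<region>.amazonaws.com registry host format; the function names and sample identifiers are hypothetical.

```python
from typing import Iterable, Set


def ecr_region(image_uri: str) -> str:
    """Region from <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    host = image_uri.split("/", 1)[0]
    parts = host.split(".")
    if len(parts) < 6 or parts[1:3] != ["dkr", "ecr"]:
        raise ValueError(f"not an ECR image URI: {image_uri!r}")
    return parts[3]


def arn_region(arn: str) -> str:
    """Region is the fourth colon-separated field of an ARN."""
    fields = arn.split(":")
    if len(fields) < 6 or fields[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return fields[3]


def regions_in_use(image_uri: str, arns: Iterable[str]) -> Set[str]:
    """Collect every region referenced by the image and the given resources."""
    return {ecr_region(image_uri)} | {arn_region(a) for a in arns}


def assert_single_region(image_uri: str, arns: Iterable[str]) -> str:
    """Return the shared region, or raise if ECR/ECS/ALB disagree."""
    regions = regions_in_use(image_uri, arns)
    if len(regions) != 1:
        raise ValueError(f"region drift detected: {sorted(regions)}")
    return next(iter(regions))
```

A check like this could run as the first step of the deploy job, so that a mismatch fails fast instead of surfacing later as a missing image or target group.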