Choorai
Cycle 6

Operations (Ops) Basics

Learn about deployment strategies, monitoring, incident response, and cost management to operate your service reliably.

What you'll learn in this Cycle

  • Deployment strategies (Blue-Green, Rolling)
  • Monitoring and alert setup
  • Incident response process
  • Cost management and optimization

Deployment Strategies

Methods for safely deploying new versions without downtime.

Blue-Green Deployment

Prepare two identical environments (Blue/Green) and switch traffic all at once.

Blue (current) → switch → Green (new version)

Pros: Fast rollback | Cons: Requires 2x resources
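As a minimal sketch of the idea, here is a hypothetical in-process router. Real setups switch a load balancer, DNS record, or traffic split instead. A blue-green switch is an atomic change of which environment is live, and rollback is the same flip in reverse:

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Hypothetical router: both environments stay deployed; only the 'live' pointer moves."""
    versions: dict = field(default_factory=lambda: {"blue": "v1", "green": None})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # The new version goes to the environment that currently receives no traffic
        self.versions[self.idle] = version

    def switch(self) -> None:
        # Move all traffic at once; the previous environment is kept for rollback
        if self.versions[self.idle] is None:
            raise RuntimeError("idle environment has nothing deployed")
        self.live = self.idle

    def rollback(self) -> None:
        # Rolling back is the same operation: point traffic at the other side again
        self.switch()

    def serve(self) -> str:
        return self.versions[self.live]
```

Because the old environment is never torn down during the switch, rollback takes as long as flipping a pointer. That is where the fast rollback comes from, and also why you pay for two environments.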

Rolling Update

Replaces instances sequentially. This is Cloud Run's default strategy.

v1 v1 v2 → v1 v2 v2 → v2 v2 v2

Pros: Resource efficient | Cons: Rollback is the same as a new deployment
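The stages above can be reproduced with a small simulation. The function name and the batch_size parameter are illustrative only and are not a deployment API. It replaces instances one batch at a time and records the fleet after each step:

```python
def rolling_update(instances: list[str], new_version: str, batch_size: int = 1) -> list[list[str]]:
    """Return the fleet state after each batch is replaced (a simulation, not a real API)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(instances)
    stages = []
    # Replace instances from the end of the list backwards, batch_size at a time
    for end in range(len(fleet), 0, -batch_size):
        start = max(end - batch_size, 0)
        for i in range(start, end):
            fleet[i] = new_version
        stages.append(list(fleet))
    return stages
```

Two things follow from this shape. During the rollout, both versions serve traffic at the same time, so v2 must stay compatible with v1 (for example, with the same database schema). A rollback is just another rolling update in the opposite direction, such as `rolling_update(stages[-1], "v1")`, which is why it takes as long as a new deployment.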

Rollback Method

Terminal
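# Find the revision to roll back to
gcloud run revisions list --service=my-api --region=asia-northeast3

# Send 100% of traffic back to a previous revision on Cloud Run
gcloud run services update-traffic my-api \
  --to-revisions=my-api-00001-abc=100 \
  --region=asia-northeast3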

# Cloudflare Pages rollback (from the dashboard)
# Deployments → Previous deployment → Rollback to this deploy
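If you script rollbacks, for example from a CI job, a thin wrapper keeps the flags in one place. This is a sketch: the rollback helper and its dry_run default are hypothetical, and the helper only shells out to the same gcloud update-traffic command shown above.

```python
import shlex
import subprocess


def rollback_command(service: str, revision: str, region: str) -> list[str]:
    """Build the gcloud call that sends 100% of traffic to an earlier revision."""
    return [
        "gcloud", "run", "services", "update-traffic", service,
        f"--to-revisions={revision}=100",
        f"--region={region}",
    ]


def rollback(service: str, revision: str, region: str, dry_run: bool = True) -> str:
    cmd = rollback_command(service, revision, region)
    if not dry_run:
        # Requires an installed, authenticated gcloud CLI; raises CalledProcessError on failure
        subprocess.run(cmd, check=True)
    # Return the command as a copy-pasteable string for logs or review
    return shlex.join(cmd)
```

Defaulting to a dry run means a script prints the exact command first. A human can confirm the revision name before traffic actually moves.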

Monitoring Basics

Log Collection

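Cloud Run automatically sends everything your container writes to stdout and stderr to Cloud Logging. Writing one JSON object per line is enough to get structured, searchable logs.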

main.py
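# Output structured (JSON) logs in FastAPI
import json
import logging
import sys

from fastapi import FastAPI

# Print only the message so each log line is a single JSON object,
# which Cloud Logging parses into a structured entry.
# level=INFO is needed: the default level (WARNING) would drop logger.info calls.
logging.basicConfig(level=logging.INFO, format="%(message)s", stream=sys.stdout)
logger = logging.getLogger(__name__)

app = FastAPI()

@app.get("/api/users/{user_id}")
def get_user(user_id: int):
    logger.info(json.dumps({
        "severity": "INFO",  # Cloud Logging maps this field to the entry's log level
        "action": "get_user",
        "user_id": user_id,
        "status": "success",
    }))
    return {"id": user_id}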
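Instead of calling json.dumps in every handler, you can attach a formatter once. This is a minimal sketch using Python's standard logging module. The class name and the "fields" convention for extra data are our own choices, not a FastAPI or Google API. "severity" and "message" are field names that Cloud Logging recognizes in structured logs.

```python
import json
import logging
import sys


class CloudLoggingJsonFormatter(logging.Formatter):
    """Format each record as one JSON line with Cloud Logging's severity/message fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,  # DEBUG/INFO/WARNING/ERROR/CRITICAL
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Extra structured fields, passed as logger.info(..., extra={"fields": {...}})
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def get_json_logger(name: str) -> logging.Logger:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(CloudLoggingJsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid printing each line twice via the root logger
    return logger
```

Usage inside a handler: `get_json_logger("api").info("user fetched", extra={"fields": {"action": "get_user", "user_id": 42}})`.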

Key Metrics

  • Response Time: check p50, p95, and p99. Target: p95 < 200ms
  • Error Rate: percentage of requests that return a 5xx error. Target: < 0.1%
  • Request Count: requests per minute/hour, used to understand traffic patterns
  • Resource Usage: CPU and memory, used as scaling criteria
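The sketch below shows how these numbers are derived from a batch of request samples. It computes p50/p95/p99 and the 5xx error rate. It assumes the nearest-rank percentile definition; monitoring tools may interpolate and report slightly different values. The Request type is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(math.ceil(p * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


def summarize(requests: list[Request]) -> dict:
    latencies = [r.latency_ms for r in requests]
    server_errors = sum(1 for r in requests if 500 <= r.status <= 599)
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "error_rate": server_errors / len(requests),
    }


def meets_targets(summary: dict) -> bool:
    # Targets from above: p95 < 200ms and 5xx error rate < 0.1%
    return summary["p95"] < 200 and summary["error_rate"] < 0.001
```

Percentiles are used instead of the average because a handful of very slow requests barely moves the mean. Those same requests show up immediately in p99.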

Alert Setup

Terminal (example)
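# Create an alert policy on GCP (CLI, alpha command group)
# Note: this filter counts only 5xx responses. A true error *rate* (5xx / total)
# is usually defined as a ratio in a policy file passed with --policy-from-file.
gcloud alpha monitoring policies create \
  --display-name="High Error Rate Alert" \
  --condition-display-name="Error rate > 1%" \
  --condition-filter='resource.type="cloud_run_revision" AND metric.type="run.googleapis.com/request_count" AND metric.labels.response_code_class="5xx"'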

Recommended Alert Channels

  • Slack: Instant alerts to team channels
  • Email: Daily summary reports
  • PagerDuty: On-call escalation for critical incidents
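One way to wire these channels together is a small routing table keyed by severity (Critical/High/Medium/Low). The mapping below is an assumption for illustration. The channel names are placeholders, and the actual sending is left to whatever Slack, email, or PagerDuty integration you use.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical policy: page on-call only for critical alerts, post high and above to Slack,
# and leave everything else for the daily email summary
ROUTES = {
    Severity.CRITICAL: ["pagerduty", "slack"],
    Severity.HIGH: ["slack"],
    Severity.MEDIUM: ["email_digest"],
    Severity.LOW: ["email_digest"],
}


def plan_notifications(alerts: list[tuple[str, Severity]]) -> dict[str, list[str]]:
    """Group alert names by destination channel."""
    plan: dict[str, list[str]] = {}
    for name, severity in alerts:
        for channel in ROUTES[severity]:
            plan.setdefault(channel, []).append(name)
    return plan
```

Keeping paging limited to critical alerts is the main design choice. If every alert wakes someone up, people start ignoring the pager.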

Incident Response

Incident Response Process

  1. Detect: monitoring alerts or user reports
  2. Classify: determine severity (Critical/High/Medium/Low)
  3. Mitigate: roll back, scale up, block traffic, etc.
  4. Resolve: identify and fix the root cause
  5. Retrospect: write a postmortem and establish prevention measures
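The five phases are strictly ordered, which makes them easy to encode. In this minimal sketch, the Incident class and its fields are hypothetical and not part of any incident tool. The class refuses to skip a phase and records when each phase was reached, which gives you a timeline for the postmortem.

```python
from datetime import datetime, timezone
from enum import Enum


class Phase(Enum):
    DETECT = 1
    CLASSIFY = 2
    MITIGATE = 3
    RESOLVE = 4
    RETROSPECT = 5


class Incident:
    def __init__(self, title: str):
        self.title = title
        self.phase = Phase.DETECT
        self.timeline = [(Phase.DETECT, datetime.now(timezone.utc))]

    def advance(self, to: Phase) -> None:
        # Only the next phase in order is allowed (e.g. no resolving before mitigating)
        if to.value != self.phase.value + 1:
            raise ValueError(f"cannot go from {self.phase.name} to {to.name}")
        self.phase = to
        self.timeline.append((to, datetime.now(timezone.utc)))
```

Mitigate comes before Resolve on purpose. A rollback or scale-up restores service first, and the root-cause investigation continues without users waiting on it.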

Rollback Decision Criteria

  • Error rate increases by more than 1%
  • p95 response time increases by more than 2x
  • Core functionality is down for more than 5 minutes
  • Estimated fix time is more than 15 minutes
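Written as code, the checklist becomes something an on-call engineer or a deploy script can apply without debate. This sketch makes two assumptions. The "1%" error-rate criterion is read as one percentage point above the pre-deploy baseline, and any single criterion is enough to trigger a rollback. Snapshot and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    error_rate: float  # fraction of requests returning 5xx, e.g. 0.002 = 0.2%
    p95_ms: float


def rollback_reasons(baseline: Snapshot, current: Snapshot,
                     core_down_minutes: float, est_fix_minutes: float) -> list[str]:
    reasons = []
    # Assumption: "increases by more than 1%" means +1 percentage point over the baseline
    if current.error_rate - baseline.error_rate > 0.01:
        reasons.append("error rate up more than 1 percentage point")
    if current.p95_ms > 2 * baseline.p95_ms:
        reasons.append("p95 latency more than doubled")
    if core_down_minutes > 5:
        reasons.append("core functionality down more than 5 minutes")
    if est_fix_minutes > 15:
        reasons.append("estimated fix takes more than 15 minutes")
    return reasons


def should_rollback(baseline: Snapshot, current: Snapshot,
                    core_down_minutes: float, est_fix_minutes: float) -> bool:
    # Any single criterion is enough: roll back first, debug afterwards
    return bool(rollback_reasons(baseline, current, core_down_minutes, est_fix_minutes))
```

Returning the list of reasons, not just a boolean, gives you the justification to paste into the incident channel and the postmortem.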

Cost Management

Leveraging Free Tiers

Free allowance by service:

  • Cloudflare Pages: unlimited requests, 500 builds/month
  • Cloud Run: 2M requests/month, 180,000 vCPU-seconds/month
  • Supabase: 500MB database, 1GB file storage
  • Neon: 0.5GB storage, 191 hours/month of compute
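To check whether an API fits in the Cloud Run allowance, multiply out request count and CPU time. This is a rough sketch with two assumptions. It uses request-based billing with concurrency 1, so it overestimates CPU time when requests overlap. It also ignores Cloud Run's separate memory allowance. The constants are copied from the free-tier figures above.

```python
FREE_REQUESTS = 2_000_000    # Cloud Run requests/month (free-tier figure above)
FREE_VCPU_SECONDS = 180_000  # Cloud Run vCPU-seconds/month (free-tier figure above)


def cloud_run_usage(requests_per_month: int, avg_duration_s: float, vcpus: float = 1.0) -> dict:
    """Upper-bound estimate assuming each request occupies its instance alone (concurrency 1)."""
    vcpu_seconds = requests_per_month * avg_duration_s * vcpus
    return {
        "requests_pct_of_free": 100 * requests_per_month / FREE_REQUESTS,
        "vcpu_seconds": vcpu_seconds,
        "vcpu_pct_of_free": 100 * vcpu_seconds / FREE_VCPU_SECONDS,
        "within_free_tier": (requests_per_month <= FREE_REQUESTS
                             and vcpu_seconds <= FREE_VCPU_SECONDS),
    }
```

For example, 1M requests a month at 100 ms each on 1 vCPU comes to about 100,000 vCPU-seconds, which is inside the allowance. The same traffic at 500 ms per request needs about 500,000 vCPU-seconds and exceeds it. Request latency therefore affects cost as much as request count.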

Budget Alert Setup

Budget Alerts
# GCP budget alert setup (via the console)
# 1. Billing → Budgets & alerts → Create budget
# 2. Set a monthly budget amount (e.g. $10, $50, or $100)
# 3. Set email alert thresholds at 50%, 90%, and 100% of the budget
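The alert arithmetic is simple: each threshold fires once spend reaches that fraction of the budget amount. The sketch below shows it with the 50/90/100% thresholds from step 3. It also adds a naive linear month-end projection, which is our own simplification and not how GCP computes forecasted spend. Both function names are hypothetical.

```python
import calendar
from datetime import date


def crossed_thresholds(spend: float, budget: float,
                       thresholds: tuple[float, ...] = (0.5, 0.9, 1.0)) -> list[float]:
    """Return the budget fractions that current spend has already reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [t for t in thresholds if spend >= t * budget]


def forecast_month_end(spend_to_date: float, today: date) -> float:
    # Naive linear projection: average daily spend so far times days in the month
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month
```

Running crossed_thresholds on the projection as well as on actual spend warns you mid-month that you are on track to exceed the budget, before the money is actually spent.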

Watch out for unexpected costs

  • Data transfer (Egress) costs can be surprisingly high
  • Log storage costs are not negligible
  • Don't forget to shut down development instances


Congratulations!

You have completed all learning cycles! You now have the fundamentals to build, deploy, and operate a web service. Try starting a real project using Agent Recipes.

Last updated: February 22, 2026 · Version: v0.0.1
