cloud-run-release-operator

Version: v0.0.0-...-a524873
Published: Aug 17, 2020 License: Apache-2.0


Cloud Run Release Manager

The Cloud Run Release Manager provides an automated way to gradually roll out new versions of your Cloud Run services. Based on metrics, it automatically decides whether to gradually increase traffic to a new version or to roll back to the previous one.

Disclaimer: This project is not an official Google product and is provided as-is.

You might encounter issues in production, since this project is currently in alpha.


How does it work?

Cloud Run Release Manager periodically checks for new revisions in the services that have opted in to gradual rollouts. If it finds a new revision that receives no traffic, it automatically assigns it some initial traffic. This new revision is labeled candidate, while the previous revision serving traffic is labeled stable.

Depending on the candidate's health, its share of traffic is either increased, or dropped to zero with all traffic redirected to the stable revision.
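
The decision above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the Release Manager's own code: the function name nextCandidatePercent is hypothetical, and it assumes that a healthy candidate moves to the next value of -steps (default 5,20,50,80), takes 100% of the traffic after the last step, and drops to 0% on rollback.

```go
package main

import "fmt"

// nextCandidatePercent is a hypothetical helper (not part of the Release
// Manager's API). Given the candidate's current traffic percentage, the
// configured rollout steps in ascending order, and the health verdict, it
// returns the candidate's next traffic percentage.
func nextCandidatePercent(current int64, steps []int64, healthy bool) int64 {
	if !healthy {
		// Roll back: the candidate gets no traffic, so the stable
		// revision serves 100%.
		return 0
	}
	// Advance to the first configured step above the current percentage.
	for _, step := range steps {
		if step > current {
			return step
		}
	}
	// Past the last step, the candidate takes all traffic and becomes stable.
	return 100
}

func main() {
	steps := []int64{5, 20, 50, 80} // default value of -steps
	fmt.Println(nextCandidatePercent(0, steps, true))   // 5
	fmt.Println(nextCandidatePercent(20, steps, true))  // 50
	fmt.Println(nextCandidatePercent(80, steps, true))  // 100
	fmt.Println(nextCandidatePercent(50, steps, false)) // 0
}
```

Keeping the step list as data means a different -steps value changes the rollout pace without touching the decision logic.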

Examples
Scenario 1: Automated Rollouts
  1. I have version v1 of an application deployed to Cloud Run
  2. I deploy a new version, v2, to Cloud Run with the --no-traffic option (it gets 0% of the traffic)
  3. The new version is automatically detected and assigned 5% of the traffic
  4. Every minute, metrics for v2 over the last 30 minutes are retrieved. The metrics show a "healthy" version, and traffic to v2 is increased to 30%, but only after 30 minutes have passed since the last update
  5. The metrics show a "healthy" version again, and traffic to v2 is increased to 50%, again only after 30 minutes have passed since the last update
  6. The process is repeated until the new version handles all the traffic and becomes stable

(Diagram: rollout stages)

Scenario 2: Automated Rollbacks
  1. I have version v1 of an application deployed to Cloud Run
  2. I deploy a new version, v2, to Cloud Run with the --no-traffic option (it gets 0% of the traffic)
  3. The new version is automatically detected and assigned 5% of the traffic
  4. Every minute, metrics for v2 over the last 30 minutes are retrieved. The metrics show a "healthy" version, and traffic to v2 is increased to 30%, but only after 30 minutes have passed since the last update
  5. Metrics for v2 are retrieved again and show an "unhealthy" version. Traffic to v2 is immediately dropped, and all traffic is redirected to v1
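
A rollback only changes how traffic is split between the two revisions. The sketch below models that split in Go with a simplified, hypothetical TrafficTarget type; it is not the Cloud Run API client type. Tagging the entries "stable" and "candidate" mirrors the labels described earlier, but how the Release Manager actually records those labels on the service is an assumption here.

```go
package main

import "fmt"

// TrafficTarget is a simplified, hypothetical stand-in for a Cloud Run
// traffic entry: a revision, the percentage it serves, and a tag.
type TrafficTarget struct {
	RevisionName string
	Percent      int64
	Tag          string
}

// splitTraffic builds the traffic configuration for a given candidate
// percentage; the stable revision receives the remainder.
func splitTraffic(stable, candidate string, candidatePercent int64) []TrafficTarget {
	return []TrafficTarget{
		{RevisionName: stable, Percent: 100 - candidatePercent, Tag: "stable"},
		{RevisionName: candidate, Percent: candidatePercent, Tag: "candidate"},
	}
}

func main() {
	// Scenario 2: v2 serves 30%, then turns unhealthy and is rolled back.
	fmt.Println(splitTraffic("v1", "v2", 30)) // [{v1 70 stable} {v2 30 candidate}]
	fmt.Println(splitTraffic("v1", "v2", 0))  // [{v1 100 stable} {v2 0 candidate}]
}
```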

(Diagram: rollout stages)

Try it out (locally)

  1. Check out this repository.

  2. Make sure you have the Go compiler installed, then build the binary:

    go build -o cloud_run_release_manager ./cmd/operator
    
  3. To start the program, run:

    ./cloud_run_release_manager -cli -project=<YOUR_PROJECT>
    

Once started, the program checks the health of Cloud Run services labeled rollout-strategy=gradual every minute, by default looking at each candidate's metrics for the past 30 minutes.

  • The health is determined using the metrics and the configured health criteria
  • By default, the only health criterion is a maximum server error rate of 1%
  • If metrics show a healthy candidate, traffic to the candidate is increased
  • If metrics show an unhealthy candidate, a rollback is performed
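
The health check described above can be sketched as follows. The function assessHealth and the Health type are hypothetical; the sketch computes the server error rate in percent and compares it against the threshold (default 1%). Reporting a candidate with fewer requests than -min-requests (default 100) as "unknown", so that no action is taken, is an assumption based on that flag's description under Configuration.

```go
package main

import "fmt"

// Health is a hypothetical verdict type used only in this sketch.
type Health int

const (
	Unknown Health = iota // not enough requests to decide
	Healthy
	Unhealthy
)

func (h Health) String() string {
	return [...]string{"unknown", "healthy", "unhealthy"}[h]
}

// assessHealth compares the candidate's server error rate (in percent) over
// the health-check window against maxErrorRate. With no requests, or fewer
// than minRequests, this sketch reports Unknown (an assumption).
func assessHealth(requests, serverErrors, minRequests int64, maxErrorRate float64) Health {
	if requests == 0 || requests < minRequests {
		return Unknown
	}
	errorRate := float64(serverErrors) / float64(requests) * 100
	if errorRate > maxErrorRate {
		return Unhealthy
	}
	return Healthy
}

func main() {
	// Defaults: -min-requests=100, -max-error-rate=1.
	fmt.Println(assessHealth(1000, 5, 100, 1))  // 0.5% errors: healthy
	fmt.Println(assessHealth(1000, 25, 100, 1)) // 2.5% errors: unhealthy
	fmt.Println(assessHealth(40, 0, 100, 1))    // too few requests: unknown
}
```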

Setup

Cloud Run Release Manager is distributed as a server deployed to Cloud Run, invoked periodically by Cloud Scheduler.
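
In this setup, each Cloud Scheduler invocation is an HTTP GET request to the service's /rollout path (see step 6 below). A minimal Go sketch of such a server follows; runRollout is a hypothetical placeholder for one rollout pass, and the Release Manager's real handler may differ. It listens on the port Cloud Run provides in the PORT environment variable.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

// runRollout is a hypothetical placeholder for one pass of the Release
// Manager: find opted-in services, assess candidates, update traffic.
func runRollout() error {
	return nil
}

// rolloutHandler serves the endpoint that Cloud Scheduler calls every minute.
func rolloutHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	if err := runRollout(); err != nil {
		// A non-2xx status lets Cloud Scheduler record the run as failed.
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/rollout", rolloutHandler)
	// Cloud Run tells the container which port to listen on via PORT.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```

Because the scheduler drives the loop, the server itself keeps no timer; each request performs exactly one pass.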

To set it up on Cloud Run, run the following steps in your shell:

  1. Set your project ID in a variable:

    PROJECT_ID=<your-project>
    
  2. Create a new service account:

    gcloud iam service-accounts create release-manager \
        --display-name "Cloud Run Release Manager"
    

    Give it permissions to manage your services on the Cloud Run API:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member=serviceAccount:release-manager@${PROJECT_ID}.iam.gserviceaccount.com \
        --role=roles/run.admin
    

    Also, give it permissions to use other service accounts as its identity when updating Cloud Run services:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member=serviceAccount:release-manager@${PROJECT_ID}.iam.gserviceaccount.com \
        --role=roles/iam.serviceAccountUser
    

    Finally, give it access to metrics on your services:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member=serviceAccount:release-manager@${PROJECT_ID}.iam.gserviceaccount.com \
        --role=roles/monitoring.viewer
    
  3. Build the Release Manager container image and push it to Google Container Registry:

    git clone https://github.com/GoogleCloudPlatform/cloud-run-release-manager.git
    
    gcloud builds submit ./cloud-run-release-manager \
        -t gcr.io/$PROJECT_ID/cloud-run-release-manager
    
  4. Deploy the Release Manager as a Cloud Run service:

    gcloud run deploy release-manager --quiet \
        --platform=managed \
        --region=us-central1 \
        --image=gcr.io/$PROJECT_ID/cloud-run-release-manager \
        --service-account=release-manager@${PROJECT_ID}.iam.gserviceaccount.com \
        --args=-project=$PROJECT_ID --args=-verbosity=debug
    
  5. Find the URL of the Release Manager service and store it in the URL variable:

    URL=$(gcloud run services describe release-manager \
        --platform=managed --region=us-central1 \
        --format='value(status.url)')
    
  6. Set up a Cloud Scheduler job to call the Release Manager (deployed on Cloud Run) every minute:

    gcloud services enable cloudscheduler.googleapis.com
    
    gcloud beta scheduler jobs create http cloud-run-release-manager --schedule "* * * * *" \
        --http-method=GET \
        --uri="${URL}/rollout" \
        --oidc-service-account-email=release-manager@${PROJECT_ID}.iam.gserviceaccount.com \
        --oidc-token-audience="${URL}/rollout"
    

At this point, you can start deploying services with the label rollout-strategy=gradual. When you deploy a new revision with the --no-traffic option, the Release Manager will gradually roll it out. See this section for more details.

Configuration

Currently, all the configuration arguments must be specified using command line flags:

Choosing services

Cloud Run Release Manager can manage the rollout of multiple services at the same time.

To opt in a service for automated rollouts and rollbacks, the service must carry a label matching the configured label selector. By default, the Release Manager looks in all regions for services with the label rollout-strategy=gradual.
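
Matching a service against the label selector amounts to comparing one key=value pair with the service's labels. The sketch below is an illustration only: parseSelector and optedIn are hypothetical helpers that handle just the single key=value form of the default selector, and the Release Manager may instead pass the selector to the Cloud Run API when listing services.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSelector splits a "key=value" label selector such as the default
// "rollout-strategy=gradual". Only this single-pair form is handled here.
func parseSelector(selector string) (key, value string, err error) {
	parts := strings.SplitN(selector, "=", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", "", fmt.Errorf("invalid label selector %q, want key=value", selector)
	}
	return parts[0], parts[1], nil
}

// optedIn reports whether a service's labels match the selector.
func optedIn(labels map[string]string, key, value string) bool {
	v, ok := labels[key]
	return ok && v == value
}

func main() {
	key, value, err := parseSelector("rollout-strategy=gradual")
	if err != nil {
		panic(err)
	}
	fmt.Println(optedIn(map[string]string{"rollout-strategy": "gradual"}, key, value)) // true
	fmt.Println(optedIn(map[string]string{"team": "web"}, key, value))                 // false
}
```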

Note: A project ID must be specified.

  • -project: Google Cloud project ID that has the Cloud Run services deployed
  • -regions: Regions in which to look for opted-in services (default: all available Cloud Run regions)
  • -label: Label selector that opted-in services must match (default: rollout-strategy=gradual)

Rollout strategy

The rollout strategy consists of the steps and health criteria.

  • -cli-run-interval: The time between health checks (default: 60s). This is only needed when running with the -cli option.
  • -healthcheck-offset: Time window to look back during a health check to assess the candidate revision's health (default: 30m).
  • -min-requests: The minimum number of requests needed to determine the candidate's health (default: 100). The requests are counted within the time window set by -healthcheck-offset.
  • -min-wait: The minimum time before rolling out further (default: 30m)
  • -steps: Percentages of traffic the candidate should go through (default: 5,20,50,80)
  • -max-error-rate: Expected maximum rate (in percent) of server errors (default: 1)
  • -latency-p99: Expected maximum latency for 99th percentile of requests, 0 to ignore (default: 0)
  • -latency-p95: Expected maximum latency for 95th percentile of requests, 0 to ignore (default: 0)
  • -latency-p50: Expected maximum latency for 50th percentile of requests, 0 to ignore (default: 0)
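
The flags above map naturally onto Go's standard flag package. The sketch below declares a subset of them with their documented defaults and parses -steps into percentages. The validation rules in parseSteps (each step between 1 and 99, strictly increasing) are assumptions for illustration, and latencyOK is a hypothetical helper showing the "0 to ignore" convention of the latency flags, whose units are not stated here.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// parseSteps turns a value like "5,20,50,80" into percentages. The rules
// enforced here (1..99, strictly increasing) are assumed, not documented.
func parseSteps(s string) ([]int64, error) {
	var steps []int64
	for _, field := range strings.Split(s, ",") {
		n, err := strconv.ParseInt(strings.TrimSpace(field), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid step %q: %w", field, err)
		}
		if n <= 0 || n >= 100 {
			return nil, fmt.Errorf("step %d must be between 1 and 99", n)
		}
		if len(steps) > 0 && n <= steps[len(steps)-1] {
			return nil, fmt.Errorf("steps must be strictly increasing, got %d after %d", n, steps[len(steps)-1])
		}
		steps = append(steps, n)
	}
	return steps, nil
}

// latencyOK applies one latency criterion; a limit of 0 disables it.
func latencyOK(observed, limit float64) bool {
	return limit == 0 || observed <= limit
}

func main() {
	fs := flag.NewFlagSet("release-manager", flag.ExitOnError)
	healthOffset := fs.Duration("healthcheck-offset", 30*time.Minute, "time window to look back during health check")
	minRequests := fs.Int64("min-requests", 100, "minimum requests needed to determine health")
	minWait := fs.Duration("min-wait", 30*time.Minute, "minimum time before rolling out further")
	stepsFlag := fs.String("steps", "5,20,50,80", "percentages of traffic the candidate goes through")
	maxErrorRate := fs.Float64("max-error-rate", 1, "expected maximum server error rate (percent)")
	fs.Parse(os.Args[1:])

	steps, err := parseSteps(*stepsFlag)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println(*healthOffset, *minRequests, *minWait, steps, *maxErrorRate)
}
```

For example, running it with -steps=10,40,70 -min-wait=15m prints "30m0s 100 15m0s [10 40 70] 1".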

This is not an official Google project. See LICENSE.

Directories

Path Synopsis
cmd
internal
metrics/sheets
Package sheets provides a metrics provider implementation that retrieves metrics from a publicly available Google Sheets spreadsheet. The document must have the following values, starting at row 2, in the following order.
run
