Deploy Cloud Run Service
Deploy containerized HTTP services to Cloud Run using Docker and Kustomize.
Trigger
Use this skill when the user asks to:
- Deploy a Cloud Run Service (web server, API)
- Create an always-running HTTP service
- Set up Kustomize for a Cloud Run Service
- Configure health checks for Cloud Run
Project Structure
code
projects/{service_name}/
├── pyproject.toml   # Dependencies and brick references
├── Dockerfile       # Container definition
└── main.py          # Entry point (optional)
infrastructure/cloudrun/{service_name}/
├── base/
│   ├── service.yaml         # Base service definition
│   └── kustomization.yaml
└── overlays/
    ├── sandbox/
    │   └── kustomization.yaml
    └── production/
        └── kustomization.yaml
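As an optional convenience, the layout above can be created with a short script. This is a minimal sketch, not part of the original procedure: the `scaffold` function and the `LAYOUT` list are hypothetical names, and the files it creates are empty placeholders to be filled in by the steps that follow.

```python
from pathlib import Path

# Relative paths taken from the project structure listed above.
LAYOUT = [
    "projects/{service_name}/pyproject.toml",
    "projects/{service_name}/Dockerfile",
    "projects/{service_name}/main.py",
    "infrastructure/cloudrun/{service_name}/base/service.yaml",
    "infrastructure/cloudrun/{service_name}/base/kustomization.yaml",
    "infrastructure/cloudrun/{service_name}/overlays/sandbox/kustomization.yaml",
    "infrastructure/cloudrun/{service_name}/overlays/production/kustomization.yaml",
]


def scaffold(root, service_name):
    """Create missing directories and empty files; return the files created."""
    created = []
    for template in LAYOUT:
        path = Path(root) / template.format(service_name=service_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite an existing file
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold(".", "my-service")` from the repository root would create only the files that do not exist yet, so running it twice is harmless.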
Procedure: Creating New Cloud Run Service
Step 1: Create Dockerfile
The Dockerfile follows the same pattern as a Cloud Run Job, except that the application must listen for HTTP requests on a port (8080 here):
dockerfile
FROM python:3.13-slim@sha256:{digest}
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends ca-certificates curl
ADD https://astral.sh/uv/0.7.8/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin/:$PATH"
WORKDIR /app
COPY projects/{service_name}/pyproject.toml ./
COPY uv.lock ./
RUN uv sync --frozen --no-default-groups --no-install-project
COPY components/{namespace}/logging {namespace}/logging
COPY components/{namespace}/settings {namespace}/settings
COPY bases/{namespace}/{base_name} {namespace}/{base_name}
RUN mv {namespace}/{base_name}/core.py main.py
ENV PATH="/app/.venv/bin:$PATH"
ENV PORT=8080
EXPOSE 8080
CMD ["python", "main.py"]
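For illustration, an entry point that satisfies this Dockerfile could look like the sketch below. It is a stand-in built on the standard library, not the project's actual `core.py`: the names `HealthHandler`, `make_server`, and `main` are hypothetical, and the `/health_check` path is assumed to match the probe path in the service manifest.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health_check with 200; every other path gets 404."""

    def do_GET(self):
        if self.path == "/health_check":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
        else:
            body = b"not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=None):
    """Bind on all interfaces, using $PORT (8080 if unset) unless a port is given."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


def main():
    """Serve until the container is stopped."""
    make_server().serve_forever()


# In main.py, finish with:
#     if __name__ == "__main__":
#         main()
```

Binding to `0.0.0.0` rather than `127.0.0.1` matters here, because requests reach the container from outside its own loopback interface.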
Step 2: Create Base Service Manifest
Create infrastructure/cloudrun/{service_name}/base/service.yaml:
yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: {service_name}
  labels:
    component: my-component
  annotations:
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/startup-cpu-boost: "true"
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "2"
    spec:
      serviceAccountName: {service_account}@PROJECT_ID.iam.gserviceaccount.com
      timeoutSeconds: 300
      containers:
        - name: {service_name}
          image: IMAGE_URL
          ports:
            - containerPort: 8080
              name: http1
          env:
            - name: ENVIRONMENT
              value: ENVIRONMENT_PLACEHOLDER
            - name: VERSION
              value: VERSION_PLACEHOLDER
          startupProbe:
            httpGet:
              path: /health_check
              port: 8080
            initialDelaySeconds: 5
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health_check
              port: 8080
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
  traffic:
    - latestRevision: true
      percent: 100
Step 3: Create Kustomization Files
Base kustomization.yaml:
yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - service.yaml
Production overlay:
yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: {gcp_project_id}
resources:
  - ../../base
images:
  - name: IMAGE_URL
    newName: IMAGE_URL_PLACEHOLDER
Procedure: Deployment
bash
# Build and push image
docker build -f projects/{service_name}/Dockerfile -t {image_url}:{version} .
docker push {image_url}:{version}
# Update and deploy
cd infrastructure/cloudrun/{service_name}/overlays/production
kustomize edit set image IMAGE_URL={image_url}:{version}
kustomize build . | sed "s|VERSION_PLACEHOLDER|{version}|g" > /tmp/service.yaml
gcloud run services replace /tmp/service.yaml --region=asia-southeast1
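The `sed` step above substitutes only `VERSION_PLACEHOLDER`, so it can be worth checking the rendered manifest for leftover tokens before running `gcloud run services replace`. The sketch below is one possible check, not part of the original procedure: `find_unresolved_placeholders` is a hypothetical helper, and it assumes placeholders follow the `*_PLACEHOLDER` naming seen in the manifests (bare tokens such as `PROJECT_ID` are not detected).

```python
import re

# Matches upper-case tokens ending in _PLACEHOLDER, e.g. VERSION_PLACEHOLDER.
PLACEHOLDER_PATTERN = re.compile(r"\b[A-Z][A-Z0-9_]*_PLACEHOLDER\b")


def find_unresolved_placeholders(manifest_text):
    """Return the sorted, de-duplicated placeholder tokens still in the text."""
    return sorted(set(PLACEHOLDER_PATTERN.findall(manifest_text)))
```

A deployment script could read `/tmp/service.yaml`, pass its contents to this function, and stop if the returned list is not empty.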
Key Differences from Jobs
| Aspect | Cloud Run Service | Cloud Run Job |
|---|---|---|
| API Version | serving.knative.dev/v1 | run.googleapis.com/v1 |
| Kind | Service | Job |
| Ports | Required | None |
| Health Probes | Required | None |
| Autoscaling | minScale, maxScale | N/A |
| Traffic | Required (traffic section) | N/A |
| Deploy Command | gcloud run services replace | gcloud run jobs replace |
Startup Probe Configuration (Critical)
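Overly aggressive probe settings cause deployments to fail, because the revision is marked unhealthy before the application has finished starting. The following values leave enough headroom: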
yaml
startupProbe:
  httpGet:
    path: /health_check
    port: 8080
  initialDelaySeconds: 5   # Wait before first probe
  timeoutSeconds: 5        # Allow 5s for response
  periodSeconds: 10        # Probe every 10s
  successThreshold: 1      # 1 success to pass
  failureThreshold: 3      # Allow 3 failures (~35s window)
Avoid:
- failureThreshold: 1 (fails on the first probe failure)
- initialDelaySeconds: 0 (probes before the app starts)
- timeoutSeconds: 1 (too short for slow initialization)
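The `~35s` figure in the probe comment corresponds to `initialDelaySeconds + failureThreshold × periodSeconds`, that is 5 + 3 × 10. The sketch below reproduces that arithmetic and flags the three settings listed under "Avoid". Both function names are hypothetical, the formula is the approximation the comment appears to use rather than an exact model of probe scheduling, and the thresholds are taken directly from the list above.

```python
def startup_window_seconds(probe):
    """Approximate time the app has to become healthy before the probe gives up."""
    return probe["initialDelaySeconds"] + probe["failureThreshold"] * probe["periodSeconds"]


def review_startup_probe(probe):
    """Return one warning per setting that matches the 'Avoid' list."""
    warnings = []
    if probe["failureThreshold"] <= 1:
        warnings.append("failureThreshold: fails on the first probe failure")
    if probe["initialDelaySeconds"] == 0:
        warnings.append("initialDelaySeconds: probes before the app starts")
    if probe["timeoutSeconds"] <= 1:
        warnings.append("timeoutSeconds: too short for slow initialization")
    return warnings


recommended = {
    "initialDelaySeconds": 5,
    "timeoutSeconds": 5,
    "periodSeconds": 10,
    "successThreshold": 1,
    "failureThreshold": 3,
}
print(startup_window_seconds(recommended))  # 5 + 3 * 10 = 35
print(review_startup_probe(recommended))    # no warnings: []
```

Raising `failureThreshold` or `periodSeconds` widens the window for services with slow initialization, at the cost of taking longer to report a revision that never becomes healthy.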