This directory contains a production-ready Terraform module to deploy a scalable Envoy Rate Limit Service on Google Kubernetes Engine (GKE) Autopilot.
Apache Beam pipelines often process data at massive scale, which can easily overwhelm external APIs (e.g., databases, LLM inference endpoints, SaaS APIs).
This Terraform module deploys a centralized Rate Limit Service (RLS) using Envoy. Beam workers can query this service to coordinate global quotas across thousands of distributed workers, ensuring you stay within safe API limits without hitting 429 Too Many Requests errors.
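To make the coordination pattern concrete, here is a minimal, hypothetical sketch of the worker-side logic: before each external call, a worker asks the RLS whether the request's descriptor is over limit and backs off when it is. The `should_rate_limit` stub below stands in for the real Envoy `ShouldRateLimit` gRPC call; the function names and retry policy are illustrative, not part of this module.

```python
import itertools
import time

_calls = itertools.count(1)

def should_rate_limit(descriptors):
    """Stub for the RLS ShouldRateLimit RPC: throttles every third call."""
    return "OK" if next(_calls) % 3 else "OVER_LIMIT"

def call_with_rate_limit(request, max_retries=5, base_delay=0.01):
    """Ask the central RLS before each attempt; back off while over limit."""
    for attempt in range(max_retries):
        if should_rate_limit([("database", "users")]) == "OK":
            return f"processed {request}"        # safe to hit the external API
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("rate limit budget exhausted")

print(call_with_rate_limit("req-1"))  # → processed req-1
```

Because every worker consults the same central service, the quota is enforced globally rather than per worker.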
For an example Beam pipeline that uses this service, see the Dataflow test below.
APIs Enabled:

```shell
gcloud services enable container.googleapis.com compute.googleapis.com
```
Network Configuration:

```shell
gcloud compute routers create nat-router --network <VPC_NAME> --region <REGION>

gcloud compute routers nats create nat-config \
  --router=nat-router \
  --region=<REGION> \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges
```
Create a `terraform.tfvars` file to define variables specific to your environment:

```hcl
project_id                   = "my-project-id"     # GCP Project ID
region                       = "us-central1"       # GCP Region for deployment
cluster_name                 = "ratelimit-cluster" # Name of the GKE cluster
deletion_protection          = true                # Prevent accidental cluster deletion (set true for prod)
control_plane_cidr           = "172.16.0.0/28"     # CIDR for GKE control plane (must not overlap with subnet)
ratelimit_replicas           = 1                   # Initial number of Rate Limit pods
min_replicas                 = 1                   # Minimum HPA replicas
max_replicas                 = 5                   # Maximum HPA replicas
hpa_cpu_target_percentage    = 75                  # CPU utilization target for HPA (%)
hpa_memory_target_percentage = 75                  # Memory utilization target for HPA (%)
vpc_name                     = "default"           # Existing VPC name to deploy into
subnet_name                  = "default"           # Existing Subnet name (required for Internal LB IP)
ratelimit_image              = "envoyproxy/ratelimit:e9ce92cc" # Docker image for Rate Limit service
redis_image                  = "redis:6.2-alpine"  # Docker image for Redis

ratelimit_resources = { requests = { cpu = "100m", memory = "128Mi" }, limits = { cpu = "500m", memory = "512Mi" } }
redis_resources     = { requests = { cpu = "250m", memory = "256Mi" }, limits = { cpu = "500m", memory = "512Mi" } }
```
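For sizing, the HPA targets above feed the standard Kubernetes scaling formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to `min_replicas`/`max_replicas`. A quick illustrative check (plain Python, not part of the module) of what `hpa_cpu_target_percentage = 75` implies:

```python
import math

def desired_replicas(current, current_util_pct, target_util_pct,
                     min_replicas=1, max_replicas=5):
    """Kubernetes HPA formula, clamped to the module's replica bounds."""
    want = math.ceil(current * current_util_pct / target_util_pct)
    return max(min_replicas, min(max_replicas, want))

# 2 pods averaging 90% CPU against a 75% target scale out to 3 pods.
print(desired_replicas(2, 90, 75))   # → 3
# A sustained spike is capped at max_replicas: ceil(2 * 400 / 75) = 11 -> 5.
print(desired_replicas(2, 400, 75))  # → 5
```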
Provide the rate limit rules inline in `terraform.tfvars`:

```hcl
ratelimit_config_yaml = <<EOF
domain: mongo_cps
descriptors:
  - key: database
    value: users
    rate_limit:
      unit: second
      requests_per_unit: 500
EOF
```
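The ratelimit service enforces each descriptor with per-time-window counters in Redis (conceptually an INCR per window with an expiry). The sketch below mimics that fixed-window behaviour in plain Python for the `mongo_cps` config above, with a dict standing in for Redis; it is a simplified model, not the service's actual implementation.

```python
import time
from collections import defaultdict

# (domain, key, value) -> (requests_per_unit, window in seconds)
LIMITS = {("mongo_cps", "database", "users"): (500, 1.0)}
counters = defaultdict(int)  # stands in for Redis INCR keys with a TTL

def over_limit(domain, key, value, now=None):
    """Fixed-window check: count requests in the current time window."""
    limit, window = LIMITS[(domain, key, value)]
    bucket = int((time.time() if now is None else now) / window)
    counters[(domain, key, value, bucket)] += 1
    return counters[(domain, key, value, bucket)] > limit

# 600 requests in the same one-second window: the last 100 are rejected.
rejected = sum(over_limit("mongo_cps", "database", "users", now=0.5)
               for _ in range(600))
print(rejected)  # → 100
```

A new window resets the count, so the 500-per-second quota applies globally across all callers that share the counter.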
Initialize and deploy:

```shell
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```

Retrieve the Internal Load Balancer IP:

```shell
terraform output load_balancer_ip
```
The service is accessible only from within the VPC (e.g., from Dataflow workers or GCE instances in the same network) at `<INTERNAL_IP>:8081`.

Test with a Dataflow pipeline: verify connectivity and rate limiting logic by running the example Dataflow pipeline.
```shell
# Get the Internal Load Balancer IP provisioned by Terraform.
export RLS_IP=$(terraform output -raw load_balancer_ip)

# --rls_address points workers at the internal IP; --subnetwork and
# --no_use_public_ips are REQUIRED so workers run in the same private subnet.
python sdks/python/apache_beam/examples/rate_limiter_simple.py \
  --runner=DataflowRunner \
  --project=<YOUR_PROJECT_ID> \
  --region=<YOUR_REGION> \
  --temp_location=gs://<YOUR_BUCKET>/temp \
  --staging_location=gs://<YOUR_BUCKET>/staging \
  --job_name=ratelimit-test-$(date +%s) \
  --rls_address=${RLS_IP}:8081 \
  --subnetwork=regions/<YOUR_REGION>/subnetworks/<YOUR_SUBNET_NAME> \
  --no_use_public_ips
```
To destroy the cluster and all created resources:
```shell
terraform destroy
```
Note: If `deletion_protection` was enabled, you must set it to `false` in `terraform.tfvars` before destroying.
| Variable | Description | Default |
|---|---|---|
| `project_id` | **Required.** Google Cloud Project ID | - |
| `vpc_name` | **Required.** Existing VPC name to deploy into | - |
| `subnet_name` | **Required.** Existing Subnet name | - |
| `ratelimit_config_yaml` | **Required.** Rate Limit configuration content | - |
| `region` | GCP Region for deployment | `us-central1` |
| `control_plane_cidr` | CIDR block for GKE control plane | `172.16.0.0/28` |
| `cluster_name` | Name of the GKE cluster | `ratelimit-cluster` |
| `deletion_protection` | Prevent accidental cluster deletion | `false` |
| `ratelimit_replicas` | Initial number of Rate Limit pods | `1` |
| `min_replicas` | Minimum HPA replicas | `1` |
| `max_replicas` | Maximum HPA replicas | `5` |
| `hpa_cpu_target_percentage` | CPU utilization target for HPA (%) | `75` |
| `hpa_memory_target_percentage` | Memory utilization target for HPA (%) | `75` |
| `ratelimit_image` | Docker image for Rate Limit service | `envoyproxy/ratelimit:e9ce92cc` |
| `redis_image` | Docker image for Redis | `redis:6.2-alpine` |
| `ratelimit_resources` | Resources for Rate Limit service (map) | requests/limits (CPU/Mem) |
| `redis_resources` | Resources for Redis container (map) | requests/limits (CPU/Mem) |