Envoy Rate Limiter on GKE (Terraform)

This directory contains a production-ready Terraform module to deploy a scalable Envoy Rate Limit Service on Google Kubernetes Engine (GKE) Autopilot.

Overview

Apache Beam pipelines often process data at massive scale, which can easily overwhelm external APIs (e.g., databases, LLM inference endpoints, SaaS APIs).

This Terraform module deploys a centralized Rate Limit Service (RLS) using Envoy. Beam workers can query this service to coordinate global quotas across thousands of distributed workers, ensuring you stay within safe API limits without hitting 429 Too Many Requests errors.

Example Beam Pipelines using it:

Architecture:

  • GKE Autopilot: Fully managed, serverless Kubernetes environment.
    • Private Cluster: Nodes have internal IPs only.
    • Cloud NAT (Prerequisite): Allows private nodes to pull Docker images.
  • Envoy Rate Limit Service: A stateless Go/gRPC service that handles rate limit logic.
  • Redis: Stores the rate limit counters.
  • StatsD Exporter: Sidecar container that converts StatsD metrics to Prometheus format, exposed on port 9102 (a quick check is sketched after this list).
  • Internal Load Balancer: A Google Cloud TCP Load Balancer exposing the Rate Limit service internally within the VPC.
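
Once the stack is deployed, the StatsD exporter metrics can be checked directly. This is a minimal sketch, assuming kubectl credentials for the cluster and a Deployment named ratelimit, which is a placeholder — check kubectl get deployments for the actual name created by this module:

    # Forward the statsd-exporter sidecar's Prometheus port locally and scrape it
    kubectl port-forward deploy/ratelimit 9102:9102 >/dev/null 2>&1 &
    sleep 2 && curl -s localhost:9102/metrics | grep -i ratelimit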

Prerequisites:

The following items need to be set up before deploying the Envoy Rate Limiter on GCP:

  1. GCP project

  2. Tools Installed: Terraform and the gcloud CLI (both are used in the commands below); kubectl is optional for the verification sketches later in this document.

  3. APIs Enabled:

    gcloud services enable container.googleapis.com compute.googleapis.com
    
  4. Network Configuration:

    • Cloud NAT: Must exist in the region to allow private nodes to pull images and reach external APIs (see the Cloud NAT documentation for more details). Helper commands (if you need to create one):
      gcloud compute routers create nat-router --network <VPC_NAME> --region <REGION>
      gcloud compute routers nats create nat-config \
          --router=nat-router \
          --region=<REGION> \
          --auto-allocate-nat-external-ips \
          --nat-all-subnet-ip-ranges
      
    • Validation via Console (a CLI alternative follows below):
      1. Go to Network Services > Cloud NAT in the Google Cloud Console.
      2. Verify a NAT Gateway exists for your Region and VPC Network.
      3. Ensure it is configured to apply to Primary and Secondary ranges (or at least the ranges GKE will use).
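    • Validation via CLI: an alternative check with gcloud; this assumes the router and NAT names nat-router and nat-config from the helper commands above:
      gcloud compute routers nats list --router=nat-router --region=<REGION>
      gcloud compute routers nats describe nat-config --router=nat-router --region=<REGION>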

Prepare deployment configuration:

  1. Update the terraform.tfvars file to define variables specific to your environment:
  • terraform.tfvars variables:
project_id                   = "my-project-id"                 # GCP Project ID
region                       = "us-central1"                   # GCP Region for deployment
cluster_name                 = "ratelimit-cluster"             # Name of the GKE cluster
deletion_protection          = true                            # Prevent accidental cluster deletion (set "true" for prod)
control_plane_cidr           = "172.16.0.0/28"                 # CIDR for GKE control plane (must not overlap with subnet)
ratelimit_replicas           = 1                               # Initial number of Rate Limit pods
min_replicas                 = 1                               # Minimum HPA replicas
max_replicas                 = 5                               # Maximum HPA replicas
hpa_cpu_target_percentage    = 75                              # CPU utilization target for HPA (%)
hpa_memory_target_percentage = 75                              # Memory utilization target for HPA (%)
vpc_name                     = "default"                       # Existing VPC name to deploy into
subnet_name                  = "default"                       # Existing Subnet name (required for Internal LB IP)
ratelimit_image              = "envoyproxy/ratelimit:e9ce92cc" # Docker image for Rate Limit service
redis_image                  = "redis:6.2-alpine"              # Docker image for Redis
ratelimit_resources          = { requests = { cpu = "100m", memory = "128Mi" }, limits = { cpu = "500m", memory = "512Mi" } }
redis_resources              = { requests = { cpu = "250m", memory = "256Mi" }, limits = { cpu = "500m", memory = "512Mi" } }
  • Custom Rate Limit Configuration (must be overridden in terraform.tfvars). The example below limits requests carrying the descriptor database=users in the mongo_cps domain to 500 per second:
ratelimit_config_yaml = <<EOF
domain: mongo_cps
descriptors:
  - key: database
    value: users
    rate_limit:
      unit: second
      requests_per_unit: 500
EOF
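
  2. Optionally sanity-check the edited configuration. A minimal sketch; note that terraform validate only works after terraform init, which is run in the next section:
terraform fmt        # normalizes formatting of .tf and .tfvars files
terraform validate   # reports syntax and type errors in the configuration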

Deploy Envoy Rate Limiter:

  1. Initialize Terraform to download providers and modules:
terraform init
  2. Plan and apply the changes:
terraform plan -out=tfplan
terraform apply tfplan
  3. Connect to the service: After deployment, get the Internal IP address:
terraform output load_balancer_ip

The service is accessible only from within the VPC (e.g., via Dataflow workers or GCE instances in the same network) at <INTERNAL_IP>:8081.
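
From a VM or worker in that VPC, a quick TCP reachability check (a sketch; it assumes the nc utility is available and uses the IP printed by the output above):

    # Reports success if the internal load balancer is reachable on the gRPC port
    nc -vz <INTERNAL_IP> 8081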

  4. Test with a Dataflow pipeline: Verify connectivity and rate-limiting logic by running the example Dataflow pipeline.

    # Get the Internal Load Balancer IP provisioned by Terraform
    export RLS_IP=$(terraform output -raw load_balancer_ip)

    # --rls_address points to the Terraform-provisioned Internal IP.
    # --subnetwork is REQUIRED so workers run in the same private subnet.
    python sdks/python/apache_beam/examples/rate_limiter_simple.py \
      --runner=DataflowRunner \
      --project=<YOUR_PROJECT_ID> \
      --region=<YOUR_REGION> \
      --temp_location=gs://<YOUR_BUCKET>/temp \
      --staging_location=gs://<YOUR_BUCKET>/staging \
      --job_name=ratelimit-test-$(date +%s) \
      --rls_address=${RLS_IP}:8081 \
      --subnetwork=regions/<YOUR_REGION>/subnetworks/<YOUR_SUBNET_NAME> \
      --no_use_public_ips
    

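Optional cluster-side verification: the sketch below assumes kubectl is installed and uses the cluster name and region from terraform.tfvars above; the Deployment name ratelimit is a placeholder, not guaranteed by this module (check kubectl get deployments for the actual name):

    # Fetch credentials for the Autopilot cluster
    gcloud container clusters get-credentials ratelimit-cluster --region us-central1

    # List the rate limit and Redis workloads, services, and autoscalers
    kubectl get deployments,services,hpa --all-namespaces | grep -i -E 'ratelimit|redis'

    # The upstream envoyproxy/ratelimit image serves /healthcheck on its HTTP port
    # (8080 by default); "deploy/ratelimit" is a placeholder name
    kubectl port-forward deploy/ratelimit 8080:8080 >/dev/null 2>&1 &
    sleep 2 && curl -s localhost:8080/healthcheck
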
Clean up resources:

To destroy the cluster and all created resources:

terraform destroy

Note: If deletion_protection was enabled, you must set it to false in terraform.tfvars before destroying.
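
A minimal sketch of that sequence (the intermediate apply records the new value in Terraform state so the destroy is allowed):

terraform apply      # after setting deletion_protection = false in terraform.tfvars
terraform destroy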

Variables description:

| Variable | Description | Default |
|----------|-------------|---------|
| project_id | Required. Google Cloud Project ID | - |
| vpc_name | Required. Existing VPC name to deploy into | - |
| subnet_name | Required. Existing Subnet name | - |
| ratelimit_config_yaml | Required. Rate Limit configuration content | - |
| region | GCP Region for deployment | us-central1 |
| control_plane_cidr | CIDR block for GKE control plane | 172.16.0.0/28 |
| cluster_name | Name of the GKE cluster | ratelimit-cluster |
| deletion_protection | Prevent accidental cluster deletion | false |
| ratelimit_replicas | Initial number of Rate Limit pods | 1 |
| min_replicas | Minimum HPA replicas | 1 |
| max_replicas | Maximum HPA replicas | 5 |
| hpa_cpu_target_percentage | CPU utilization target for HPA (%) | 75 |
| hpa_memory_target_percentage | Memory utilization target for HPA (%) | 75 |
| ratelimit_image | Docker image for Rate Limit service | envoyproxy/ratelimit:e9ce92cc |
| redis_image | Docker image for Redis | redis:6.2-alpine |
| ratelimit_resources | Resources for Rate Limit service (map) | requests/limits (CPU/Mem) |
| redis_resources | Resources for Redis container (map) | requests/limits (CPU/Mem) |
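
Any of these variables can also be set on the command line instead of (or in addition to) terraform.tfvars, for example:

terraform plan -var='max_replicas=10' -var='region=us-east1' -out=tfplan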