Cloudflare Workers for API Rate Limiting and Caching

I’ve built enough backend systems to know that rate limiting belongs at the edge, not in your application code. When I moved CitizenApp’s API protection to Cloudflare Workers, I cut origin server CPU by ~40% and stopped worrying about DDoS-style abuse patterns entirely.

Here’s why this matters: traditional rate limiting happens after requests traverse the internet and hit your servers. By that point, you’ve already paid for bandwidth, database connections, and compute. Cloudflare Workers intercept requests at the edge—there are 275+ data centers globally—which means you reject bad traffic before it travels to your origin. It’s not just faster; it’s cheaper and more resilient.

The Problem with Application-Level Rate Limiting

When I was working on CitizenApp’s early architecture, I implemented rate limiting using Redis + middleware. It “worked,” but:

Added latency: Every request checked Redis. P99 latency for rate limit checks alone was 50-100ms from US regions.
Distributed complexity: I needed to sync rate limit state across multiple origin servers; otherwise clients could slip past the limits whenever the load balancer spread their requests across servers.
Fixed capacity: If traffic spiked 10x, my origin servers had to handle the decision logic for all requests, not just the ones that passed.

The real insight came when a customer’s integration script had a bug and hammered my /api/export endpoint with 50k requests/second. My origin crashed not from processing those requests, but from rejecting them. That’s backwards.

Cloudflare Workers: Rate Limiting at the Edge

Cloudflare Workers run JavaScript/TypeScript in V8 isolates at the edge. They sit between user requests and your origin, executing in ~1-5ms globally. Here’s what I built:

// Rate limiting middleware using Cloudflare KV
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const clientIp = request.headers.get("cf-connecting-ip") || "unknown";
    const rateLimitKey = `ratelimit:${clientIp}:${url.pathname}`;

    // Get current count from Cloudflare KV (geo-distributed, replicated)
    const kvNamespace = env.RATE_LIMIT_KV;
    const currentCount = await kvNamespace.get(rateLimitKey, "json") || { count: 0, resetAt: Date.now() + 60000 };

    const now = Date.now();
    
    // Reset window if expired
    if (now > currentCount.resetAt) {
      currentCount.count = 0;
      currentCount.resetAt = now + 60000;
    }

    const limit = 100; // 100 requests per minute
    
    if (currentCount.count >= limit) {
      return new Response("Rate limit exceeded", {
        status: 429,
        headers: {
          "Retry-After": String(Math.ceil((currentCount.resetAt - now) / 1000)),
          "X-RateLimit-Limit": String(limit),
          "X-RateLimit-Remaining": "0",
          "X-RateLimit-Reset": String(currentCount.resetAt),
        },
      });
    }

    // Increment counter
    currentCount.count += 1;
    await kvNamespace.put(rateLimitKey, JSON.stringify(currentCount), {
      expirationTtl: 61, // Auto-expire after window ends
    });

    // Pass through to origin
    const originResponse = await fetch(request);

    // Responses returned by fetch() have immutable headers, so copy before adding ours
    const response = new Response(originResponse.body, originResponse);
    response.headers.set("X-RateLimit-Remaining", String(limit - currentCount.count));
    response.headers.set("X-RateLimit-Reset", String(currentCount.resetAt));

    return response;
  },
};

Why this works:

KV is geo-replicated: Cloudflare KV propagates writes to other locations with no coordination code on my side. It is eventually consistent, though, so counts can briefly lag between locations (see the gotcha further down).
Operates outside your infrastructure: Even if your origin is down, rate limiting still works.
Fast decisions: Reads of hot keys are served from the edge location handling the request, so the check avoids a round trip to a central store on my own infrastructure.
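
The window logic in the Worker is easier to reason about, and to unit test, when it is pulled out into a pure function. Here is a minimal sketch of that idea. The names `WindowState`, `Decision`, and `decide` are illustrative and not part of any Cloudflare API, and the 60-second window simply mirrors the constant used in the handler above.

```typescript
// Illustrative names only: WindowState, Decision, decide.
interface WindowState {
  count: number;
  resetAt: number; // epoch milliseconds when the current window ends
}

interface Decision {
  allowed: boolean;
  state: WindowState; // state to persist after this request
  remaining: number; // requests left in the current window
  retryAfterSeconds: number; // 0 when the request is allowed
}

const WINDOW_MS = 60_000;

function decide(prev: WindowState | null, now: number, limit: number): Decision {
  // Start a fresh window when there is no state or the old window has expired.
  const current: WindowState =
    prev === null || now > prev.resetAt
      ? { count: 0, resetAt: now + WINDOW_MS }
      : { ...prev };

  if (current.count >= limit) {
    return {
      allowed: false,
      state: current,
      remaining: 0,
      retryAfterSeconds: Math.ceil((current.resetAt - now) / 1000),
    };
  }

  current.count += 1;
  return {
    allowed: true,
    state: current,
    remaining: limit - current.count,
    retryAfterSeconds: 0,
  };
}
```

With this shape, the handler only has to load the previous state, call `decide`, persist `state`, and translate the result into either a 429 or a pass-through with headers.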

Smart Caching + Rate Limiting

I went deeper and combined this with conditional caching. For endpoints returning the same data repeatedly (user profiles, config), I cache aggressively:

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    
    // Cache read-only endpoints aggressively.
    // The group keeps ^ and $ applied to both alternatives.
    if (request.method === "GET" && /^\/api\/(config|users\/\d+)$/.test(url.pathname)) {
      const cacheKey = new Request(url.toString(), { method: "GET" });
      const cache = caches.default;
      
      const cached = await cache.match(cacheKey);
      if (cached) {
        // Copy the cached response so its headers are mutable and the originals survive
        const hit = new Response(cached.body, cached);
        hit.headers.set("X-Cache", "HIT");
        return hit;
      }

      // Not cached, fetch from origin
      const response = await fetch(request);
      
      if (response.status === 200) {
        // Cache a copy for 5 minutes
        const cachedResponse = new Response(response.clone().body, response);
        cachedResponse.headers.set("Cache-Control", "public, max-age=300");
        await cache.put(cacheKey, cachedResponse);
      }
      
      return response;
    }

    // Apply rate limiting to everything else (applyRateLimit is the handler above, factored into a function)
    return applyRateLimit(request, env);
  },
};
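
One detail worth checking in route patterns like the one above: in a regex alternation, `^` and `$` bind only to the alternative they touch, so a pattern written as `^a|b$` matches anything that starts with `a` or ends with `b`. The sketch below compares a loosely anchored pattern with a grouped one. The extra paths are made up for illustration.

```typescript
// Loose: ^ applies only to the first alternative, $ only to the second.
const loose = /^\/api\/config|\/api\/users\/\d+$/;
// Strict: the non-capturing group makes both anchors cover both alternatives.
const strict = /^\/api\/(?:config|users\/\d+)$/;

// Hypothetical paths: the last two should never be served from a public cache.
const paths = [
  "/api/config",
  "/api/users/42",
  "/api/config/secrets",
  "/internal/api/users/42",
];

for (const path of paths) {
  console.log(path, "loose:", loose.test(path), "strict:", strict.test(path));
}
```

Both patterns accept the first two paths, but only the loose one also accepts the last two, which is how an unintended route can end up cached.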

This drops origin load dramatically. On CitizenApp, user config queries went from 5k/min at origin to a small fraction of that, with about 90% of them served from the edge cache.

Handling Different Rate Limits by Tier

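You’ll want different limits for free vs. paid users. I read the tier from a custom header that the auth layer sets before the request reaches the Worker. Don’t trust this header if clients can set it themselves, or anyone could claim the enterprise tier: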

const clientIp = request.headers.get("cf-connecting-ip");
const userTier = request.headers.get("X-User-Tier") || "free"; // Set by your auth layer

const limits: Record<string, number> = {
  "free": 50,
  "pro": 500,
  "enterprise": 5000,
};

const limit = limits[userTier] || limits["free"];
const rateLimitKey = `ratelimit:${clientIp}:${userTier}`;

But wait—this requires knowing the user before the rate limit check. I solve this with a two-stage approach:

Unauthenticated routes (login, signup): Rate limit by IP, low limit
Authenticated routes: Extract token in Worker, verify signature (fast with cached public keys), then apply per-user limits

// Validate JWT at edge (verifyJWT is a helper you supply; it is not a built-in)
let rateLimitKey = `ratelimit:ip:${clientIp}`; // default: IP-based limiting
const token = request.headers.get("Authorization")?.split(" ")[1];
if (token) {
  try {
    const payload = await verifyJWT(token, env.JWT_PUBLIC_KEY);
    const userId = payload.sub;
    rateLimitKey = `ratelimit:${userId}`;
  } catch {
    // Invalid token, keep the IP-based key
  }
}
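
Putting the two stages together, the choice of bucket and limit can live in one small function. This is a sketch under stated assumptions: `selectBucket`, `Identity`, and `ANONYMOUS_LIMIT` are hypothetical names, the anonymous limit of 10 is an invented value, and it assumes the tier arrives as a claim in the already verified token, which the snippets above do not specify.

```typescript
// Hypothetical helper: picks the rate limit bucket and limit for one request.
type Tier = "free" | "pro" | "enterprise";

interface Identity {
  userId: string;
  tier: Tier; // assumed to come from a verified token claim
}

const TIER_LIMITS: Record<Tier, number> = { free: 50, pro: 500, enterprise: 5000 };
const ANONYMOUS_LIMIT = 10; // assumed value for login/signup style routes

function selectBucket(
  identity: Identity | null,
  clientIp: string,
): { key: string; limit: number } {
  if (identity !== null) {
    // Stage 2: the token was verified at the edge, so limit per user.
    return {
      key: `ratelimit:user:${identity.userId}`,
      limit: TIER_LIMITS[identity.tier],
    };
  }
  // Stage 1: no verified identity, so fall back to a strict per-IP limit.
  return { key: `ratelimit:ip:${clientIp}`, limit: ANONYMOUS_LIMIT };
}
```

Keying authenticated traffic by user rather than by IP means users behind a shared NAT don't consume each other's quota, while unauthenticated routes stay tightly limited.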

Gotcha: KV Write Consistency

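This burned me: I assumed KV writes were instant globally. They’re not. KV is eventually consistent: writes often replicate within seconds, but Cloudflare’s documentation says they can take up to 60 seconds or more to become visible in other locations. The get-then-put in the Worker is also not atomic, so concurrent requests can overwrite each other’s increments. If a user fires requests across multiple edge locations simultaneously, they can bypass limits briefly.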
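
To make the race concrete, here is a toy model of an eventually consistent counter. It is not how KV is implemented; `EventuallyConsistentCounter` and the location names `fra` and `sjc` are invented for illustration. Each location increments its own replica, and replication later overwrites the others with the last propagated value.

```typescript
// Toy model only: each location reads and writes its own replica.
class EventuallyConsistentCounter {
  private replicas: Record<string, number> = {};

  constructor(locations: string[]) {
    for (const loc of locations) this.replicas[loc] = 0;
  }

  // Read-modify-write against the local replica, like get() followed by put().
  increment(location: string): number {
    const next = this.replicas[location] + 1;
    this.replicas[location] = next;
    return next;
  }

  // Replication: the value written at `from` overwrites every replica.
  propagateFrom(from: string): void {
    const value = this.replicas[from];
    for (const loc of Object.keys(this.replicas)) this.replicas[loc] = value;
  }

  read(location: string): number {
    return this.replicas[location];
  }
}

const counter = new EventuallyConsistentCounter(["fra", "sjc"]);
const seenInFra = counter.increment("fra"); // first request, handled in fra
const seenInSjc = counter.increment("sjc"); // second request; fra's write is not visible yet
counter.propagateFrom("fra"); // replication catches up
console.log(seenInFra, seenInSjc, counter.read("sjc"));
```

Two requests were served, yet every replica ends up holding a count of 1. Scale that up to many locations and a client can exceed the limit until replication converges.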

Solution: Cloudflare Durable Objects give you strongly consistent counters, at the price of higher cost and some added latency. For most use cases, eventual consistency is acceptable: a brief window of bypass is far better than the alternative (processing the bad traffic). For strict requirements, use Durable Objects:

// Durable Object for strong consistency
export class RateLimiter {
  private counts: Map<string, { count: number; resetAt: number }> = new Map();
  
  async checkLimit(key: string, limit: number): Promise<boolean> {
    const now = Date.now();
    const entry = this.counts.get(key) || { count: 0, resetAt: now + 60000 };
    
    if (now > entry.resetAt) {
      entry.count = 0;
      entry.resetAt = now + 60000;
    }
    
    if (entry.count >= limit) return false;
    
    entry.count += 1;
    this.counts.set(key, entry);
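    return true;
  }

  // Workers reach a Durable Object through fetch(). Passing key and limit
  // as query parameters is an assumed convention, not part of the original design.
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const key = url.searchParams.get("key") ?? "unknown";
    const limit = Number(url.searchParams.get("limit") ?? "100");
    const allowed = await this.checkLimit(key, limit);
    return new Response(allowed ? "ok" : "Rate limit exceeded", { status: allowed ? 200 : 429 });
  }
}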

Deployment: GitHub Actions + Wrangler

I deploy Workers via GitHub Actions to keep it seamless:

name: Deploy Worker
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: "18"
      - run: npm install && npm run build
      - uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}

The Real Win

Edge rate limiting isn’t just about performance; it’s about protection. I can now adjust limits instantly, block bad IPs globally in seconds, and sleep knowing that even if my origin crashes, the edge still protects upstream systems.

For CitizenApp, this was a 3-hour implementation that saved weeks of infrastructure headaches. If you’re running APIs at any meaningful scale, Cloudflare Workers should be your first line of defense.
