Design an API Rate Limiter
Designing an API rate limiter is a common system design interview question that assesses your ability to create a scalable and reliable system to control the rate of requests to an API. Here's how you can approach this problem:
Requirements
- Functional Requirements:
  - Limit the number of API requests a user can make within a given time period.
  - Return appropriate HTTP status codes (429 Too Many Requests) when the rate limit is exceeded (see the response sketch after this list).
  - Allow different rate limits for different API endpoints or user plans.
- Non-Functional Requirements:
  - High availability and reliability.
  - Low latency.
  - Scalability to handle millions of users and requests per second.
  - Resilience to failures.
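To make the 429 requirement concrete, a throttled response usually pairs the status code with the standard Retry-After header so well-behaved clients know when to back off. A minimal Express sketch; the route and the 60-second value are illustrative, not part of the original design:

const express = require("express");
const app = express();

// Illustrative only: this route always throttles, to show the response shape.
app.get("/api/resource", (req, res) => {
  res.set("Retry-After", "60"); // standard HTTP header: seconds before retrying
  res.status(429).json({ error: "Too Many Requests" });
});

app.listen(3000);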
Key Components
- Client: Makes requests to the API.
- API Gateway: Routes requests to appropriate backend services and integrates the rate limiter.
- Rate Limiter: Enforces rate limiting logic and tracks request counts.
- Data Store: Stores rate limiting information (e.g., request counts, timestamps).
Design Steps
1. API Gateway
Use an API Gateway (e.g., AWS API Gateway, NGINX) to intercept requests before they reach the backend services. The gateway checks the rate limits and either forwards the request or returns a 429 Too Many Requests status; the Express middleware in step 4 plays exactly this role.
2. Rate Limiter Algorithms
Implement rate limiting algorithms such as:
- Fixed Window Counter: Simple but can cause burst traffic issues.
- Sliding Window Log: More precise but requires more storage.
- Sliding Window Counter: Balances simplicity and precision.
- Token Bucket: Allows bursts of traffic while maintaining the overall rate limit.
- Leaky Bucket: Smoothes out bursts by processing requests at a constant rate. For this design, let's use Token Bucket as it allows controlled bursts of traffic.
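Before moving the state into Redis, it helps to see the Token Bucket algorithm on its own. Below is a minimal single-process sketch; the class name and parameters are illustrative rather than part of the original design, and a real deployment needs shared state, which is where Redis comes in next.

// Minimal in-memory token bucket (illustrative: single process, no persistence).
class TokenBucket {
  constructor(maxTokens, refillIntervalMs) {
    this.maxTokens = maxTokens; // bucket capacity = maximum burst size
    this.refillIntervalMs = refillIntervalMs; // ms needed to mint one token
    this.tokens = maxTokens; // start with a full bucket
    this.lastRefill = Date.now();
  }
  tryConsume() {
    const now = Date.now();
    // Mint whole tokens for the elapsed time, capped at the bucket size.
    const minted = Math.floor((now - this.lastRefill) / this.refillIntervalMs);
    if (minted > 0) {
      this.tokens = Math.min(this.maxTokens, this.tokens + minted);
      this.lastRefill += minted * this.refillIntervalMs;
    }
    if (this.tokens > 0) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // throttled
  }
}
// e.g., bursts of up to 100 requests, refilling one token every 600 ms
const bucket = new TokenBucket(100, 600);
console.log(bucket.tryConsume()); // true until the burst is spent

The Redis implementation later in this design is the same logic with the state moved into Redis and the read-refill-consume step wrapped in a Lua script so it runs atomically.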
3. Data Store
Choose a high-performance, scalable data store to maintain the rate limit counters. Options include:
- In-Memory Stores: Redis, Memcached (low latency, suitable for high read/write operations).
- Distributed Databases: DynamoDB, Cassandra (scalable, persistent storage). For this example, we'll use Redis for its low latency and atomic operations support.
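The atomic-operations point deserves a sentence: with many app servers sharing one counter, a plain GET-then-SET sequence is a read-modify-write race, and two servers can both admit the "last" allowed request. Even the naive Fixed Window Counter sidesteps this with a single atomic INCR, as in this hedged sketch (the key naming and 2-minute expiry are illustrative):

const Redis = require("ioredis");
const redis = new Redis();

// Naive fixed-window check: one atomic INCR per request, keyed by minute.
// Illustrative only; the Token Bucket used below avoids window-boundary bursts.
const fixedWindowAllow = async (userId, limitPerMinute) => {
  const windowKey = `fw:${userId}:${Math.floor(Date.now() / 60000)}`;
  const count = await redis.incr(windowKey); // atomic read-modify-write
  if (count === 1) await redis.expire(windowKey, 120); // drop stale windows
  return count <= limitPerMinute;
};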
4. Rate Limiter Implementation
Step-by-Step Implementation:
- Set up Redis:
  - Redis stores the rate limit data with keys structured as user_id:endpoint.
- API Gateway logic:
  - Intercept each request and pass it to the rate limiter logic.
  - The rate limiter checks Redis for the current token count and the last refill timestamp.
  - Update the count and decide whether to allow or reject the request.
- Token Bucket logic in Redis:
  - Each user has a bucket that refills tokens at a set rate.
  - When a request is made, tokens are consumed from the bucket.
  - If the bucket has enough tokens, the request proceeds; otherwise, it is throttled.
// Redis client setup (using ioredis for Node.js)
const Redis = require("ioredis");
const redis = new Redis();

// Rate limiter function: resolves to true if the request may proceed.
// refillRate is the number of milliseconds needed to mint one token.
const rateLimiter = async (userId, endpoint, maxTokens, refillRate) => {
  const key = `${userId}:${endpoint}`;
  const now = Date.now();

  // Lua script implementing the token bucket algorithm. Running it via
  // EVAL makes the whole read-refill-consume sequence atomic in Redis.
  const script = `
    local tokens_key = KEYS[1]
    local timestamp_key = KEYS[2]
    local max_tokens = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local current_time = tonumber(ARGV[3])

    -- A missing key means a fresh (full) bucket.
    local tokens = tonumber(redis.call("get", tokens_key))
    if not tokens then
      tokens = max_tokens
    end
    local last_refill = tonumber(redis.call("get", timestamp_key))
    if not last_refill then
      last_refill = current_time
    end

    -- Mint whole tokens for the elapsed time, capped at the bucket size.
    local refill_tokens = math.floor((current_time - last_refill) / refill_rate)
    tokens = math.min(max_tokens, tokens + refill_tokens)
    last_refill = last_refill + (refill_tokens * refill_rate)

    if tokens > 0 then
      tokens = tokens - 1
      redis.call("set", tokens_key, tokens)
      redis.call("set", timestamp_key, last_refill)
      -- Optional hardening (not in the original sketch): expire idle buckets
      -- after one full refill period so unused keys don't accumulate.
      redis.call("pexpire", tokens_key, max_tokens * refill_rate)
      redis.call("pexpire", timestamp_key, max_tokens * refill_rate)
      return 1
    else
      return 0
    end
  `;

  const tokensKey = `${key}:tokens`;
  const timestampKey = `${key}:timestamp`;
  const allowed = await redis.eval(
    script,
    2, // number of KEYS passed to the script
    tokensKey,
    timestampKey,
    maxTokens,
    refillRate,
    now
  );
  return allowed === 1;
};
// Example usage in an API endpoint
const express = require("express");
const app = express();

// Global middleware: every request is checked before it reaches a route.
app.use(async (req, res, next) => {
  // Assumes an upstream layer (e.g., the gateway) sets a trusted user id.
  const userId = req.headers["x-user-id"];
  const endpoint = req.path;
  const maxTokens = 100; // bucket capacity, e.g., a burst of 100 requests
  const refillRate = 60000; // ms per token, e.g., 1 token per minute
  if (await rateLimiter(userId, endpoint, maxTokens, refillRate)) {
    next();
  } else {
    res.status(429).send("Too Many Requests");
  }
});

app.get("/api/resource", (req, res) => {
  res.send("Resource accessed");
});

app.listen(3000, () => {
  console.log("Server running on port 3000");
});
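To sanity-check the limiter end to end, a small client script can fire a burst of requests and watch the status codes flip from 200 to 429 once the bucket empties. This sketch assumes Node 18+ for the built-in fetch, and the user id is an arbitrary test value:

// Illustrative smoke test: expect ~100 successes, then 429s.
const run = async () => {
  for (let i = 1; i <= 105; i++) {
    const res = await fetch("http://localhost:3000/api/resource", {
      headers: { "x-user-id": "user-123" },
    });
    console.log(`request ${i}: ${res.status}`);
  }
};
run();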
Final Thoughts
This system design leverages Redis for low-latency storage and atomic operations, allowing efficient rate limiting using the token bucket algorithm. The API Gateway ensures that all incoming requests are checked before reaching backend services. This design can scale horizontally by distributing Redis across multiple nodes and adding more instances of the API Gateway and backend services. Additionally, monitoring and logging can be added to track usage patterns and adjust rate limits as needed.
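One caveat if Redis is distributed as described: the Lua script touches two keys, and in Redis Cluster every key a script reads or writes must hash to the same slot. The standard fix is a hash tag, wrapping the shared part of the key in braces so both keys land on the same node; a hedged sketch of the key construction:

// With Redis Cluster, only the {braced} part of the key is hashed, so both
// keys map to the same slot and the Lua script stays single-node atomic.
const bucketKeys = (userId, endpoint) => {
  const base = `{${userId}:${endpoint}}`; // hash tag
  return [`${base}:tokens`, `${base}:timestamp`];
};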