Before starting the design, we first look at the benefits of using an API rate limiter:
- Prevent resource starvation caused by Denial of Service (DoS) attacks.
- Reduce cost. Limiting excess requests means fewer servers and lets us allocate more resources to high-priority APIs.
- Prevent servers from being overloaded.
Step 1 - Understand the problem and establish design scope

Summary of the requirements for the system:
- Accurately limit excessive requests.
- Low latency. The rate limiter should not slow down HTTP response time.
- Use as little memory as possible.
- Distributed rate limiting. The rate limiter can be shared across multiple servers or processes.
- Exception handling. Show clear exceptions to users when their requests are throttled (a sample throttled response follows this list).
- High fault tolerance. If there are any problems with the rate limiter (for example, a cache server goes offline), it does not affect the entire system.
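To make the exception-handling requirement concrete, throttled requests are commonly answered with HTTP status code 429 (Too Many Requests). The sketch below is illustrative, not part of the requirements; the function name is hypothetical, and the X-Ratelimit-* header names follow a widely used convention whose exact spelling varies between APIs.

```python
import json

# A minimal sketch of a throttled response, assuming the common convention
# of returning HTTP 429 (Too Many Requests). The X-Ratelimit-* header names
# are a popular convention; exact names vary between APIs.
def build_throttled_response(limit: int, retry_after_seconds: int) -> dict:
    return {
        "status": 429,  # Too Many Requests
        "headers": {
            "X-Ratelimit-Limit": str(limit),          # allowed requests per window
            "X-Ratelimit-Remaining": "0",             # no quota left in this window
            "Retry-After": str(retry_after_seconds),  # seconds until the client may retry
        },
        "body": json.dumps({
            "error": "rate_limited",
            "message": "Too many requests. Please retry later.",
        }),
    }

# Example: a client that exhausted a 100-requests-per-minute quota.
print(build_throttled_response(limit=100, retry_after_seconds=60))
```

Returning an explicit status code and a Retry-After hint satisfies the requirement that users see a clear signal, rather than silently dropped requests.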
Step 2 - Propose high-level design and get buy-in
Where to put the rate limiter?
- Client-side implementation. Generally speaking, the client is an unreliable place to enforce rate limiting because client requests can easily be forged by malicious actors. Moreover, we might not have control over the client implementation.
- Server-side implementation. The rate limiter is placed on the API servers themselves.
- Middleware. Instead of putting a rate limiter on the API servers, we can create a rate limiter middleware that throttles requests before they reach the APIs.
Cloud microservices have become widely popular and rate limiting is usually implemented within a component called API gateway. For now, we only need to know that the API gateway is a middleware that supports rate limiting.
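To make the middleware idea concrete, here is a minimal sketch of the throttling logic a middleware or gateway could run, assuming a simple fixed-window counter kept in process memory. The names (RateLimiter, MAX_REQUESTS, WINDOW_SECONDS) are illustrative; a production limiter would typically use a shared store such as Redis to satisfy the distributed-rate-limiting requirement.

```python
import time
from collections import defaultdict

MAX_REQUESTS = 5      # allowed requests per client per window (illustrative)
WINDOW_SECONDS = 60   # length of each window in seconds (illustrative)

class RateLimiter:
    def __init__(self):
        # Maps client_id -> (timestamp when the current window started, count)
        self.counters = defaultdict(lambda: (0.0, 0))

    def allow_request(self, client_id: str) -> bool:
        now = time.time()
        window_start, count = self.counters[client_id]
        if now - window_start >= WINDOW_SECONDS:
            # The window expired: start a new one with this request.
            self.counters[client_id] = (now, 1)
            return True
        if count < MAX_REQUESTS:
            self.counters[client_id] = (window_start, count + 1)
            return True
        # Over the limit: the middleware answers with HTTP 429 instead of
        # forwarding the request to the API servers.
        return False

# Example: the sixth request inside one window is throttled.
limiter = RateLimiter()
print([limiter.allow_request("client-1") for _ in range(6)])
# -> [True, True, True, True, True, False]
```

A fixed window is the simplest choice but allows bursts at window boundaries; alternatives such as sliding windows and token buckets trade some complexity for smoother enforcement.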
An important question to ask ourselves is: where should the rate limiter be implemented, on the server-side or in a gateway?