Meng小羽

Debug客栈


Go Standard Library Rate Limiter Design and Implementation

A rate limiter is an important component in backend services and is widely used in real business scenarios: its design comes up in microservices, gateways, and many other backend systems. The role of a rate limiter is to cap the rate of incoming requests and protect the backing service, preventing outages caused by overload.

There are many ways to implement a rate limiter, for example the Token Bucket, Sliding Window, and Leaky Bucket algorithms.

In the Go ecosystem, an official rate limiter implementation is provided by golang.org/x/time/rate, which is designed around the Token Bucket algorithm.

Token Bucket Algorithm#

The token bucket design is fairly simple. Think of it as a refrigerator that can hold only a fixed number of ice creams. Each request is a person who comes to take exactly one ice cream. What happens when the ice cream runs out? A worker restocks the refrigerator at a fixed frequency, say 10 ice creams per second; that restocking rate corresponds to the rate at which requests can be served.

Token Bucket Design Concept:

  • Token: Each request can only continue to access after obtaining the Token.
  • Bucket: A bucket with a fixed number of tokens that can be put into it.
  • Token Injection Frequency: Put tokens into the bucket at a fixed frequency, and the number of tokens put in cannot exceed the capacity of the bucket.

In other words, the token bucket algorithm limits the request rate and keeps response throughput under control. This is especially useful for burst traffic in high-concurrency scenarios: the backend can handle requests comfortably because bursts are rate-limited before they ever reach the backing service.

Specific Design#

Rate Limiter Definition#

type Limiter struct {
	mu        sync.Mutex // Mutex (exclusive lock)
	limit     Limit      // Rate at which tokens are put into the bucket, float64 type
	burst     int        // Size of the bucket
	tokens    float64    // Current number of remaining tokens
	last      time.Time  // Last time the limiter's token count was updated
	lastEvent time.Time  // Time of the last rate limiting event
}

limit, burst, and tokens are the core fields of this rate limiter; together they determine how many concurrent requests can get through.

After the tokens are issued, they will be stored in the Reservation object:

type Reservation struct {
	ok        bool      // Whether the conditions for allocating tokens are met
	lim       *Limiter  // Rate limiter for token issuance
	tokens    int       // Number of tokens issued
	timeToAct time.Time // Time when the token issuance conditions are met
	limit     Limit     // Rate of token issuance
}

Consuming Tokens#

The Limiter provides three families of methods for consuming tokens. Callers can consume one token at a time or several at once, and each family takes a different approach when tokens are insufficient.

Wait, WaitN#

func (lim *Limiter) Wait(ctx context.Context) (err error)
func (lim *Limiter) WaitN(ctx context.Context, n int) (err error)

Here Wait is equivalent to WaitN(ctx, 1); the same shorthand relationship holds for the other method pairs below.

When consuming tokens with WaitN, if the bucket holds fewer than n tokens, the call blocks until enough tokens are available (or until the context is cancelled or its deadline expires). If there are already enough tokens, it returns immediately.

Allow, AllowN#

func (lim *Limiter) Allow() bool
func (lim *Limiter) AllowN(now time.Time, n int) bool 

The AllowN method reports whether at least n tokens are available at the given point in time. If so, it returns true and consumes n tokens from the bucket; otherwise it returns false and consumes nothing.

This fits typical online-serving scenarios: if requests arrive too fast, the excess is simply dropped.

Reserve, ReserveN#

Besides the blocking Wait methods and the direct-judgment Allow methods, the official rate limiter also offers reservation methods that let the caller manage waiting itself. All three families are built on the reserveN method shown below.

func (lim *Limiter) Reserve() *Reservation
func (lim *Limiter) ReserveN(now time.Time, n int) *Reservation

Calling these methods returns a Reservation object whether or not enough tokens are available.

You can then call the reservation's Delay() method, which returns how long to wait before acting; a delay of 0 means no waiting is required. Otherwise you must wait out the delay before proceeding with the reserved work.

Alternatively, if you decide not to proceed, you can call the Cancel() method, which returns the tokens to the bucket.

func (lim *Limiter) reserveN(now time.Time, n int, maxFutureReserve time.Duration) Reservation {
	lim.mu.Lock()

	// First, check if the injection rate is infinite
	// If it is infinite, it means that there is currently no rate limiting
	if lim.limit == Inf {
		lim.mu.Unlock()
		return Reservation{
			ok:        true,
			lim:       lim,
			tokens:    n,
			timeToAct: now,
		}
	}

	// Get the number of tokens that can be obtained at the time of now and the time when the last token was taken
	now, last, tokens := lim.advance(now)

	// Update the number of tokens
	tokens -= float64(n)

	// If tokens are negative, it means that there are no tokens in the bucket
	// It means that you need to wait and calculate the waiting time
	var waitDuration time.Duration
	if tokens < 0 {
		waitDuration = lim.limit.durationFromTokens(-tokens)
	}

	// Calculate whether the allocation conditions are met
	// 1. The size to be allocated does not exceed the size of the bucket
	// 2. The waiting time does not exceed the set waiting time
	ok := n <= lim.burst && waitDuration <= maxFutureReserve

	// Preprocess the reservation
	r := Reservation{
		ok:    ok,
		lim:   lim,
		limit: lim.limit,
	}
	// If the allocation conditions are met
	// 1. Set the allocation size
	// 2. The time when the token is issued = current time + waiting time
	if ok {
		r.tokens = n
		r.timeToAct = now.Add(waitDuration)
	}

	// Update the values of the limiter and return
	if ok {
		lim.last = now
		lim.tokens = tokens
		lim.lastEvent = r.timeToAct
	} else {
		lim.last = last
	}

	lim.mu.Unlock()
	return r
}

Specific Usage#

Using the rate package is straightforward: you only need to specify limit (the rate at which tokens are put into the bucket) and burst (the size of the bucket).

func NewLimiter(r Limit, b int) *Limiter {
	return &Limiter{
		limit: r, // Rate at which tokens are put into the bucket
		burst: b, // Size of the bucket
	}
}

Here, I use an HTTP API to verify the power of time/rate:

package main

import (
	"fmt"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	r := rate.Every(1 * time.Millisecond)
	limit := rate.NewLimiter(r, 10)
	http.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
		if limit.Allow() {
			fmt.Printf("Request successful, current time: %s\n", time.Now().Format("2006-01-02 15:04:05"))
		} else {
			fmt.Printf("Request successful, but rate limited...\n")
		}
	})

	_ = http.ListenAndServe(":8081", nil)
}

Here, the limiter puts one token into the bucket every millisecond, and the bucket's capacity is 10. An HTTP service is started to simulate a backend API.

Next, let's do a stress test to see the effect:

package main

import (
	"fmt"
	"net/http"
	"testing"
)

func GetApi() {
	api := "http://localhost:8081/"
	res, err := http.Get(api)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	if res.StatusCode == http.StatusOK {
		fmt.Printf("Get API success\n")
	}
}

func Benchmark_Main(b *testing.B) {
	for i := 0; i < b.N; i++ {
		GetApi()
	}
}

The result is as follows:

......
Request successful, current time: 2020-08-24 14:26:52
Request successful, but rate limited...
Request successful, but rate limited...
Request successful, but rate limited...
Request successful, but rate limited...
Request successful, but rate limited...
Request successful, current time: 2020-08-24 14:26:52
Request successful, but rate limited...
Request successful, but rate limited...
Request successful, but rate limited...
Request successful, but rate limited...
......

From the output we can see that with the Allow method, a request proceeds only when a token is available to consume; the excess requests are simply discarded. In real business code, of course, rejected requests can be surfaced to the frontend in a friendlier way.

The first few requests succeed because the token bucket is initialized full when the service starts. Once the burst of traffic arrives, tokens are replenished only at the configured rate, so there is soon an obvious shortage of tokens.

Open Source Repository#

time/rate is maintained as an independent open-source rate limiter. If you're interested, consider giving the project a star. Thank you!
