rate limiter
stage 01 of 05
stage 01

Token bucket

What You Are Building

A token bucket rate limiter that supports per-key state, an injected clock, and concurrent Allow calls. By the end of this stage one process can rate-limit thousands of identities (users, IPs, API keys) from a single in-memory map without ever calling time.Sleep and without two goroutines, threads, or simultaneous promises ever combining to grant more than capacity tokens in a single instant.

Stage 1 is single-process and uses one algorithm. The Limiter interface that lets stage 2's sliding-window counter slot in next to this implementation arrives in stage 2. The distributed counter backend arrives in stage 3.

Why This Matters

Token bucket is the rate-limiting algorithm most production systems use. AWS API Gateway uses it. Stripe layers it underneath their sliding-window counter. Discord's per-route limits are token buckets. The reasons it wins:

  1. It supports bursts naturally. A bucket that has been idle for a while refills to capacity and absorbs a short flood without rejecting requests.
  2. It can compute an exact Retry-After on denial. The wait is (1 - tokens) / rate. Sliding-window approximates; token bucket is precise.
  3. The state per key is two numbers (tokens, last_refill) plus a parameter pair (capacity, rate). At a million distinct keys it is a few megabytes of memory.

The naive implementations get one of three things wrong. They refill on a background ticker (which couples capacity to scheduler jitter and makes tests fragile). They store tokens as an integer (which loses the fractional refill credit between calls and skews the rate downward). Or they lock too coarsely (which makes a hot key the global bottleneck). This stage gets all three right.

The Interface You Implement

Stage 1 ships the token bucket type, the Decision shape, and an injected clock option. Every stage from here on returns the same Decision shape.

Go (ratelimit/ratelimit.go):

package ratelimit
 
import (
    "context"
    "time"
)
 
type Decision struct {
    Allowed    bool
    RetryAfter time.Duration
    Remaining  int
}
 
type TokenBucket struct { /* unexported */ }
 
type Option func(*config)
 
// WithClock injects a clock function for tests. Defaults to time.Now.
func WithClock(now func() time.Time) Option
 
// NewTokenBucket returns a token bucket with the given capacity and refill rate.
func NewTokenBucket(capacity int, refillPerSec float64, opts ...Option) *TokenBucket
 
// Allow returns the decision for one unit-cost call against the given key.
func (tb *TokenBucket) Allow(ctx context.Context, key string) (Decision, error)

Python (ratelimit/__init__.py):

from dataclasses import dataclass
from typing import Callable
 
@dataclass
class Decision:
    allowed: bool
    retry_after: float  # seconds
    remaining: int
 
class TokenBucket:
    def __init__(
        self,
        capacity: int,
        refill_per_sec: float,
        *,
        clock: Callable[[], float] | None = None,
    ) -> None: ...
    def allow(self, key: str) -> Decision: ...

clock returns seconds since an arbitrary epoch (monotonic in production, advanced manually in tests).

TypeScript (src/ratelimit.ts):

export interface Decision {
  allowed: boolean
  retryAfterMs: number
  remaining: number
}
 
export interface LimiterOptions {
  clock?: () => number // milliseconds since epoch
}
 
export class TokenBucket {
  constructor(capacity: number, refillPerSec: number, opts?: LimiterOptions)
  allow(key: string): Decision
}

retryAfterMs is milliseconds; the Go and Python variants use seconds with sub-second precision. Same information, language-idiomatic units.

What the Tests Check

Five tests gate this stage. Each catches a specific failure mode.

  • FullBucketAllowsCapacity. With capacity = 10, the first ten consecutive Allow calls return allowed=true. The eleventh returns allowed=false with retry_after > 0. Catches a missing or off-by-one capacity check.
  • DenialCarriesRetryAfter. Every denial returns a positive retry_after. Catches an implementation that returns false with retry_after=0, which would push clients into a tight retry loop.
  • RefillRestoresTokens. After capacity is drained, advancing the injected clock by 1 / rate seconds restores exactly one token. The next Allow succeeds. Catches missing lazy refill and integer truncation of fractional tokens.
  • PerKeyIsolated. Two distinct keys do not share state. Draining key a does not affect key b. Catches an accidental global counter.
  • ConcurrentAllowGrantsAtMostCapacity. 100 goroutines / threads / promises race to call Allow on the same key, against an empty bucket pre-filled with capacity = 50. Exactly 50 of them get allowed=true, never more. Catches a missing atomic CAS, a missing mutex, or a check-then-act race.

Run mise run stage 1. The concurrent test runs under -race in Go and is informative-only in Python.

Guidance

Lazy refill, not a background ticker

The wrong implementation runs a background goroutine that adds tokens to every bucket at 1 / rate second intervals. Three problems:

  1. It scales with the number of distinct keys, not with the number of Allow calls.
  2. The exact refill timing depends on scheduler jitter, which makes the RefillRestoresTokens test flaky.
  3. The background goroutine must be carefully cleaned up or it leaks.

The right implementation does lazy refill on the Allow call:

elapsed = clock() - last_refill
new_tokens = elapsed * rate
tokens = min(capacity, tokens + new_tokens)
last_refill = clock()

No background work, no scheduler dependency, no leak.

Fractional tokens are required

If you store tokens as an integer, the lazy refill silently rounds down between calls. With rate = 10/sec and Allow calls 50 milliseconds apart, each refill adds 0.5 tokens, which truncates to zero. The effective rate drops to zero and the bucket never refills.

Store tokens as a float64 (Go), float (Python), or number (TypeScript). The Decision.remaining field can still be an integer; cast on the way out.

Atomic CAS or a mutex, your call

For concurrent Allow calls against the same key you need to make read-decide-write atomic. Two paths:

  • Mutex per key. Simple. A sync.Mutex (Go), threading.Lock (Python), or no-op (single-threaded JavaScript). Held only for the duration of the lazy refill plus the token deduction. Microseconds. Fine for most workloads.
  • Compare-and-swap on (tokens, last_refill) as a single word. Faster under high contention on a single key. More complex; only worth it if your profile shows the mutex is a bottleneck.

Karnstack tests do not care which you pick. Pick the simpler one.

A note on JavaScript concurrency

In single-threaded JavaScript runtimes, Allow is naturally atomic since no other code can run between the read and the write. The ConcurrentAllowGrantsAtMostCapacity test in the TypeScript starter uses Promise.all over a hundred allow calls; the test still passes without a lock because the event loop serializes them.

If you ship this code in a worker pool or share state across processes with SharedArrayBuffer, you need real atomics. The stage-3 distributed counter backend addresses the multi-process case.

Common Pitfalls

  1. The RefillRestoresTokens test fails by exactly one token. You stored tokens as an integer. Switch to floating point and only cast to integer in remaining.

  2. The DenialCarriesRetryAfter test fails because retry_after = 0 on denial. You computed retry_after = (capacity - tokens) / rate instead of (1 - tokens) / rate. The wait is the time to refill enough for the next single call, not enough to fully top up.

  3. The ConcurrentAllowGrantsAtMostCapacity test fails with the count drifting above capacity (e.g. 53 out of 100). Classic check-then-act. Your code reads tokens, checks >= 1, then writes tokens - 1. Between the read and the write another goroutine did the same. Wrap the read-decide-write in a mutex or use a CAS loop.

  4. The PerKeyIsolated test fails because all keys share state. You forgot the per-key map. The bucket holds map[string]*bucketState, not a single bucketState. The map itself needs synchronization (a top-level sync.RWMutex or a sync.Map), separate from the per-key mutex.

  5. mise run stage 1 passes locally but flakes on CI. Your test uses time.Sleep instead of advancing the injected clock. Replace the sleep with clock.Advance(d) on the test's FakeClock.

Bonus

  • Add an optional cost parameter to Allow so a non-unit operation (a multi-record write that should count as ten) can deduct ten tokens. Tests do not exercise this; production uses it heavily.
  • Replace the per-key map with an LRU bounded at maxKeys to prevent unbounded growth when distinct identifiers churn. Stage 5 mentions this; you can land it here.
  • Benchmark Allow against a hot key. A clean mutex implementation runs in well under one microsecond on a modern laptop. The atomic-CAS version is roughly twice as fast under heavy contention.

What You Built

A token bucket with per-key state, lazy refill, atomic decisioning, and a clean Decision return shape. The whole implementation is around a hundred lines. The next stage introduces a second algorithm, the sliding-window counter, and pulls both behind a common Limiter interface so the same test suite runs against either.

Continue to Stage 2: Sliding-Window Counter.

Command Palette

Search for a command to run...