Rate limiting

Configure token bucket rate limits on MCPServer and VirtualMCPServer resources to control how many tool invocations users can make. Rate limiting prevents individual users from monopolizing shared servers and protects downstream services from traffic spikes.

ToolHive supports two scopes of rate limiting:

Shared limits cap total requests across all users.
Per-user limits cap requests independently for each authenticated user.

Both scopes can be set at the server level, and individual tools can add their own tighter shared or per-user limits. A request must pass every applicable limit to proceed.

Prerequisites

Before you begin, ensure you have:

A Kubernetes cluster with the ToolHive Operator installed
Redis deployed in your cluster: rate limiting stores token bucket counters in Redis (see Redis Sentinel session storage for deployment instructions)
For per-user limits: authentication enabled on the MCPServer (oidcConfigRef or externalAuthConfigRef)

If you need help with these prerequisites, see:

How rate limiting works

Rate limits use a token bucket algorithm. Each bucket has a capacity (maxTokens) and a refill period (refillPeriod). The bucket starts full, and each tools/call request consumes one token. When the bucket is empty, requests are rejected until tokens refill. The refill rate is maxTokens / refillPeriod tokens per second, with refillPeriod expressed in seconds. For example, maxTokens: 60 with refillPeriod: 1m0s refills one token per second.

Only tools/call requests are rate-limited. Lifecycle methods (initialize, ping) and discovery methods (tools/list, prompts/list) pass through unconditionally.
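You can think of the method check as a filter on the JSON-RPC method field. This Go sketch assumes the limiter decodes the request body to read that field. The function name consumesToken and the get_weather tool name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest captures only the JSON-RPC field the limiter inspects.
type rpcRequest struct {
	Method string `json:"method"`
}

// consumesToken reports whether a request should be charged against a
// rate limit. Only tools/call is; initialize, ping, tools/list,
// prompts/list and every other method pass through.
func consumesToken(body []byte) (bool, error) {
	var req rpcRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return false, err
	}
	return req.Method == "tools/call", nil
}

func main() {
	bodies := []string{
		`{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`,
		`{"jsonrpc":"2.0","id":2,"method":"tools/list"}`,
		`{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"get_weather"}}`,
	}
	for _, b := range bodies {
		limited, err := consumesToken([]byte(b))
		if err != nil {
			panic(err)
		}
		fmt.Println(limited) // false, false, true
	}
}
```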

When a request is rejected, the proxy returns:

HTTP 429 with a Retry-After header (seconds until a token is available)
A JSON-RPC error with code -32029 and retryAfterSeconds in the error data

If Redis is unreachable, rate limiting fails open and all requests are allowed through.
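Failing open means a storage error counts as permission to proceed, not as a rejection. The Go sketch below isolates that decision. The counterStore interface and its TakeToken method are hypothetical stand-ins for the Redis client.

```go
package main

import (
	"errors"
	"fmt"
)

// counterStore is a hypothetical stand-in for the Redis-backed bucket store.
type counterStore interface {
	TakeToken(key string) (allowed bool, err error)
}

// unreachableStore simulates Redis being down.
type unreachableStore struct{}

func (unreachableStore) TakeToken(string) (bool, error) {
	return false, errors.New("dial tcp: connection refused")
}

// emptyStore simulates a reachable store whose bucket is exhausted.
type emptyStore struct{}

func (emptyStore) TakeToken(string) (bool, error) { return false, nil }

// allowRequest fails open: a store error lets the request through.
func allowRequest(s counterStore, key string) bool {
	allowed, err := s.TakeToken(key)
	if err != nil {
		return true // fail open
	}
	return allowed
}

func main() {
	fmt.Println(allowRequest(unreachableStore{}, "shared")) // true
	fmt.Println(allowRequest(emptyStore{}, "shared"))       // false
}
```

This design favors availability over enforcement. While Redis is unreachable, no limits are applied, so it is worth monitoring Redis alongside the proxy.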

Configure shared rate limits

Shared limits apply a single token bucket across all users. Use them to cap total throughput to protect downstream services.

mcpserver-shared-ratelimit.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s

This allows 1,000 total tools/call requests per minute across all users.

Configure per-user rate limits

Per-user limits give each authenticated user their own independent token bucket. This prevents a single user from consuming the entire server capacity.

Per-user limits require authentication to be enabled. The proxy identifies users by the sub claim in their JWT.

mcpserver-peruser-ratelimit.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPOIDCConfig
metadata:
  name: ratelimit-oidc
  namespace: toolhive-system
spec:
  type: inline
  inline:
    issuer: https://my-idp.example.com
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: weather-server
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfigRef:
    name: ratelimit-oidc
    audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s

This allows each user 100 tools/call requests per minute independently.

Combine shared and per-user limits

You can configure both scopes together. A request must pass all applicable limits. This lets you set a per-user ceiling while also capping total server throughput.

mcpserver-combined-ratelimit.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPOIDCConfig
metadata:
  name: ratelimit-oidc
  namespace: toolhive-system
spec:
  type: inline
  inline:
    issuer: https://my-idp.example.com
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: weather-server
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfigRef:
    name: ratelimit-oidc
    audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
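With both scopes configured, a tools/call is admitted only when the shared bucket and the caller's per-user bucket each have a token. The Go sketch below checks every applicable bucket before taking from any, so a request rejected by a per-user limit does not drain the shared bucket. This page does not say whether ToolHive behaves this way, so treat it as one reasonable design. Refill is omitted, and bucket and allowAll are hypothetical names.

```go
package main

import "fmt"

// bucket is a hypothetical token counter with refill omitted.
type bucket struct{ tokens int }

// allowAll admits a request only if every applicable bucket has a token,
// and takes one token from each bucket only after all checks pass.
func allowAll(buckets ...*bucket) bool {
	for _, b := range buckets {
		if b.tokens < 1 {
			return false
		}
	}
	for _, b := range buckets {
		b.tokens--
	}
	return true
}

func main() {
	shared := &bucket{tokens: 1000}
	alice := &bucket{tokens: 100}
	bob := &bucket{tokens: 100}
	for i := 0; i < 100; i++ { // alice spends her whole per-user allowance
		allowAll(shared, alice)
	}
	fmt.Println(allowAll(shared, alice)) // false: per-user limit reached
	fmt.Println(allowAll(shared, bob))   // true: bob has his own bucket
	fmt.Println(shared.tokens)           // 899
}
```

With several proxy replicas, the check-and-take step would need to be atomic in the shared store, for example as a single Redis script. Otherwise, concurrent requests could both claim the last token.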

Add per-tool overrides

Individual tools can have tighter limits than the server default. Per-tool limits are enforced in addition to server-level limits.

mcpserver-pertool-ratelimit.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPOIDCConfig
metadata:
  name: ratelimit-oidc
  namespace: toolhive-system
spec:
  type: inline
  inline:
    issuer: https://my-idp.example.com
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: weather-server
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfigRef:
    name: ratelimit-oidc
    audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    tools:
      - name: expensive_search
        perUser:
          maxTokens: 10
          refillPeriod: 1m0s
      - name: shared_resource
        shared:
          maxTokens: 50
          refillPeriod: 1m0s

In this example:

Each user can make 100 total tool calls per minute.
Each user can make at most 10 expensive_search calls per minute (and those also count toward the 100 server-level limit).
All users combined can make 50 shared_resource calls per minute.
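You can express the scoping in this example as a function that lists the buckets a call must pass. In the Go sketch below, the structs loosely mirror the rateLimiting YAML. The bucket key strings and the get_weather tool name are hypothetical.

```go
package main

import "fmt"

// limit stands in for a maxTokens/refillPeriod pair; only presence matters here.
type limit struct{ maxTokens int }

type toolLimit struct {
	name    string
	shared  *limit
	perUser *limit
}

type rateLimiting struct {
	shared  *limit
	perUser *limit
	tools   []toolLimit
}

// applicableBuckets lists the buckets a tools/call must pass.
// The key format is hypothetical and only illustrates the scoping.
func applicableBuckets(cfg rateLimiting, user, tool string) []string {
	var keys []string
	if cfg.shared != nil {
		keys = append(keys, "shared")
	}
	if cfg.perUser != nil {
		keys = append(keys, "user:"+user)
	}
	for _, t := range cfg.tools {
		if t.name != tool {
			continue
		}
		if t.shared != nil {
			keys = append(keys, "tool:"+tool)
		}
		if t.perUser != nil {
			keys = append(keys, "tool:"+tool+":user:"+user)
		}
	}
	return keys
}

func main() {
	// Mirrors mcpserver-pertool-ratelimit.yaml.
	cfg := rateLimiting{
		perUser: &limit{maxTokens: 100},
		tools: []toolLimit{
			{name: "expensive_search", perUser: &limit{maxTokens: 10}},
			{name: "shared_resource", shared: &limit{maxTokens: 50}},
		},
	}
	fmt.Println(applicableBuckets(cfg, "alice", "expensive_search"))
	// [user:alice tool:expensive_search:user:alice]
	fmt.Println(applicableBuckets(cfg, "alice", "shared_resource"))
	// [user:alice tool:shared_resource]
	fmt.Println(applicableBuckets(cfg, "alice", "get_weather"))
	// [user:alice]
}
```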

Rate limit a VirtualMCPServer

VirtualMCPServer resources accept the same rate limit shape under spec.config.rateLimiting. The fields and token bucket semantics match the MCPServer examples above, but the prerequisites are stricter:

spec.sessionStorage.provider must be redis. The CRD rejects any rateLimiting configuration without Redis-backed session storage.
spec.incomingAuth.type must be oidc when you configure any per-user bucket, either at the server level or in a per-tool override.
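You can sketch the two admission rules above as a validation function. The Go struct below is a hypothetical simplification of the VirtualMCPServer spec. The error messages and the non-Redis and non-OIDC values (memory, anonymous) are illustrative. They are not the CRD's actual wording or allowed values.

```go
package main

import (
	"errors"
	"fmt"
)

// vmcpSpec is a hypothetical simplification holding only the fields
// the two rules touch.
type vmcpSpec struct {
	sessionStorageProvider string
	incomingAuthType       string
	hasRateLimiting        bool
	hasPerUserBucket       bool // server-level or any per-tool perUser
}

// validate applies the rules: any rateLimiting needs Redis session
// storage, and any per-user bucket also needs OIDC incoming auth.
func validate(s vmcpSpec) error {
	if !s.hasRateLimiting {
		return nil
	}
	if s.sessionStorageProvider != "redis" {
		return errors.New("rateLimiting requires spec.sessionStorage.provider: redis")
	}
	if s.hasPerUserBucket && s.incomingAuthType != "oidc" {
		return errors.New("per-user rate limits require spec.incomingAuth.type: oidc")
	}
	return nil
}

func main() {
	fmt.Println(validate(vmcpSpec{hasRateLimiting: true, sessionStorageProvider: "memory"}))
	fmt.Println(validate(vmcpSpec{hasRateLimiting: true, hasPerUserBucket: true,
		sessionStorageProvider: "redis", incomingAuthType: "anonymous"}))
	fmt.Println(validate(vmcpSpec{hasRateLimiting: true, hasPerUserBucket: true,
		sessionStorageProvider: "redis", incomingAuthType: "oidc"}))
}
```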

A request must pass both the server-level vMCP limit and the per-tool limit (if defined). These limits apply to the vMCP aggregator and are independent of any limits configured on the backend MCPServers it routes to.

vmcp-ratelimit.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: shared-toolkit
  namespace: toolhive-system
spec:
  groupRef:
    name: my-backends
  incomingAuth:
    type: oidc
    oidcConfigRef:
      name: my-oidc-config
      audience: shared-toolkit
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  config:
    rateLimiting:
      shared:
        maxTokens: 5000
        refillPeriod: 1m0s
      perUser:
        maxTokens: 200
        refillPeriod: 1m0s
      tools:
        - name: expensive_search
          perUser:
            maxTokens: 20
            refillPeriod: 1m0s

Next steps

Token exchange to configure token exchange for upstream service authentication
CRD reference for complete field definitions
