API Rate Limiting and Error Handling

Our APIs are designed to provide reliable and scalable access to your data. To ensure consistent performance for all customers, we apply rate limiting and recommend a standardized approach to handling API errors.

This document explains why rate limiting exists, how it works, and how clients should respond to API errors such as 429 Too Many Requests or 500 Internal Server Error.

1. Background

We currently offer two types of API deployments:

  1. Cloud API - Enforces rate limiting using a token bucket mechanism.
  2. On-Premises API (Edge) - Runs at the edge, inside the customer's network, and does not enforce rate limits.

Even though rate limiting is only applied to the Cloud API, we strongly recommend implementing proper retry and backoff logic for both environments. This ensures consistent client behavior across deployments.

2. Why Rate Limiting Exists

Rate limiting protects the system from overload and ensures fair resource usage among all tenants.

Without limits, a single client could:

  • Overwhelm shared resources
  • Cause degraded performance for other users
  • Trigger cascading failures

By enforcing a maximum request rate, we ensure system stability and predictable performance for everyone.

3. How Token Bucket Rate Limiting Works

The Cloud API uses a token bucket algorithm - a well-established approach for smoothing traffic bursts.

Here's how it works:

  • Each client (identified by their API token) has a "bucket" that fills with tokens at a fixed rate
  • Each request consumes one or more tokens
  • When the bucket is empty, additional requests are rejected with a 429 Too Many Requests response
  • The bucket refills over time, allowing new requests once tokens become available

This mechanism allows short bursts of activity while enforcing a maximum sustained rate over time.
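
The mechanics above can be sketched in a few lines of Python. This is an illustrative model only, not the Cloud API's actual implementation; the class and parameter names are hypothetical.

```python
import time

class TokenBucket:
    """Illustrative token bucket (hypothetical names, not the service's internals)."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens the bucket can hold (burst size)
        self.refill_rate = refill_rate  # tokens added per second (sustained rate)
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self, cost=1):
        """Return True if a request costing `cost` tokens would be served,
        False if it would be rejected with 429."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket with `capacity=2` serves two back-to-back requests immediately (the burst), then rejects further requests until the refill rate makes tokens available again.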

4. Handling 429 Too Many Requests

When you receive a 429 response, it means your client is sending requests faster than the allowed rate.

Recommended Strategy

  1. Stop sending requests immediately upon receiving a 429 response
  2. Retry using exponential backoff:
    • Start with a short delay (e.g., 1 second)
    • Double the wait time after each retry (2s, 4s, 8s, 16s, …)
    • Cap the delay at a reasonable maximum (e.g., 30 seconds)
  3. Limit retries to avoid infinite loops (e.g., maximum 10 retries)
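
The strategy above can be sketched as follows. `send_request` is a hypothetical placeholder for your HTTP call; it is assumed to return an object with a `status_code` attribute (as the `requests` library does).

```python
import time

def request_with_backoff(send_request, max_retries=10, base_delay=1.0, max_delay=30.0):
    """Retry on 429 with exponential backoff, capped delay, and bounded retries.

    `send_request` is a hypothetical zero-argument callable returning a
    response object with a `.status_code` attribute.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.status_code != 429:
            return response          # success or a non-rate-limit error
        if attempt == max_retries:
            break                    # retry budget exhausted
        time.sleep(min(delay, max_delay))
        delay *= 2                   # double the wait: 1s, 2s, 4s, 8s, ...
    raise RuntimeError("Gave up after repeated 429 responses")
```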

5. Handling Other Errors

In addition to 429, the API may respond with standard HTTP error codes.

  Code             Meaning                   Recommended Action
  400              Bad Request               Check request syntax and parameters. Do not retry automatically.
  401              Unauthorized              Verify credentials or token validity.
  403              Forbidden                 Ensure you have the correct permissions.
  404              Not Found                 Verify resource existence or endpoint URL.
  500              Internal Server Error     Retry with exponential backoff. Report to support if persistent.
  502 / 503 / 504  Temporary upstream issue  Retry with exponential backoff.

For any 5xx errors, retries should be delayed and limited in the same way as for 429 responses.
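
One way to centralize this decision is a small, hypothetical helper that classifies status codes as retryable, following the table above:

```python
# Status codes that warrant delayed, bounded retries per the table above.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry(status_code):
    """Return True if the response warrants a backoff-and-retry cycle."""
    return status_code in RETRYABLE_STATUS_CODES
```

Client-side errors such as 400, 401, 403, and 404 fall outside the set, so they fail fast instead of consuming the retry budget.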

6. Best Practices

  • Implement a shared retry policy across all API calls
  • Use idempotent requests (safe to retry without side effects)
  • Log error responses for debugging, and include the requestId from the response when contacting support
  • Avoid tight retry loops—they can worsen server load
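
As a sketch of the logging recommendation, the snippet below builds a support-ready log line. The assumption that `requestId` is a top-level field in the JSON response body is hypothetical; adjust the lookup to the actual response schema.

```python
def format_error_log(response):
    """Build a log line containing the status code and requestId.

    Assumes (hypothetically) a JSON body with a top-level `requestId` field;
    `response` must expose `.status_code` and `.json()`.
    """
    request_id = response.json().get("requestId", "unknown")
    return f"API error {response.status_code} (requestId={request_id})"
```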

A well-behaved client improves not only its own reliability but also the overall system health.