API Rate Limiting and Error Handling

Our APIs are designed to provide reliable and scalable access to your data. To ensure consistent performance for all customers, we apply rate limiting and recommend a standardized approach to handling API errors.

This document explains why rate limiting exists, how it works, and how clients should respond to API errors such as 429 Too Many Requests or 500 Internal Server Error.

1. Background

We currently offer two types of API deployments:

  1. Cloud API - Enforces rate limiting using a token bucket mechanism.
  2. On-Premises API (Edge) - Runs at the edge, inside the customer's network, and does not enforce rate limits.

Even though rate limiting is only applied to the Cloud API, we strongly recommend implementing proper retry and backoff logic for both environments. This ensures consistent client behavior across deployments.

2. Why Rate Limiting Exists

Rate limiting protects the system from overload and ensures fair resource usage among all tenants.

Without limits, a single client could:

  • Overwhelm shared resources
  • Cause degraded performance for other users
  • Trigger cascading failures

By enforcing a maximum request rate, we ensure system stability and predictable performance for everyone.

3. How Token Bucket Rate Limiting Works

The Cloud API uses a token bucket algorithm - a well-established approach for smoothing traffic bursts.

Here's how it works:

  • Each client (identified by their API token) has a "bucket" that fills with tokens at a fixed rate
  • Each request consumes one or more tokens
  • When the bucket is empty, additional requests are rejected with a 429 Too Many Requests response
  • The bucket refills over time, allowing new requests once tokens become available

This mechanism allows short bursts of activity while enforcing a maximum sustained rate over time.
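
The mechanics above can be sketched in a few lines of Python. This is an illustrative model only, not the Cloud API's actual implementation; the class and parameter names are hypothetical.

```python
import time

class TokenBucket:
    """Illustrative token bucket (hypothetical names, not the service's internals)."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens the bucket can hold (burst size)
        self.refill_rate = refill_rate  # tokens added per second (sustained rate)
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self, cost=1):
        """Return True if a request costing `cost` tokens would be served,
        False if it would be rejected with 429."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket with `capacity=2` serves two back-to-back requests immediately (the burst), then rejects further requests until the refill rate makes tokens available again.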

4. Handling 429 Too Many Requests

When you receive a 429 response, it means your client is sending requests faster than the allowed rate.

Recommended Strategy

  1. Stop sending requests immediately upon receiving a 429 response
  2. Retry using exponential backoff:
    • Start with a short delay (e.g., 1 second)
    • Double the wait time after each retry (2s, 4s, 8s, 16s, …)
    • Cap the delay at a reasonable maximum (e.g., 30 seconds)
  3. Limit retries to avoid infinite loops (e.g., maximum 10 retries)
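
The strategy above can be sketched as follows. `send_request` is a hypothetical placeholder for your HTTP call; it is assumed to return an object with a `status_code` attribute (as the `requests` library does).

```python
import time

def request_with_backoff(send_request, max_retries=10, base_delay=1.0, max_delay=30.0):
    """Retry on 429 with exponential backoff, capped delay, and bounded retries.

    `send_request` is a hypothetical zero-argument callable returning a
    response object with a `.status_code` attribute.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.status_code != 429:
            return response          # success or a non-rate-limit error
        if attempt == max_retries:
            break                    # retry budget exhausted
        time.sleep(min(delay, max_delay))
        delay *= 2                   # double the wait: 1s, 2s, 4s, 8s, ...
    raise RuntimeError("Gave up after repeated 429 responses")
```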

5. Handling Other Errors

In addition to 429, the API may respond with standard HTTP error codes.

  Code             Meaning                   Recommended Action
  400              Bad Request               Check request syntax and parameters. Do not retry automatically.
  401              Unauthorized              Verify credentials or token validity.
  403              Forbidden                 Ensure you have the correct permissions.
  404              Not Found                 Verify resource existence or endpoint URL.
  500              Internal Server Error     Retry with exponential backoff. Report to support if persistent.
  502 / 503 / 504  Temporary upstream issue  Retry with exponential backoff.

For any 5xx errors, retries should be delayed and limited in the same way as for 429 responses.
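
One way to centralize this decision is a small, hypothetical helper that classifies status codes as retryable, following the table above:

```python
# Status codes that warrant delayed, bounded retries per the table above.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry(status_code):
    """Return True if the response warrants a backoff-and-retry cycle."""
    return status_code in RETRYABLE_STATUS_CODES
```

Client-side errors such as 400, 401, 403, and 404 fall outside the set, so they fail fast instead of consuming the retry budget.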

6. Best Practices

  • Implement a shared retry policy across all API calls
  • Use idempotent requests (safe to retry without side effects)
  • Log error responses for debugging, and include the requestId from the response when contacting support
  • Avoid tight retry loops—they can worsen server load
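
As a sketch of the logging recommendation, the snippet below builds a support-ready log line. The assumption that `requestId` is a top-level field in the JSON response body is hypothetical; adjust the lookup to the actual response schema.

```python
def format_error_log(response):
    """Build a log line containing the status code and requestId.

    Assumes (hypothetically) a JSON body with a top-level `requestId` field;
    `response` must expose `.status_code` and `.json()`.
    """
    request_id = response.json().get("requestId", "unknown")
    return f"API error {response.status_code} (requestId={request_id})"
```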

A well-behaved client improves not only its own reliability but also the overall system health.