ZAHLEN

Chapter 5

Rate Limits & Quotas

Request limits, quota exhaustion, and HTTP 429 handling

Audience

Merchants | Developers | Integration Engineers

Zahlen API User Guide v1.0

Source baseline: zahlen_deploy_0616A.tar.gz | June 2026

‌Chapter 5 - Rate Limits & Quotas

Chapter purpose

This chapter explains how Zahlen protects runtime capacity and contracted usage, how request-size limits differ from rate limits and quotas, and how a client should respond safely when the platform returns HTTP 429.

Learning objectives

By the end of this chapter, you should be able to distinguish schema limits, short-window rate limits, and longer-window quotas; identify quota-exhaustion behavior; and implement bounded, idempotent retries without creating a retry storm.

Zahlen applies controls at the authenticated tenant boundary. Rate limits protect the service during short periods of high request pressure. Quotas protect usage over longer policy or billing periods. Request-schema limits define the largest valid payload accepted by a specific endpoint. These controls solve different problems and must not be treated as interchangeable.

Tenant-scoped enforcement

A request is evaluated in the tenant context resolved from X-API-Key. A client must not send tenant_id to obtain a different limit or quota. Capacity and usage policy follow the authenticated tenant and its assigned plan.

‌Three different kinds of limits

Control	Time horizon	What it protects	Typical client action
Request or schema limit	One request	Payload validity and processing safety	Reduce or split the payload before sending.
Rate limit	Seconds or minutes	Short-window service capacity	Pause and retry later with bounded backoff and jitter.
Quota	Hours, days, month, or contract period	Tenant usage allowance	Stop uncontrolled retrying, preserve work, and contact the plan owner if capacity is exhausted.
Authorization or capability restriction	Until policy changes	Endpoint and feature access	Do not retry automatically; verify plan, role, and contract enablement.

Do not confuse 422 with 429

A payload that exceeds a schema limit is a client-validation problem and may return a validation response such as HTTP 422. HTTP 429 means an otherwise valid request was refused because an active rate or quota control was reached.

‌Request limits

Request limits are part of the endpoint contract. They describe the valid size or shape of one API call. A larger usage plan does not automatically expand these limits, because the server model still validates each request against the published schema.

Endpoint or resource	Confirmed request limit	What the client should do
POST /v1/payment-events	events must contain 1 to 10,000 items	Use one event or split larger datasets into multiple controlled requests.
POST /v1/payment-events/batch	events must contain 1 to 10,000 items	Choose a batch size that balances throughput, latency, memory, and replay safety.
POST /v1/retry-decision/batch	up to 500 legacy decision events	Split larger legacy decision workloads into groups of 500 or fewer.
GET batch resources	limit is 1 through 1,000 when supplied; offset must be 0 or greater	Paginate until has_more is false or all expected records are returned.
Webhook subscription create	1 to 20 event types; callback URL length 8 to 2,048	Validate locally before sending the subscription request.
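
The limits in the table above can be checked locally before a request is transmitted. The following is a minimal sketch, not an official client: the function names, the constant names, and the `RequestLimitError` exception are hypothetical, and only the numeric bounds are taken from this chapter.

```python
"""Pre-flight checks for the request limits listed in this chapter.

All function, constant, and exception names are illustrative assumptions,
not part of the Zahlen API. Only the numeric bounds come from the table.
"""

MAX_PAYMENT_EVENTS = 10_000        # POST /v1/payment-events and /batch
MAX_LEGACY_DECISION_EVENTS = 500   # POST /v1/retry-decision/batch
MAX_PAGE_LIMIT = 1_000             # GET batch resources
MAX_WEBHOOK_EVENT_TYPES = 20
MIN_CALLBACK_URL_LENGTH = 8
MAX_CALLBACK_URL_LENGTH = 2_048


class RequestLimitError(ValueError):
    """Raised locally when a request would violate a published limit."""


def validate_event_count(events, maximum=MAX_PAYMENT_EVENTS):
    """Reject an empty or oversized events list before it is sent."""
    if not 1 <= len(events) <= maximum:
        raise RequestLimitError(
            f"events must contain 1 to {maximum} items, got {len(events)}"
        )


def validate_page_params(limit=None, offset=0):
    """Check pagination parameters; limit is optional."""
    if limit is not None and not 1 <= limit <= MAX_PAGE_LIMIT:
        raise RequestLimitError(f"limit must be 1 through {MAX_PAGE_LIMIT}")
    if offset < 0:
        raise RequestLimitError("offset must be 0 or greater")


def validate_webhook_subscription(event_types, callback_url):
    """Check the event-type count and callback URL length."""
    if not 1 <= len(event_types) <= MAX_WEBHOOK_EVENT_TYPES:
        raise RequestLimitError("subscription needs 1 to 20 event types")
    if not MIN_CALLBACK_URL_LENGTH <= len(callback_url) <= MAX_CALLBACK_URL_LENGTH:
        raise RequestLimitError("callback URL length must be 8 to 2,048")
```

For the legacy decision batch, the same count check applies with `maximum=MAX_LEGACY_DECISION_EVENTS`. A local check of this kind does not replace server-side validation; it only avoids sending a request that is already known to be invalid.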

‌Choosing a practical payment-event batch size

The maximum accepted batch size is not a recommended default. A smaller batch is often easier to retry, observe, and reconcile. Choose a batch size by testing representative payloads and measuring request duration, response size, failure recovery, and downstream processing time.

Batch characteristic	Smaller batches	Larger batches
Failure scope	Fewer events affected by one request failure	More events require replay or reconciliation
Network overhead	More HTTP requests	Lower per-event HTTP overhead
Latency	Events begin processing sooner	Client may wait longer to assemble and transmit
Correlation	More batch IDs to track	Fewer batch IDs, but larger failure domain
Memory and serialization	Lower client and server memory pressure	Higher memory and serialization cost

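MAX_EVENTS_PER_REQUEST = 10_000

def chunks(items, size=1000):
    # Yield consecutive slices of at most `size` items, never above the schema limit.
    if not 1 <= size <= MAX_EVENTS_PER_REQUEST:
        raise ValueError("size must be between 1 and MAX_EVENTS_PER_REQUEST")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# `events` and `submit_payment_events` are supplied by the calling application.
for event_batch in chunks(events, size=1000):
    submit_payment_events(event_batch)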

Preserve identifiers while splitting

Each payment event should retain a stable, merchant-generated event_id. Splitting one source dataset into multiple API batches must not change the identity of the underlying events.

‌Rate limits

A rate limit controls how much traffic a tenant may send during a short window. The exact window and numeric allowance are deployment- and plan-specific. Clients should therefore rely on explicit responses and current portal or administrative information rather than hard-code assumed values.

‌Common causes of rate-limit pressure

A billing-cycle burst sends many requests at the same second.
Multiple application instances use the same tenant credentials without shared throttling.
A retry loop immediately repeats requests after every failure.
A worker backlog is released all at once after an outage.
Health checks or polling run more frequently than required.
A compromised key or programming error creates unexpected traffic.

‌Client-side traffic shaping

Technique	Purpose	Implementation note
Concurrency limit	Caps the number of requests in flight	Use a semaphore or worker pool per environment and tenant.
Token bucket or leaky bucket	Smooths bursts over time	Share state across instances when they use the same tenant quota.
Queue with bounded workers	Prevents sudden release of a large backlog	Prioritize time-sensitive decision and outcome work appropriately.
Adaptive batch size	Reduces request count when safe	Do not increase batch size beyond endpoint schema limits.
Circuit breaker	Stops repeated calls during sustained failure	Open on repeated 429 or 5xx conditions and probe cautiously.
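
As an illustration of the token-bucket row above, here is a minimal in-process sketch. The class name, the `rate_per_second` and `capacity` parameters, and the injected `clock` are assumptions for the example; the values a tenant should use depend on its plan and are not stated in this guide. The sketch keeps state in one process only, so instances that share a tenant allowance would need a shared store instead.

```python
import threading
import time


class TokenBucket:
    """Illustrative single-process token bucket (names are hypothetical).

    Tokens refill continuously at `rate_per_second` up to `capacity`.
    A request may proceed only if a token is available.
    """

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.capacity = float(capacity)
        self._tokens = float(capacity)   # start full so an initial burst is allowed
        self._clock = clock              # injectable for deterministic tests
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self, tokens=1.0):
        """Take `tokens` if available and return True; otherwise return False."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Refill for the elapsed time, never beyond capacity.
            self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False
```

A caller would check `try_acquire()` before each request and queue or delay the work when it returns False, rather than sending the request and waiting for a 429.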

Do not retry the payment processor schedule early

Zahlen’s fixed recovery schedule is Day 1, Day 2, Day 6, and Day 16. An API 429 is a communications-capacity response; it does not authorize an extra payment retry or a change to the canonical payment-attempt schedule.

‌Quotas and quota exhaustion

A quota limits tenant usage over a longer policy period. Depending on the deployment, usage may be measured by request count, event volume, decision volume, outcome volume, or another contracted unit. Exact quota values belong to the tenant’s assigned plan and contract.

‌What quota exhaustion means

When a quota is exhausted, additional eligible requests may be rejected until the quota resets or an administrator changes the allowance. The application should preserve unsubmitted work, avoid duplicate processing, and expose a clear operational alert. Repeatedly sending the same request cannot restore capacity and may increase pressure.

Step	Action
Detect	Recognize HTTP 429 and read available response metadata.
Pause	Stop immediate retries for the affected operation or tenant.
Preserve	Keep events, decisions, or outcomes in a durable merchant-side queue.
Assess	Check usage, plan assignment, traffic anomalies, and reset policy.
Resume	Drain the queue gradually after capacity is available.
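
The Preserve and Resume steps can be sketched with a small durable queue. This example assumes SQLite as the merchant-side store and uses hypothetical names (`DurableQueue`, `drain`, `item_id`); any durable store with equivalent guarantees would serve, and the guide does not prescribe one.

```python
import json
import sqlite3


class DurableQueue:
    """Illustrative SQLite-backed FIFO queue for unsubmitted work."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "item_id TEXT UNIQUE NOT NULL, "
            "payload TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, item_id, payload):
        # INSERT OR IGNORE keeps the first copy, so the same item is not queued twice.
        with self._db:
            self._db.execute(
                "INSERT OR IGNORE INTO pending (item_id, payload) VALUES (?, ?)",
                (item_id, json.dumps(payload)),
            )

    def peek(self, max_items):
        """Return up to max_items oldest entries without removing them."""
        rows = self._db.execute(
            "SELECT item_id, payload FROM pending ORDER BY seq LIMIT ?",
            (max_items,),
        ).fetchall()
        return [(item_id, json.loads(payload)) for item_id, payload in rows]

    def ack(self, item_ids):
        """Remove entries that were submitted successfully."""
        with self._db:
            self._db.executemany(
                "DELETE FROM pending WHERE item_id = ?", [(i,) for i in item_ids]
            )

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]


def drain(queue, submit, batch_size=100):
    """Submit one batch; entries are deleted only if submit() does not raise."""
    batch = queue.peek(batch_size)
    if not batch:
        return 0
    submit([payload for _, payload in batch])
    queue.ack([item_id for item_id, _ in batch])
    return len(batch)
```

Calling `drain` once per scheduling interval, with a modest `batch_size`, releases the backlog gradually instead of all at once. Because entries are removed only after a successful submit, a throttled call leaves the work in place for a later attempt.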

‌Operational questions during quota exhaustion

Is the tenant near an expected billing-cycle peak, or is this traffic abnormal?
Is one service, API key, or endpoint responsible for most usage?
Are duplicate requests or retries consuming the allowance?
Are outcomes being delayed in a way that breaks the recovery learning loop?
When does the applicable quota reset?
Does the merchant need a plan change, a temporary adjustment, or a client-side correction?

Do not silently discard outcomes

Retry outcomes close the learning loop. If outcome submissions are temporarily throttled, store them durably and send them later with their original identifiers and actual outcome timestamps.

‌Understanding HTTP 429

HTTP 429 Too Many Requests indicates that the platform is enforcing a rate or quota policy. The response is not a signal to immediately repeat the call. It is a signal to reduce pressure, wait, and retry only when the operation is safe to repeat.

Response element	How to use it
HTTP status 429	Classify the failure as throttling or quota enforcement, not validation or authentication.
Retry-After header, when present	Wait at least the specified interval before retrying.
Request or correlation ID, when present	Include it in logs and support escalation.
Error code or metadata	Distinguish short-window throttling from longer-window quota exhaustion when the deployment provides that detail.
Idempotency state	Reuse the same Idempotency-Key for the same logical POST operation.

‌Recommended 429 response sequence

Step	Action
Stop	Do not immediately repeat the failed request.
Read	Inspect Retry-After and structured error metadata.
Back off	Use bounded exponential backoff with randomized jitter.
Reuse	Keep the original idempotency key for the same operation.
Alert	Escalate sustained throttling or quota exhaustion.

delay = min(max_delay, base_delay * (2 ** retry_number))
delay = delay * random.uniform(0.75, 1.25)

if retry_after_header:
    delay = max(delay, parse_retry_after(retry_after_header))
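
The pseudocode above calls `parse_retry_after` without defining it. One possible implementation is sketched here, assuming the header follows the standard HTTP forms (a whole number of seconds or an HTTP-date); this guide does not state which form Zahlen sends. The `compute_delay` wrapper, its defaults of 1 second and 60 seconds, and the injectable `rng` are illustrative choices.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the wait in seconds for a Retry-After value, or None if unparseable.

    Accepts delay-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no additional wait is required.
    return max(0.0, (when - now).total_seconds())


def compute_delay(retry_number, retry_after_header=None,
                  base_delay=1.0, max_delay=60.0, rng=random.uniform):
    """Capped exponential backoff with jitter, never below Retry-After."""
    delay = min(max_delay, base_delay * (2 ** retry_number))
    delay *= rng(0.75, 1.25)
    if retry_after_header:
        server_delay = parse_retry_after(retry_after_header)
        if server_delay is not None:
            delay = max(delay, server_delay)
    return delay
```

Taking the maximum of the two values means the jitter can lengthen the wait but cannot shorten it below the interval the server asked for.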

Bound every retry loop

Set a maximum number of attempts, a maximum total elapsed time, and a dead-letter or operator-review path. An unbounded retry loop can turn a temporary 429 into a sustained outage.
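
The three bounds named above can be combined in one loop. The sketch below is an assumption-laden illustration: `Throttled`, `RetryBudgetExceeded`, `run_bounded`, and the `dead_letter` callback are hypothetical names, and the default budget of 5 attempts and 120 seconds is a placeholder, not a Zahlen recommendation.

```python
import time


class Throttled(Exception):
    """Raised by the caller's operation when the server answers HTTP 429."""


class RetryBudgetExceeded(Exception):
    """Raised when the attempt or elapsed-time budget is used up."""


def run_bounded(operation, delay_for, max_attempts=5, max_elapsed=120.0,
                dead_letter=None, sleep=time.sleep, clock=time.monotonic):
    """Run operation() with bounded retries on Throttled.

    delay_for(attempt) returns the wait in seconds before the next attempt.
    dead_letter() is invoked once if the budget is exhausted, so the work
    can be parked for operator review instead of being retried forever.
    """
    started = clock()
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled:
            delay = delay_for(attempt)
            out_of_attempts = attempt + 1 >= max_attempts
            # Stop if waiting would push total elapsed time past the budget.
            out_of_time = (clock() - started) + delay > max_elapsed
            if out_of_attempts or out_of_time:
                break
            sleep(delay)
    if dead_letter is not None:
        dead_letter()
    raise RetryBudgetExceeded("operation remained throttled")
```

Only the throttling signal is retried here; any other exception from `operation()` propagates immediately, which keeps validation and authorization failures out of the retry path.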

‌Implementation examples
‌Python example
import random
import time

import requests

def post_with_backoff(url, headers, payload, attempts=5):
    # headers should carry X-API-Key and, for POST, a stable Idempotency-Key.
    for retry_number in range(attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=20)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()

        if retry_number == attempts - 1:
            break  # no further attempt follows, so do not sleep again

        retry_after = response.headers.get('Retry-After')
        if retry_after and retry_after.isdigit():
            # Honor the server-specified wait in seconds.
            delay = float(retry_after)
        else:
            # Bounded exponential backoff with randomized jitter.
            delay = min(60.0, 1.0 * (2 ** retry_number))
            delay *= random.uniform(0.75, 1.25)
        time.sleep(delay)

    raise RuntimeError('Zahlen request remained throttled')

‌JavaScript example

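async function zahlenFetch(url, options, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      if (!response.ok) throw new Error(`Zahlen HTTP ${response.status}`);
      return response.json();
    }

    if (attempt === maxAttempts - 1) break; // no further attempt follows

    // headers.get() returns null when the header is absent, and Number(null) is 0,
    // so check for presence before parsing.
    const retryAfterHeader = response.headers.get('Retry-After');
    const retryAfter = retryAfterHeader === null ? NaN : Number(retryAfterHeader);

    // Bounded exponential backoff with randomized jitter.
    const backoffMs = Math.min(60000, 1000 * (2 ** attempt)) * (0.75 + Math.random() * 0.5);

    // Never wait less than the server-specified interval.
    const delayMs = Number.isFinite(retryAfter)
      ? Math.max(retryAfter * 1000, backoffMs)
      : backoffMs;

    await new Promise(resolve => setTimeout(resolve, delayMs));
  }

  throw new Error('Zahlen request remained throttled');
}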

‌Curl diagnostic example

curl -i -X POST "$ZAHLEN_BASE_URL/v1/_next/retry-decision" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $ZAHLEN_API_KEY" \
  -H "Idempotency-Key: order-8842-attempt-2" \
  -d '{"attempt_number":2,"decline_code":"51"}'

Curl is diagnostic, not a retry engine

Use curl to inspect headers and response bodies. Production retry behavior should be implemented in application code with durable state, bounded attempts, telemetry, and idempotency.

‌Safe retries by operation

Operation	Retry after 429?	Required safeguard
GET event, batch, or decision resource	Yes	Use bounded backoff; GET is normally safe to repeat.
POST retry decision	Yes, carefully	Reuse the same Idempotency-Key and identical logical request.
POST retry outcome	Yes, carefully	Preserve decision_id, request_id, attempt number, outcome timestamp, and idempotency.
POST payment-event ingestion	Only with explicit replay safeguards	Use stable event_id values and understand ingestion replay behavior.
Create webhook subscription	Only after verifying prior result	Avoid creating duplicate subscriptions after an ambiguous timeout.
422 validation response	No automatic retry	Correct the request before resubmitting.
401 or 403	No automatic retry	Correct authentication, authorization, plan, or capability policy first.
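
The status-code rows of the table above can be expressed as a small classifier so that each class of response reaches its own handler. This is a sketch under assumptions: the `Action` names and `classify_status` are hypothetical, and the handling of 5xx responses is left to the caller because this chapter does not define it beyond requiring distinct treatment.

```python
from enum import Enum


class Action(Enum):
    """Illustrative handling categories; the names are not Zahlen terms."""
    SUCCESS = "success"
    BACKOFF_THEN_RETRY = "backoff_then_retry"   # 429
    FIX_REQUEST = "fix_request"                 # 422
    FIX_ACCESS = "fix_access"                   # 401 or 403
    SERVER_ERROR = "server_error"               # 5xx, policy decided by caller
    UNCLASSIFIED = "unclassified"


def classify_status(status_code):
    """Map an HTTP status code to a handling category."""
    if 200 <= status_code < 300:
        return Action.SUCCESS
    if status_code == 429:
        return Action.BACKOFF_THEN_RETRY
    if status_code == 422:
        return Action.FIX_REQUEST
    if status_code in (401, 403):
        return Action.FIX_ACCESS
    if 500 <= status_code < 600:
        return Action.SERVER_ERROR
    return Action.UNCLASSIFIED
```

Only `BACKOFF_THEN_RETRY` should enter the 429 retry path; routing 401, 403, or 422 into it would repeat a request that cannot succeed unchanged.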

‌Idempotency and payload identity

An idempotency key represents one logical operation. A client must not reuse the same key for a materially different request. If a retry changes the payload, identifiers, or intended operation, the server may correctly treat it as a conflict rather than a replay.
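
A client can guard against accidental key reuse by remembering a fingerprint of the payload sent with each key. The sketch below is a client-side illustration only: `payload_fingerprint`, `IdempotencyRegistry`, and the in-memory dictionary are assumptions, and it does not describe how the Zahlen server compares requests.

```python
import hashlib
import json


def payload_fingerprint(payload):
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyRegistry:
    """Illustrative in-memory map of idempotency key to payload fingerprint.

    A production client would persist this alongside its durable queue.
    """

    def __init__(self):
        self._seen = {}

    def check(self, key, payload):
        """Return 'new', 'replay', or 'conflict' for this key and payload."""
        fingerprint = payload_fingerprint(payload)
        if key not in self._seen:
            self._seen[key] = fingerprint
            return "new"
        return "replay" if self._seen[key] == fingerprint else "conflict"
```

A `conflict` result indicates that the client is about to send a materially different request under an existing key, which should be corrected locally with a new key or the original payload before anything is transmitted.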

Keep payment retries separate from API retries

Retrying an HTTP request is not the same as retrying a card authorization. HTTP retries preserve communication reliability. Payment retries must follow the Zahlen decision and the fixed Day 1, Day 2, Day 6, and Day 16 schedule.

‌Monitoring and alerting

Metric	Why it matters	Suggested alert condition
429 count and rate	Shows active throttling or quota pressure	Unexpected increase or sustained nonzero rate.
Retry-After duration	Shows how long capacity pressure persists	Increasing or unusually long delays.
Quota utilization	Provides advance warning before exhaustion	Configured percentage threshold for the tenant plan.
Queued unsubmitted events	Measures work preserved during throttling	Backlog grows faster than it drains.
Outcome-reporting lag	Detects a broken or delayed learning loop	Outcomes exceed the merchant’s acceptable reporting delay.
Duplicate or replay count	Reveals client retry behavior	Unexpected rise in idempotent replays or conflicts.
Traffic by key and endpoint	Helps identify loops or compromised credentials	One key or endpoint deviates materially from baseline.
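
The first metric in the table, the 429 count and rate, can be tracked with a sliding window. The following is a minimal sketch; the class name, the 60-second window, the 5 percent alert ratio, and the minimum sample count are placeholder assumptions to be tuned per tenant, not values from this guide.

```python
from collections import deque


class ThrottleRateMonitor:
    """Illustrative sliding-window tracker for the share of 429 responses."""

    def __init__(self, window_seconds=60.0, alert_ratio=0.05, min_requests=20):
        self.window_seconds = window_seconds
        self.alert_ratio = alert_ratio
        self.min_requests = min_requests
        self._samples = deque()  # (timestamp, was_throttled)

    def record(self, status_code, now):
        """Record one response observed at time `now` (seconds)."""
        self._samples.append((now, status_code == 429))
        self._evict(now)

    def _evict(self, now):
        # Drop samples that have aged out of the window.
        while self._samples and self._samples[0][0] <= now - self.window_seconds:
            self._samples.popleft()

    def ratio(self, now):
        """Fraction of responses in the window that were 429."""
        self._evict(now)
        if not self._samples:
            return 0.0
        throttled = sum(1 for _, was_throttled in self._samples if was_throttled)
        return throttled / len(self._samples)

    def should_alert(self, now):
        """True when enough traffic was seen and the 429 share is at or above the threshold."""
        self._evict(now)
        return (len(self._samples) >= self.min_requests
                and self.ratio(now) >= self.alert_ratio)
```

Keeping one monitor per API key and endpoint would support the last row of the table, since a single key or endpoint deviating from baseline then becomes visible on its own.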

‌Test cases before production

Validate that a payment-event request with 10,001 events is rejected locally before transmission.
Validate legacy retry-decision batches are split at 500 events or fewer.
Simulate HTTP 429 with and without Retry-After.
Confirm exponential backoff includes jitter and has a maximum delay.
Confirm the retry loop stops after the configured attempt or elapsed-time limit.
Confirm POST retries preserve the same idempotency key and payload.
Confirm throttled outcomes remain in durable storage and retain their actual timestamps.
Confirm a 401, 403, or 422 is not automatically retried as if it were a 429.
Confirm traffic resumes gradually after a quota reset or administrative change.
Confirm no HTTP retry creates an extra card authorization outside Day 1, Day 2, Day 6, and Day 16.

Production readiness rule

A client is not production-ready until it can survive throttling without losing events, duplicating logical operations, creating retry storms, or changing the canonical payment-attempt schedule.

‌Chapter summary
- Request limits define the largest valid payload for one endpoint call.
- Payment-event ingestion accepts 1 to 10,000 events per request.
- Legacy batch retry decision accepts no more than 500 events.
- Batch-read pagination accepts limit values from 1 to 1,000 and offset values of 0 or greater.
- Rate limits protect short-window runtime capacity; quotas protect longer-window tenant usage.
- HTTP 429 requires pause, inspection, bounded backoff, jitter, idempotency, and monitoring.
- Quota exhaustion should preserve work in a durable queue rather than discard it.
- Exact numeric plan limits are deployment- and contract-specific.
- HTTP retries never authorize payment attempts outside Zahlen’s fixed Day 1, Day 2, Day 6, and Day 16 schedule.

‌Developer checklist

Check	Ready
Client validates endpoint request-size limits before sending	[ ]
429 handling reads Retry-After when present	[ ]
Backoff uses jitter, maximum delay, and maximum attempts	[ ]
POST retries reuse stable idempotency keys	[ ]
Unsubmitted events and outcomes are stored durably	[ ]
Quota and 429 metrics are monitored by tenant, key, and endpoint	[ ]
401, 403, 422, 429, and 5xx responses have distinct handling	[ ]
API retries cannot create extra payment attempts	[ ]