Chapter 5
Request limits, quota exhaustion, and HTTP 429 handling
Audience
Merchants | Developers | Integration Engineers
Zahlen API User Guide v1.0
Source baseline: zahlen_deploy_0616A.tar.gz | June 2026
Chapter purpose |
This chapter explains how Zahlen protects runtime capacity and contracted usage, how request-size limits differ from rate limits and quotas, and how a client should respond safely when the platform returns HTTP 429. |
Learning objectives |
By the end of this chapter, you should be able to distinguish schema limits, short-window rate limits, and longer-window quotas; identify quota-exhaustion behavior; and implement bounded, idempotent retries without creating a retry storm. |
Zahlen applies controls at the authenticated tenant boundary. Rate limits protect the service during short periods of high request pressure. Quotas protect usage over longer policy or billing periods.
Request-schema limits define the largest valid payload accepted by a specific endpoint. These controls solve different problems and must not be treated as interchangeable.
Tenant-scoped enforcement |
A request is evaluated in the tenant context resolved from X-API-Key. A client must not send tenant_id to obtain a different limit or quota. Capacity and usage policy follow the authenticated tenant and its assigned plan. |
Control | Time horizon | What it protects | Typical client action |
Request or schema limit | One request | Payload validity and processing safety | Reduce or split the payload before sending. |
Rate limit | Seconds or minutes | Short-window service capacity | Pause and retry later with bounded backoff and jitter. |
Quota | Hours, days, months, or a contract period | Tenant usage allowance | Stop uncontrolled retrying, preserve work, and contact the plan owner if capacity is exhausted. |
Authorization or capability restriction | Until policy changes | Endpoint and feature access | Do not retry automatically; verify plan, role, and contract enablement. |
Do not confuse 422 with 429 |
A payload that exceeds a schema limit is a client-validation problem and may return a validation response such as HTTP 422. HTTP 429 means an otherwise valid request was refused because an active rate or quota control was reached. |
Request limits are part of the endpoint contract. They describe the valid size or shape of one API call. A larger usage plan does not automatically expand these limits, because the server model still validates each request against the published schema.
Endpoint or resource | Confirmed request limit | What the client should do |
POST /v1/payment-events | events must contain 1 to 10,000 items | Use one event or split larger datasets into multiple controlled requests. |
POST /v1/payment-events/batch | events must contain 1 to 10,000 items | Choose a batch size that balances throughput, latency, memory, and replay safety. |
POST /v1/retry-decision/batch | up to 500 legacy decision events | Split larger legacy decision workloads into groups of 500 or fewer. |
GET batch resources | limit is 1 through 1,000 when supplied; offset must be 0 or greater | Paginate until has_more is false or all expected records are returned. |
Webhook subscription create | 1 to 20 event types; callback URL length 8 to 2,048 | Validate locally before sending the subscription request. |
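The limits in the table above can be checked locally before a request leaves the client. The following is a minimal sketch that assumes only the limits listed in the table; the function names and the use of ValueError are illustrative and are not part of the Zahlen API. The callback URL length is assumed to be measured in characters.

```python
MAX_EVENTS_PER_REQUEST = 10_000    # POST /v1/payment-events and /v1/payment-events/batch
MAX_LEGACY_DECISION_EVENTS = 500   # POST /v1/retry-decision/batch
MAX_PAGE_LIMIT = 1_000             # GET batch resources


def check_payment_events(events):
    """Raise ValueError unless the list holds 1 to 10,000 events."""
    if not 1 <= len(events) <= MAX_EVENTS_PER_REQUEST:
        raise ValueError(
            f"events must contain 1 to {MAX_EVENTS_PER_REQUEST} items, got {len(events)}"
        )


def check_legacy_decision_events(events):
    """Raise ValueError if a legacy decision batch exceeds 500 events."""
    if len(events) > MAX_LEGACY_DECISION_EVENTS:
        raise ValueError(
            f"at most {MAX_LEGACY_DECISION_EVENTS} legacy decision events, got {len(events)}"
        )


def check_pagination(limit=None, offset=0):
    """Raise ValueError if a supplied limit or offset is out of range."""
    if limit is not None and not 1 <= limit <= MAX_PAGE_LIMIT:
        raise ValueError(f"limit must be 1 through {MAX_PAGE_LIMIT}, got {limit}")
    if offset < 0:
        raise ValueError(f"offset must be 0 or greater, got {offset}")


def check_webhook_subscription(event_types, callback_url):
    """Raise ValueError if the subscription is outside the published bounds."""
    if not 1 <= len(event_types) <= 20:
        raise ValueError("a subscription needs 1 to 20 event types")
    # Assumption: the 8 to 2,048 bound counts characters.
    if not 8 <= len(callback_url) <= 2048:
        raise ValueError("callback URL length must be 8 to 2,048")
```

Failing locally keeps an oversized payload from spending a request against the tenant's rate limit only to be rejected by schema validation.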
Choosing a practical payment-event batch size
The maximum accepted batch size is not a recommended default. A smaller batch is often easier to retry, observe, and reconcile. Choose a batch size by testing representative payloads and measuring request duration, response size, failure recovery, and downstream processing time.
Batch characteristic | Smaller batches | Larger batches |
Failure scope | Fewer events affected by one request failure | More events require replay or reconciliation |
Network overhead | More HTTP requests | Lower per-event HTTP overhead |
Latency | Events begin processing sooner | Client may wait longer to assemble and transmit |
Correlation | More batch IDs to track | Fewer batch IDs, but larger failure domain |
Memory and serialization | Lower client and server memory pressure | Higher memory and serialization cost |
MAX_EVENTS_PER_REQUEST = 10_000

def chunks(items, size=1000):
    for start in range(0, len(items), size):
        yield items[start:start + size]

for event_batch in chunks(events, size=1000):
    submit_payment_events(event_batch)
Preserve identifiers while splitting |
Each payment event should retain a stable, merchant-generated event_id. Splitting one source dataset into multiple API batches must not change the identity of the underlying events. |
A rate limit controls how much traffic a tenant may send during a short window. The exact window and numeric allowance are deployment- and plan-specific. Clients should therefore rely on explicit responses and current portal or administrative information rather than hard-coding assumed values.
Common causes of rate-limit pressure
A billing-cycle burst sends many requests within the same second.
Multiple application instances use the same tenant credentials without shared throttling.
A retry loop immediately repeats requests after every failure.
A worker backlog is released all at once after an outage.
Health checks or polling run more frequently than required.
A compromised key or programming error creates unexpected traffic.
Client-side traffic shaping
Technique | Purpose | Implementation note |
Concurrency limit | Caps the number of requests in flight | Use a semaphore or worker pool per environment and tenant. |
Token bucket or leaky bucket | Smooths bursts over time | Share state across instances when they use the same tenant quota. |
Queue with bounded workers | Prevents sudden release of a large backlog | Prioritize time-sensitive decision and outcome work appropriately. |
Adaptive batch size | Reduces request count when safe | Do not increase batch size beyond endpoint schema limits. |
Circuit breaker | Stops repeated calls during sustained failure | Open on repeated 429 or 5xx conditions and probe cautiously. |
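As an illustration of the token-bucket row above, the sketch below smooths bursts inside a single process. It is not a Zahlen-provided component: the class name, the rate and capacity parameters, and the injectable clock are all assumptions, and real values should come from the tenant's plan rather than from this example. The bucket is in-memory and not thread-safe, so instances that share one tenant credential would still need shared state, as the table notes.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)   # start full so the first burst is allowed
        self._updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now

    def try_acquire(self, tokens=1.0):
        """Consume tokens and return True if enough are available; otherwise return False."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False

    def seconds_until_available(self, tokens=1.0):
        """Return how long to wait before `tokens` could be acquired."""
        self._refill()
        missing = tokens - self._tokens
        return max(0.0, missing / self.rate)
```

A worker would call try_acquire() before each request and, when it returns False, sleep for seconds_until_available() instead of sending the request.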
Do not retry payments ahead of schedule |
Zahlen’s fixed recovery schedule is Day 1, Day 2, Day 6, and Day 16. An API 429 is a communications-capacity response; it does not authorize an extra payment retry or a change to the canonical payment-attempt schedule. |
A quota limits tenant usage over a longer policy period. Depending on the deployment, usage may be measured by request count, event volume, decision volume, outcome volume, or another contracted unit. Exact quota values belong to the tenant’s assigned plan and contract.
What quota exhaustion means
When a quota is exhausted, additional eligible requests may be rejected until the quota resets or an administrator changes the allowance. The application should preserve unsubmitted work, avoid duplicate processing, and expose a clear operational alert. Repeatedly sending the same request cannot restore capacity and may increase pressure.
1. Detect: Recognize HTTP 429 and read available response metadata.
2. Pause: Stop immediate retries for the affected operation or tenant.
3. Preserve: Keep events, decisions, or outcomes in a durable merchant-side queue.
4. Assess: Check usage, plan assignment, traffic anomalies, and reset policy.
5. Resume: Drain the queue gradually after capacity is available.
Operational questions during quota exhaustion
Is the tenant near an expected billing-cycle peak, or is this traffic abnormal?
Is one service, API key, or endpoint responsible for most usage?
Are duplicate requests or retries consuming the allowance?
Are outcomes being delayed in a way that breaks the recovery learning loop?
When does the applicable quota reset?
Does the merchant need a plan change, a temporary adjustment, or a client-side correction?
Do not silently discard outcomes |
Retry outcomes close the learning loop. If outcome submissions are temporarily throttled, store them durably and send them later with their original identifiers and actual outcome timestamps. |
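One way to preserve throttled work is a small merchant-side outbox. The sketch below uses SQLite from the Python standard library purely as an illustration; the table name, column names, and function names are hypothetical, and a production system might use a message queue or another durable store instead. Timestamps are assumed to be ISO 8601 strings in UTC so that text ordering matches time ordering.

```python
import json
import sqlite3


def open_outbox(path):
    """Open (or create) the outbox database and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS outbox (
               idempotency_key TEXT PRIMARY KEY,
               occurred_at TEXT NOT NULL,
               payload TEXT NOT NULL,
               sent INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()
    return conn


def enqueue(conn, idempotency_key, occurred_at, payload):
    """Store an item once; enqueueing the same key again leaves the original untouched."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO outbox (idempotency_key, occurred_at, payload) "
            "VALUES (?, ?, ?)",
            (idempotency_key, occurred_at, json.dumps(payload)),
        )


def pending(conn, limit=100):
    """Return unsent items, oldest first, with their original keys and timestamps."""
    rows = conn.execute(
        "SELECT idempotency_key, occurred_at, payload FROM outbox "
        "WHERE sent = 0 ORDER BY occurred_at, idempotency_key LIMIT ?",
        (limit,),
    ).fetchall()
    return [(key, ts, json.loads(body)) for key, ts, body in rows]


def mark_sent(conn, idempotency_key):
    """Mark an item as delivered so it is not sent again."""
    with conn:
        conn.execute(
            "UPDATE outbox SET sent = 1 WHERE idempotency_key = ?",
            (idempotency_key,),
        )
```

Because the key and the actual outcome timestamp are stored at enqueue time, a later drain submits exactly what would have been sent before throttling began.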
HTTP 429 Too Many Requests indicates that the platform is enforcing a rate or quota policy. The response is not a signal to immediately repeat the call. It is a signal to reduce pressure, wait, and retry only when the operation is safe to repeat.
Response element | How to use it |
HTTP status 429 | Classify the failure as throttling or quota enforcement, not validation or authentication. |
Retry-After header, when present | Wait at least the specified interval before retrying. |
Request or correlation ID, when present | Include it in logs and support escalation. |
Error code or metadata | Distinguish short-window throttling from longer-window quota exhaustion when the deployment provides that detail. |
Idempotency state | Reuse the same Idempotency-Key for the same logical POST operation. |
Recommended 429 response sequence
1. Stop: Do not immediately repeat the failed request.
2. Read: Inspect Retry-After and structured error metadata.
3. Back off: Use bounded exponential backoff with randomized jitter.
4. Reuse: Keep the original idempotency key for the same operation.
5. Alert: Escalate sustained throttling or quota exhaustion.
delay = min(max_delay, base_delay * (2 ** retry_number))
delay = delay * random.uniform(0.75, 1.25)
if retry_after_header:
    delay = max(delay, parse_retry_after(retry_after_header))
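The pseudocode above calls parse_retry_after without defining it. One possible implementation is sketched below. HTTP allows Retry-After to carry either a number of seconds or an HTTP date, so the sketch handles both; returning 0.0 for a missing or unparseable value is a design choice made here so that max(delay, ...) falls back to the computed backoff, and is not documented Zahlen behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the wait in seconds for a Retry-After value.

    Returns 0.0 when the value is missing, unparseable, or already in the past.
    """
    if value is None:
        return 0.0
    text = value.strip()
    # Form 1: delay-seconds, for example "120".
    if text.isascii() and text.isdigit():
        return float(text)
    # Form 2: HTTP-date, for example "Mon, 01 Jun 2026 12:00:30 GMT".
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return 0.0
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The optional now argument exists only so the date branch can be tested against a fixed clock.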
Bound every retry loop |
Set a maximum number of attempts, a maximum total elapsed time, and a dead-letter or operator-review path. An unbounded retry loop can turn a temporary 429 into a sustained outage. |
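The three bounds named above can be combined in one loop. The sketch below is illustrative only: operation is a hypothetical callable that returns None while throttled, RetryBudgetExceeded is an invented exception that the caller would catch in order to route the work to a dead-letter or operator-review path, and the default limits are placeholders rather than recommended values.

```python
import random
import time


class RetryBudgetExceeded(Exception):
    """Raised when the attempt or elapsed-time budget is spent."""


def call_with_budget(operation, max_attempts=5, max_elapsed=120.0,
                     base_delay=1.0, max_delay=60.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it returns a non-None result or the budget runs out."""
    started = clock()
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        result = operation()
        if result is not None:
            return result
        # Capped exponential backoff with +/-25% jitter.
        delay = min(max_delay, base_delay * (2 ** (attempts - 1)))
        delay *= random.uniform(0.75, 1.25)
        # Stop instead of sleeping if no attempt remains or the wait would
        # push total elapsed time past the limit.
        if attempts == max_attempts or (clock() - started) + delay > max_elapsed:
            break
        sleep(delay)
    raise RetryBudgetExceeded(f"gave up after {attempts} attempts")
```

The sleep and clock parameters are injectable so the loop can be exercised in tests without real waiting.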
Python example
import random
import time

import requests

def post_with_backoff(url, headers, payload, attempts=5):
    for retry_number in range(attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=20)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor a numeric Retry-After; otherwise use capped exponential backoff with jitter.
        retry_after = response.headers.get('Retry-After')
        if retry_after and retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = min(60.0, 1.0 * (2 ** retry_number))
            delay *= random.uniform(0.75, 1.25)
        time.sleep(delay)
    raise RuntimeError('Zahlen request remained throttled')
JavaScript example
async function zahlenFetch(url, options, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      if (!response.ok) throw new Error(`Zahlen HTTP ${response.status}`);
      return response.json();
    }
    // A missing header yields null, and Number(null) is 0, so check presence first.
    const header = response.headers.get('Retry-After');
    const retryAfter = header === null || header.trim() === '' ? NaN : Number(header);
    const backoffMs = Math.min(60000, 1000 * (2 ** attempt)) * (0.75 + Math.random() * 0.5);
    // Never wait less than the interval the server specified.
    const delayMs = Number.isFinite(retryAfter)
      ? Math.max(retryAfter * 1000, backoffMs)
      : backoffMs;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Zahlen request remained throttled');
}
Curl diagnostic example
curl -i -X POST "$ZAHLEN_BASE_URL/v1/_next/retry-decision" \
-H "Content-Type: application/json" \
-H "X-API-Key: $ZAHLEN_API_KEY" \
-H "Idempotency-Key: order-8842-attempt-2" \
-d '{"attempt_number":2,"decline_code":"51"}'
Curl is diagnostic, not a retry engine |
Use curl to inspect headers and response bodies. Production retry behavior should be implemented in application code with durable state, bounded attempts, telemetry, and idempotency. |
Operation | Retry after 429? | Required safeguard |
GET event, batch, or decision resource | Yes | Use bounded backoff; GET is normally safe to repeat. |
POST retry decision | Yes, carefully | Reuse the same Idempotency-Key and identical logical request. |
POST retry outcome | Yes, carefully | Preserve decision_id, request_id, attempt number, outcome timestamp, and idempotency. |
POST payment-event ingestion | Only with explicit replay safeguards | Use stable event_id values and understand ingestion replay behavior. |
Create webhook subscription | Only after verifying prior result | Avoid creating duplicate subscriptions after an ambiguous timeout. |
422 validation response | No | Correct the request before resubmitting. |
401 or 403 | No automatic retry | Correct authentication, authorization, plan, or capability policy first. |
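The status-code rows of the table can be expressed as a small dispatch function so that each class of response takes a distinct path. This is a sketch only: the action names are invented for illustration, and the 5xx branch is an assumption, since the table above does not state a policy for server errors.

```python
def classify_response(status_code):
    """Map an HTTP status code to a coarse client action (action names are hypothetical)."""
    if 200 <= status_code < 300:
        return "success"
    if status_code == 429:
        # Throttling or quota enforcement: wait, then retry with safeguards.
        return "backoff_and_retry"
    if status_code in (401, 403):
        # Authentication, authorization, plan, or capability problem.
        return "fix_credentials_or_policy"
    if status_code == 422:
        # Validation failure: resubmitting the same request cannot succeed.
        return "fix_request"
    if 500 <= status_code < 600:
        # Assumption: treat server errors as retryable only for idempotent operations.
        return "retry_if_idempotent"
    return "do_not_retry"
```

Whether a "backoff_and_retry" result actually leads to a retry still depends on the operation-specific safeguards in the table, such as reusing the same Idempotency-Key for a POST.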
Idempotency and payload identity
An idempotency key represents one logical operation. A client must not reuse the same key for a materially different request. If a retry changes the payload, identifiers, or intended operation, the server may correctly treat it as a conflict rather than a replay.
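A client can guard against accidental key reuse before a request is sent. The sketch below fingerprints the payload and remembers which fingerprint each key was first used with. It is a client-side check under stated assumptions: the class and function names are hypothetical, the registry lives in memory only, and how the server compares payloads is not specified here.

```python
import hashlib
import json


def payload_fingerprint(payload):
    """Return a stable SHA-256 hex digest of a JSON-serializable payload."""
    # Sorting keys and fixing separators makes equal payloads hash identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyRegistry:
    """Remember which payload each idempotency key was first used with."""

    def __init__(self):
        self._seen = {}

    def check(self, key, payload):
        """Return "new" or "replay"; raise ValueError if the key has a different payload."""
        fingerprint = payload_fingerprint(payload)
        previous = self._seen.get(key)
        if previous is None:
            self._seen[key] = fingerprint
            return "new"
        if previous != fingerprint:
            raise ValueError(f"idempotency key {key!r} reused with a different payload")
        return "replay"
```

For example, checking the same key and payload twice yields "new" and then "replay", while changing attempt_number under the same key raises ValueError before the request is sent.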
Keep payment retries separate from API retries |
Retrying an HTTP request is not the same as retrying a card authorization. HTTP retries preserve communication reliability. Payment retries must follow the Zahlen decision and the fixed Day 1, Day 2, Day 6, and Day 16 schedule. |
Metric | Why it matters | Suggested alert condition |
429 count and rate | Shows active throttling or quota pressure | Unexpected increase or sustained nonzero rate. |
Retry-After duration | Shows how long capacity pressure persists | Increasing or unusually long delays. |
Quota utilization | Provides advance warning before exhaustion | Configured percentage threshold for the tenant plan. |
Queued unsubmitted events | Measures work preserved during throttling | Backlog grows faster than it drains. |
Outcome-reporting lag | Detects a broken or delayed learning loop | Outcomes exceed the merchant’s acceptable reporting delay. |
Duplicate or replay count | Reveals client retry behavior | Unexpected rise in idempotent replays or conflicts. |
Traffic by key and endpoint | Helps identify loops or compromised credentials | One key or endpoint deviates materially from baseline. |
Validate that a payment-event request with 10,001 events is rejected locally before transmission.
Validate legacy retry-decision batches are split at 500 events or fewer.
Simulate HTTP 429 with and without Retry-After.
Confirm exponential backoff includes jitter and has a maximum delay.
Confirm the retry loop stops after the configured attempt or elapsed-time limit.
Confirm POST retries preserve the same idempotency key and payload.
Confirm throttled outcomes remain in durable storage and retain their actual timestamps.
Confirm a 401, 403, or 422 is not automatically retried as if it were a 429.
Confirm traffic resumes gradually after a quota reset or administrative change.
Confirm no HTTP retry creates an extra card authorization outside Day 1, Day 2, Day 6, and Day 16.
Production readiness rule |
A client is not production-ready until it can survive throttling without losing events, duplicating logical operations, creating retry storms, or changing the canonical payment-attempt schedule. |
Request limits define the largest valid payload for one endpoint call.
Payment-event ingestion accepts 1 to 10,000 events per request.
Legacy batch retry decision accepts no more than 500 events.
Batch-read pagination accepts limit values from 1 to 1,000 and offset values of 0 or greater.
Rate limits protect short-window runtime capacity; quotas protect longer-window tenant usage.
HTTP 429 requires pause, inspection, bounded backoff, jitter, idempotency, and monitoring.
Quota exhaustion should preserve work in a durable queue rather than discard it.
Exact numeric plan limits are deployment- and contract-specific.
HTTP retries never authorize payment attempts outside Zahlen’s fixed Day 1, Day 2, Day 6, and Day 16 schedule.
Developer checklist
Check | Ready |
Client validates endpoint request-size limits before sending | [ ] |
429 handling reads Retry-After when present | [ ] |
Backoff uses jitter, maximum delay, and maximum attempts | [ ] |
POST retries reuse stable idempotency keys | [ ] |
Unsubmitted events and outcomes are stored durably | [ ] |
Quota and 429 metrics are monitored by tenant, key, and endpoint | [ ] |
401, 403, 422, 429, and 5xx responses have distinct handling | [ ] |
API retries cannot create extra payment attempts | [ ] |