ZAHLEN

Chapter 5

Rate Limits & Quotas



Request limits, quota exhaustion, and HTTP 429 handling


Audience

Merchants | Developers | Integration Engineers


Zahlen API User Guide v1.0

Source baseline: zahlen_deploy_0616A.tar.gz | June 2026



Chapter purpose

This chapter explains how Zahlen protects runtime capacity and contracted usage, how request-size limits differ from rate limits and quotas, and how a client should respond safely when the platform returns HTTP 429.


Learning objectives

By the end of this chapter, you should be able to distinguish schema limits, short-window rate limits, and longer-window quotas; identify quota-exhaustion behavior; and implement bounded, idempotent retries without creating a retry storm.

Zahlen applies controls at the authenticated tenant boundary. Rate limits protect the service during short periods of high request pressure. Quotas protect usage over longer policy or billing periods. Request-schema limits define the largest valid payload accepted by a specific endpoint. These controls solve different problems and must not be treated as interchangeable.


Tenant-scoped enforcement

A request is evaluated in the tenant context resolved from X-API-Key. A client must not send tenant_id to obtain a different limit or quota. Capacity and usage policy follow the authenticated tenant and its assigned plan.


    1. Three different kinds of limits

      | Control | Time horizon | What it protects | Typical client action |
      |---|---|---|---|
      | Request or schema limit | One request | Payload validity and processing safety | Reduce or split the payload before sending. |
      | Rate limit | Seconds or minutes | Short-window service capacity | Pause and retry later with bounded backoff and jitter. |
      | Quota | Hours, days, month, or contract period | Tenant usage allowance | Stop uncontrolled retrying, preserve work, and contact the plan owner if capacity is exhausted. |
      | Authorization or capability restriction | Until policy changes | Endpoint and feature access | Do not retry automatically; verify plan, role, and contract enablement. |


      Do not confuse 422 with 429

      A payload that exceeds a schema limit is a client-validation problem and may return a validation response such as HTTP 422. HTTP 429 means an otherwise valid request was refused because an active rate or quota control was reached.
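The distinction above can be made explicit in client code. The following is a minimal sketch, not Zahlen's own client logic: the `Action` names and the `classify_status` function are hypothetical, and the handling chosen for 5xx responses is an assumption based on the circuit-breaker guidance in this chapter.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical client-side actions; the names are illustrative only."""
    ACCEPT = "accept"
    FIX_REQUEST = "fix_request"              # 422: correct the payload first
    FIX_ACCESS = "fix_access"                # 401/403: check key, plan, role
    BACK_OFF = "back_off"                    # 429: pause, then bounded retry
    RETRY_OR_ESCALATE = "retry_or_escalate"  # 5xx: assumed handling
    INVESTIGATE = "investigate"              # anything else


def classify_status(status_code: int) -> Action:
    """Map an HTTP status code to one client action."""
    if 200 <= status_code < 300:
        return Action.ACCEPT
    if status_code == 422:
        return Action.FIX_REQUEST
    if status_code in (401, 403):
        return Action.FIX_ACCESS
    if status_code == 429:
        return Action.BACK_OFF
    if 500 <= status_code < 600:
        return Action.RETRY_OR_ESCALATE
    return Action.INVESTIGATE
```

Keeping the classification in one function makes it harder for a retry loop to treat a validation or authorization failure as if it were throttling.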

    2. Request limits

      Request limits are part of the endpoint contract. They describe the valid size or shape of one API call. A larger usage plan does not automatically expand these limits, because the server model still validates each request against the published schema.


      | Endpoint or resource | Confirmed request limit | What the client should do |
      |---|---|---|
      | POST /v1/payment-events | events must contain 1 to 10,000 items | Use one event or split larger datasets into multiple controlled requests. |
      | POST /v1/payment-events/batch | events must contain 1 to 10,000 items | Choose a batch size that balances throughput, latency, memory, and replay safety. |
      | POST /v1/retry-decision/batch | up to 500 legacy decision events | Split larger legacy decision workloads into groups of 500 or fewer. |
      | GET batch resources | limit is 1 through 1,000 when supplied; offset must be 0 or greater | Paginate until has_more is false or all expected records are returned. |
      | Webhook subscription create | 1 to 20 event types; callback URL length 8 to 2,048 | Validate locally before sending the subscription request. |
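The limits in the table can be checked locally before a request is sent. This is a minimal sketch under stated assumptions: the function and constant names are hypothetical, the numeric limits are copied from the table above, and the callback URL length is measured in characters, which the source does not specify.

```python
MAX_EVENTS_PER_REQUEST = 10_000
MAX_LEGACY_DECISION_EVENTS = 500
MAX_PAGE_LIMIT = 1_000
MAX_WEBHOOK_EVENT_TYPES = 20
MIN_CALLBACK_URL_LENGTH = 8
MAX_CALLBACK_URL_LENGTH = 2_048


def validate_payment_events(events):
    """Reject an events list outside the 1 to 10,000 item range."""
    if not 1 <= len(events) <= MAX_EVENTS_PER_REQUEST:
        raise ValueError(f"events must contain 1 to {MAX_EVENTS_PER_REQUEST} items")


def validate_legacy_decision_batch(events):
    """Reject a legacy retry-decision batch larger than 500 events."""
    if len(events) > MAX_LEGACY_DECISION_EVENTS:
        raise ValueError(f"at most {MAX_LEGACY_DECISION_EVENTS} legacy events allowed")


def validate_pagination(limit=None, offset=0):
    """Check limit (only when supplied) and offset for batch reads."""
    if limit is not None and not 1 <= limit <= MAX_PAGE_LIMIT:
        raise ValueError(f"limit must be 1 through {MAX_PAGE_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be 0 or greater")


def validate_webhook_subscription(event_types, callback_url):
    """Check event-type count and callback URL length (assumed: characters)."""
    if not 1 <= len(event_types) <= MAX_WEBHOOK_EVENT_TYPES:
        raise ValueError(f"1 to {MAX_WEBHOOK_EVENT_TYPES} event types required")
    if not MIN_CALLBACK_URL_LENGTH <= len(callback_url) <= MAX_CALLBACK_URL_LENGTH:
        raise ValueError("callback URL length must be 8 to 2,048")
```

Local validation does not replace server-side validation; it only avoids spending a request on a payload that is already known to be out of range.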


      Choosing a practical payment-event batch size

      The maximum accepted batch size is not a recommended default. A smaller batch is often easier to retry, observe, and reconcile. Choose a batch size by testing representative payloads and measuring request duration, response size, failure recovery, and downstream processing time.


      | Batch characteristic | Smaller batches | Larger batches |
      |---|---|---|
      | Failure scope | Fewer events affected by one request failure | More events require replay or reconciliation |
      | Network overhead | More HTTP requests | Lower per-event HTTP overhead |
      | Latency | Events begin processing sooner | Client may wait longer to assemble and transmit |
      | Correlation | More batch IDs to track | Fewer batch IDs, but larger failure domain |
      | Memory and serialization | Lower client and server memory pressure | Higher memory and serialization cost |


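      MAX_EVENTS_PER_REQUEST = 10_000


      def chunks(items, size=1000):
          # Never build a batch larger than the endpoint schema allows.
          if not 1 <= size <= MAX_EVENTS_PER_REQUEST:
              raise ValueError("size must be between 1 and MAX_EVENTS_PER_REQUEST")
          for start in range(0, len(items), size):
              yield items[start:start + size]


      # `events` and submit_payment_events() are supplied by the integration.
      for event_batch in chunks(events, size=1000):
          submit_payment_events(event_batch)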


Preserve identifiers while splitting

Each payment event should retain a stable, merchant-generated event_id. Splitting one source dataset into multiple API batches must not change the identity of the underlying events.
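One way to enforce this rule is to check identifiers before splitting. The sketch below is illustrative: `split_preserving_ids` is a hypothetical helper, and it assumes each event is a dictionary with an `event_id` field, as described above. It never generates or rewrites identifiers; it only verifies and slices.

```python
def split_preserving_ids(events, size=1000):
    """Split events into batches without altering or regenerating event_id."""
    if size < 1:
        raise ValueError("size must be at least 1")
    seen = set()
    for event in events:
        event_id = event.get("event_id")
        if not event_id:
            raise ValueError("every event needs a merchant-generated event_id")
        if event_id in seen:
            raise ValueError(f"duplicate event_id: {event_id}")
        seen.add(event_id)
    # Slicing keeps the original event objects and their order.
    return [events[start:start + size] for start in range(0, len(events), size)]
```

Because the batches are slices of the source list, the identifiers that reach the API are the same ones recorded in the merchant's own system, which keeps later reconciliation straightforward.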


    3. Rate limits

      A rate limit controls how much traffic a tenant may send during a short window. The exact window and numeric allowance are deployment- and plan-specific. Clients should therefore rely on explicit responses and current portal or administrative information rather than hard-code assumed values.

      Common causes of rate-limit pressure

      • A billing-cycle burst sends many requests at the same second.

      • Multiple application instances use the same tenant credentials without shared throttling.

      • A retry loop immediately repeats requests after every failure.

      • A worker backlog is released all at once after an outage.

      • Health checks or polling run more frequently than required.

      • A compromised key or programming error creates unexpected traffic.

        Client-side traffic shaping

        | Technique | Purpose | Implementation note |
        |---|---|---|
        | Concurrency limit | Caps the number of requests in flight | Use a semaphore or worker pool per environment and tenant. |
        | Token bucket or leaky bucket | Smooths bursts over time | Share state across instances when they use the same tenant quota. |
        | Queue with bounded workers | Prevents sudden release of a large backlog | Prioritize time-sensitive decision and outcome work appropriately. |
        | Adaptive batch size | Reduces request count when safe | Do not increase batch size beyond endpoint schema limits. |
        | Circuit breaker | Stops repeated calls during sustained failure | Open on repeated 429 or 5xx conditions and probe cautiously. |
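As an illustration of the token-bucket row, here is a minimal in-process sketch. The class name, the `try_acquire` method, and any rate or capacity values passed to it are assumptions for illustration; actual allowances are deployment- and plan-specific. This version holds its state in one process, so multiple instances sharing a tenant would need shared state instead.

```python
import threading
import time


class TokenBucket:
    """In-process token bucket: refills at a steady rate up to a fixed capacity."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        if rate_per_second <= 0 or capacity <= 0:
            raise ValueError("rate_per_second and capacity must be positive")
        self.rate = float(rate_per_second)
        self.capacity = float(capacity)
        self._tokens = float(capacity)
        self._clock = clock
        self._updated = clock()
        self._lock = threading.Lock()

    def try_acquire(self, tokens=1.0):
        """Take tokens if available and return True; otherwise return False."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._updated
            # Refill for the time that has passed, never beyond capacity.
            self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
            self._updated = now
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False
```

A caller would check `try_acquire()` before each request and queue or delay the work when it returns False, rather than sending and waiting for a 429.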


        Do not retry the payment processor schedule early

        Zahlen’s fixed recovery schedule is Day 1, Day 2, Day 6, and Day 16. An API 429 is a communications-capacity response; it does not authorize an extra payment retry or a change to the canonical payment-attempt schedule.

    4. Quotas and quota exhaustion

      A quota limits tenant usage over a longer policy period. Depending on the deployment, usage may be measured by request count, event volume, decision volume, outcome volume, or another contracted unit. Exact quota values belong to the tenant’s assigned plan and contract.

      What quota exhaustion means

      When a quota is exhausted, additional eligible requests may be rejected until the quota resets or an administrator changes the allowance. The application should preserve unsubmitted work, avoid duplicate processing, and expose a clear operational alert. Repeatedly sending the same request cannot restore capacity and may increase pressure.


      1. Detect: Recognize HTTP 429 and read available response metadata.
      2. Pause: Stop immediate retries for the affected operation or tenant.
      3. Preserve: Keep events, decisions, or outcomes in a durable merchant-side queue.
      4. Assess: Check usage, plan assignment, traffic anomalies, and reset policy.
      5. Resume: Drain the queue gradually after capacity is available.
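The Preserve and Resume steps can be sketched with a small durable queue. This is an assumption-laden illustration, not a Zahlen component: the `Outbox` class, its table layout, and the `send` callback are hypothetical, and SQLite stands in for whatever durable store the merchant already operates. Pass a file path for real durability; `":memory:"` is only suitable for experiments.

```python
import json
import sqlite3


class Outbox:
    """Merchant-side queue that keeps unsent items until they are accepted."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            " item_id TEXT PRIMARY KEY,"
            " payload TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, item_id, payload):
        # INSERT OR IGNORE keeps the first copy, so re-enqueueing cannot duplicate work.
        self._db.execute(
            "INSERT OR IGNORE INTO outbox (item_id, payload) VALUES (?, ?)",
            (item_id, json.dumps(payload)),
        )
        self._db.commit()

    def pending(self):
        return self._db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def drain(self, send, max_items):
        """Send up to max_items in insertion order; stop at the first refusal."""
        rows = self._db.execute(
            "SELECT item_id, payload FROM outbox ORDER BY rowid LIMIT ?",
            (max_items,),
        ).fetchall()
        sent = 0
        for item_id, payload in rows:
            if not send(item_id, json.loads(payload)):
                break  # still throttled: leave the rest queued
            self._db.execute("DELETE FROM outbox WHERE item_id = ?", (item_id,))
            self._db.commit()
            sent += 1
        return sent
```

Calling `drain` with a small `max_items` on a timer gives the gradual resume described in step 5, and an item is deleted only after `send` reports success.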


      Operational questions during quota exhaustion

      • Is the tenant near an expected billing-cycle peak, or is this traffic abnormal?

      • Is one service, API key, or endpoint responsible for most usage?

      • Are duplicate requests or retries consuming the allowance?

      • Are outcomes being delayed in a way that breaks the recovery learning loop?

      • When does the applicable quota reset?

      • Does the merchant need a plan change, a temporary adjustment, or a client-side correction?


        Do not silently discard outcomes

        Retry outcomes close the learning loop. If outcome submissions are temporarily throttled, store them durably and send them later with their original identifiers and actual outcome timestamps.

    5. Understanding HTTP 429

      HTTP 429 Too Many Requests indicates that the platform is enforcing a rate or quota policy. The response is not a signal to immediately repeat the call. It is a signal to reduce pressure, wait, and retry only when the operation is safe to repeat.


      | Response element | How to use it |
      |---|---|
      | HTTP status 429 | Classify the failure as throttling or quota enforcement, not validation or authentication. |
      | Retry-After header, when present | Wait at least the specified interval before retrying. |
      | Request or correlation ID, when present | Include it in logs and support escalation. |
      | Error code or metadata | Distinguish short-window throttling from longer-window quota exhaustion when the deployment provides that detail. |
      | Idempotency state | Reuse the same Idempotency-Key for the same logical POST operation. |


      Recommended 429 response sequence

      1. Stop: Do not immediately repeat the failed request.
      2. Read: Inspect Retry-After and structured error metadata.
      3. Back off: Use bounded exponential backoff with randomized jitter.
      4. Reuse: Keep the original idempotency key for the same operation.
      5. Alert: Escalate sustained throttling or quota exhaustion.


      delay = min(max_delay, base_delay * (2 ** retry_number))
      delay = delay * random.uniform(0.75, 1.25)

      if retry_after_header:
          delay = max(delay, parse_retry_after(retry_after_header))
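The `parse_retry_after` helper used above is not defined in this chapter. One possible sketch follows. HTTP allows Retry-After to carry either a number of seconds or an HTTP date; which form a given Zahlen deployment sends is not stated here, so this version accepts both. Returning 0.0 for an unparseable value is a design assumption that lets the `max(...)` expression above fall back to the computed backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or 0.0 if it cannot be parsed."""
    if value is None:
        return 0.0
    text = value.strip()
    if text.isdigit():
        # Delay-seconds form, for example "30".
        return float(text)
    try:
        # HTTP-date form, for example "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return 0.0
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    # A date in the past means no additional wait is required.
    return max(0.0, (when - current).total_seconds())
```

The optional `now` argument exists only so the function can be tested with a fixed clock.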


Bound every retry loop

Set a maximum number of attempts, a maximum total elapsed time, and a dead-letter or operator-review path. An unbounded retry loop can turn a temporary 429 into a sustained outage.
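The three bounds named above can be combined in one loop. The following is a sketch under stated assumptions: `send_with_bounds`, the `send` and `dead_letter` callbacks, and every default value are hypothetical and should be tuned to the tenant's plan and workload. The `sleep` and `clock` parameters exist so the loop can be tested without real waiting.

```python
import random
import time


def send_with_bounds(send, dead_letter, max_attempts=5, max_elapsed=120.0,
                     base_delay=1.0, max_delay=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call send() until it stops returning 429 or a bound is reached.

    send() must return (status_code, retry_after_seconds_or_None).
    Returns the final non-429 status, or None after calling dead_letter.
    """
    started = clock()
    reason = "attempt limit reached"
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        delay = min(max_delay, base_delay * (2 ** attempt))
        delay *= random.uniform(0.75, 1.25)
        if retry_after is not None:
            delay = max(delay, retry_after)  # never wait less than requested
        if (clock() - started) + delay > max_elapsed:
            reason = "elapsed-time limit reached"
            break
        sleep(delay)
    dead_letter(reason)  # hand off to a dead-letter queue or operator review
    return None
```

The loop gives up before sleeping when the next wait would exceed the elapsed-time budget, so a very long Retry-After routes the work to review immediately instead of blocking a worker.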

    6. Implementation examples

      Python example

      import random
      import time

      import requests


      def post_with_backoff(url, headers, payload, attempts=5):
          # `headers` is sent unchanged on every attempt, so the same
          # Idempotency-Key is reused for the same logical operation.
          for retry_number in range(attempts):
              response = requests.post(url, headers=headers, json=payload, timeout=20)
              if response.status_code != 429:
                  # Non-429 errors (401, 403, 422, 5xx) raise and are not retried here.
                  response.raise_for_status()
                  return response.json()

              retry_after = response.headers.get('Retry-After')
              if retry_after and retry_after.isdigit():
                  # Wait at least the interval the server asked for.
                  delay = float(retry_after)
              else:
                  delay = min(60.0, 1.0 * (2 ** retry_number))
                  delay *= random.uniform(0.75, 1.25)
              if retry_number < attempts - 1:
                  time.sleep(delay)

          raise RuntimeError('Zahlen request remained throttled')


JavaScript example

async function zahlenFetch(url, options, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      if (!response.ok) throw new Error(`Zahlen HTTP ${response.status}`);
      return response.json();
    }

    // headers.get() returns null when the header is absent, and Number(null)
    // is 0, so check for the header before converting it.
    const header = response.headers.get('Retry-After');
    const retryAfter = header === null ? NaN : Number(header);
    const backoffMs =
      Math.min(60000, 1000 * (2 ** attempt)) * (0.75 + Math.random() * 0.5);
    // Never wait less than the interval the server asked for.
    const delayMs = Number.isFinite(retryAfter)
      ? Math.max(retryAfter * 1000, backoffMs)
      : backoffMs;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Zahlen request remained throttled');
}


Curl diagnostic example

curl -i -X POST "$ZAHLEN_BASE_URL/v1/_next/retry-decision" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $ZAHLEN_API_KEY" \
  -H "Idempotency-Key: order-8842-attempt-2" \
  -d '{"attempt_number":2,"decline_code":"51"}'


Curl is diagnostic, not a retry engine

Use curl to inspect headers and response bodies. Production retry behavior should be implemented in application code with durable state, bounded attempts, telemetry, and idempotency.


    7. Safe retries by operation

      | Operation | Retry after 429? | Required safeguard |
      |---|---|---|
      | GET event, batch, or decision resource | Yes | Use bounded backoff; GET is normally safe to repeat. |
      | POST retry decision | Yes, carefully | Reuse the same Idempotency-Key and identical logical request. |
      | POST retry outcome | Yes, carefully | Preserve decision_id, request_id, attempt number, outcome timestamp, and idempotency. |
      | POST payment-event ingestion | Only with explicit replay safeguards | Use stable event_id values and understand ingestion replay behavior. |
      | Create webhook subscription | Only after verifying prior result | Avoid creating duplicate subscriptions after an ambiguous timeout. |
      | 422 validation response | No | Correct the request before resubmitting. |
      | 401 or 403 | No automatic retry | Correct authentication, authorization, plan, or capability policy first. |


      Idempotency and payload identity

      An idempotency key represents one logical operation. A client must not reuse the same key for a materially different request. If a retry changes the payload, identifiers, or intended operation, the server may correctly treat it as a conflict rather than a replay.
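A client can guard against accidental key reuse before a request leaves the process. The sketch below is illustrative only: `payload_fingerprint` and `IdempotencyLedger` are hypothetical names, the ledger is kept in memory for brevity, and how the server itself compares requests is not specified in this chapter.

```python
import hashlib
import json


def payload_fingerprint(payload):
    """Hash a JSON-serializable payload in a key-order-independent way."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyLedger:
    """Remembers which payload each idempotency key was first used with."""

    def __init__(self):
        self._seen = {}

    def check(self, key, payload):
        """Raise ValueError if key was already used with a different payload."""
        fingerprint = payload_fingerprint(payload)
        previous = self._seen.setdefault(key, fingerprint)
        if previous != fingerprint:
            raise ValueError(
                f"idempotency key {key!r} was already used with a different payload"
            )
```

Calling `check` before every send means a true retry passes silently, while a changed payload under the same key fails locally instead of surfacing later as a server-side conflict.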


      Keep payment retries separate from API retries

      Retrying an HTTP request is not the same as retrying a card authorization. HTTP retries preserve communication reliability. Payment retries must follow the Zahlen decision and the fixed Day 1, Day 2, Day 6, and Day 16 schedule.

    8. Monitoring and alerting

      | Metric | Why it matters | Suggested alert condition |
      |---|---|---|
      | 429 count and rate | Shows active throttling or quota pressure | Unexpected increase or sustained nonzero rate. |
      | Retry-After duration | Shows how long capacity pressure persists | Increasing or unusually long delays. |
      | Quota utilization | Provides advance warning before exhaustion | Configured percentage threshold for the tenant plan. |
      | Queued unsubmitted events | Measures work preserved during throttling | Backlog grows faster than it drains. |
      | Outcome-reporting lag | Detects a broken or delayed learning loop | Outcomes exceed the merchant’s acceptable reporting delay. |
      | Duplicate or replay count | Reveals client retry behavior | Unexpected rise in idempotent replays or conflicts. |
      | Traffic by key and endpoint | Helps identify loops or compromised credentials | One key or endpoint deviates materially from baseline. |
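The first row of the table, 429 count and rate, can be tracked with a sliding window. This is a minimal sketch: the `ThrottleMonitor` class, the 300-second window, and the 5% alert ratio are placeholders chosen for illustration, not recommended values. A production setup would more likely emit these counts to an existing metrics system.

```python
from collections import deque


class ThrottleMonitor:
    """Tracks the share of recent responses that were HTTP 429."""

    def __init__(self, window_seconds=300.0, alert_ratio=0.05):
        self.window_seconds = window_seconds
        self.alert_ratio = alert_ratio
        self._samples = deque()  # (timestamp, was_throttled)

    def record(self, timestamp, status_code):
        self._samples.append((timestamp, status_code == 429))

    def _trim(self, now):
        # Drop samples that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def throttled_ratio(self, now):
        self._trim(now)
        if not self._samples:
            return 0.0
        throttled = sum(1 for _, flag in self._samples if flag)
        return throttled / len(self._samples)

    def should_alert(self, now):
        return self.throttled_ratio(now) > self.alert_ratio
```

Keeping one monitor per API key and endpoint also supports the last row of the table, because a single key that deviates from baseline shows up as its own elevated ratio.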


    9. Test cases before production

      • Validate that a payment-event request with 10,001 events is rejected locally before transmission.

      • Validate legacy retry-decision batches are split at 500 events or fewer.

      • Simulate HTTP 429 with and without Retry-After.

      • Confirm exponential backoff includes jitter and has a maximum delay.

      • Confirm the retry loop stops after the configured attempt or elapsed-time limit.

      • Confirm POST retries preserve the same idempotency key and payload.

      • Confirm throttled outcomes remain in durable storage and retain their actual timestamps.

      • Confirm a 401, 403, or 422 is not automatically retried as if it were a 429.

      • Confirm traffic resumes gradually after a quota reset or administrative change.

      • Confirm no HTTP retry creates an extra card authorization outside Day 1, Day 2, Day 6, and Day 16.

        Production readiness rule

        A client is not production-ready until it can survive throttling without losing events, duplicating logical operations, creating retry storms, or changing the canonical payment-attempt schedule.
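Two of the test cases listed in this section, key and payload reuse on retry and a bounded retry loop, can be exercised without network access by scripting the responses. The sketch below is a hypothetical harness: `FakeTransport` and `post_until_accepted` are stand-ins for the real HTTP layer and client, and backoff sleeps are deliberately omitted so the tests run instantly.

```python
import json


class FakeTransport:
    """Test double that replays scripted status codes and records each request."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.requests = []

    def post(self, headers, payload):
        self.requests.append((dict(headers), json.dumps(payload, sort_keys=True)))
        return self._statuses.pop(0)


def post_until_accepted(transport, headers, payload, max_attempts=5):
    """Minimal client under test; a real client would back off between attempts."""
    status = None
    for _ in range(max_attempts):
        status = transport.post(headers, payload)
        if status != 429:
            break
    return status


def test_retry_reuses_key_and_payload():
    transport = FakeTransport([429, 429, 200])
    headers = {"Idempotency-Key": "order-8842-attempt-2"}
    payload = {"attempt_number": 2, "decline_code": "51"}
    assert post_until_accepted(transport, headers, payload) == 200
    assert len(transport.requests) == 3
    keys = {sent_headers["Idempotency-Key"] for sent_headers, _ in transport.requests}
    assert keys == {"order-8842-attempt-2"}
    assert len({body for _, body in transport.requests}) == 1


def test_retry_loop_is_bounded():
    transport = FakeTransport([429] * 10)
    status = post_until_accepted(transport, {"Idempotency-Key": "k"}, {}, max_attempts=4)
    assert status == 429
    assert len(transport.requests) == 4
```

The same pattern extends to the other cases in the list, for example scripting a 422 or 401 and asserting that exactly one request was sent.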

    10. Chapter summary

      • Request limits define the largest valid payload for one endpoint call.

      • Payment-event ingestion accepts 1 to 10,000 events per request.

      • Legacy batch retry decision accepts no more than 500 events.

      • Batch-read pagination accepts limit values from 1 to 1,000 and offset values of 0 or greater.

      • Rate limits protect short-window runtime capacity; quotas protect longer-window tenant usage.

      • HTTP 429 requires pause, inspection, bounded backoff, jitter, idempotency, and monitoring.

      • Quota exhaustion should preserve work in a durable queue rather than discard it.

      • Exact numeric plan limits are deployment- and contract-specific.

      • HTTP retries never authorize payment attempts outside Zahlen’s fixed Day 1, Day 2, Day 6, and Day 16 schedule.

Developer checklist

| Check | Ready |
|---|---|
| Client validates endpoint request-size limits before sending | [ ] |
| 429 handling reads Retry-After when present | [ ] |
| Backoff uses jitter, maximum delay, and maximum attempts | [ ] |
| POST retries reuse stable idempotency keys | [ ] |
| Unsubmitted events and outcomes are stored durably | [ ] |
| Quota and 429 metrics are monitored by tenant, key, and endpoint | [ ] |
| 401, 403, 422, 429, and 5xx responses have distinct handling | [ ] |
| API retries cannot create extra payment attempts | [ ] |