ZAHLEN

API User Guide

Chapter 11 - Error Handling

HTTP status codes | Validation errors | Authentication failures


Audience

Merchants, developers, and integration engineers who build resilient client applications against the Zahlen API.


Version 1.0 | Source baseline: zahlen_deploy_0616A.tar.gz | June 2026

Commercial developer experience | Tenant-safe operations | Explainable retry intelligence


Learning objectives

By the end of this chapter, you should be able to distinguish transport, authentication, authorization, validation, throttling, conflict, and server errors; decide whether a request is safe to retry; and preserve the identifiers needed for support and audit.

A dependable API client does not treat every non-success response the same way. A malformed JSON request needs a code change. A revoked API key needs credential remediation. A rate-limit response needs controlled backoff. A transient server failure may be safe to retry only when the operation has stable idempotency semantics.

The first rule is simple: classify the failure before deciding what to do next. Blind retries can create duplicate work, trigger retry storms, consume quotas, or obscure the original cause.


Payment schedule boundary

HTTP request retries are not payment retries. Client error recovery must never create authorization attempts outside Zahlen's fixed Day 1, Day 2, Day 6, and Day 16 payment schedule.


    1. Error-handling goals

      • Give developers enough detail to correct a request without exposing secrets or internal implementation data.

      • Preserve request IDs, event IDs, batch IDs, decision IDs, outcome IDs, and upload job IDs for traceability.

      • Retry only transient failures and only with bounded backoff, jitter, and stable idempotency.

      • Fail closed when authentication or tenant ownership cannot be resolved.

      • Make errors observable through logs, metrics, alerts, and audit records.

    2. HTTP status codes

      The table below describes the status classes a Zahlen client should be prepared to handle. Exact response bodies may vary by route, but the client behavior should remain consistent.


      Status | Meaning | Typical cause | Recommended client response
      200 / 201 | Request succeeded | Resource read, created, or accepted. | Parse the body and persist returned identifiers.
      400 | Malformed or business-invalid request | Invalid JSON, incompatible values, or a business rule failure. | Correct the request. Do not blindly retry.
      401 | Missing or invalid authentication | Missing X-API-Key, malformed key, revoked key, or wrong environment. | Stop the request flow and repair credentials.
      403 | Authenticated but not permitted | Plan, role, capability, or endpoint policy denies access. | Check authorization and contract settings.
      404 | Resource not visible or not found | Wrong identifier, wrong tenant, deleted resource, or wrong environment. | Verify tenant-scoped identifiers and hostname.
      409 | Conflict or idempotency mismatch | Same idempotency key used with a different logical request or state conflict. | Compare the original request and key; do not generate another payment attempt.
      422 | Schema validation failure | Missing required field, wrong type, invalid bounds, or forbidden extra field. | Read field-level errors and fix serialization.
      429 | Rate or quota enforcement | Short-window rate limit or longer-window quota exhausted. | Honor Retry-After when present; back off with jitter.
      500 | Unexpected server failure | Unhandled server condition. | Retry cautiously with idempotency; alert if repeated.
      503 | Service or dependency unavailable | Maintenance, dependency outage, worker issue, or overload. | Use bounded backoff; stop after a configured limit.


      Do not assume every 4xx is permanent

      Some 401, 403, or 404 responses result from using the wrong environment or tenant-scoped identifier. Diagnose the context before changing application logic.
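The status table above can be turned into a small lookup so that every response is classified before any recovery logic runs. The following is a minimal sketch, not a Zahlen-provided client: the `Action` names and the `classify_status` function are hypothetical, and mapping unlisted statuses to `UNKNOWN` (no automatic retry) is an assumption.

```python
from enum import Enum


class Action(Enum):
    """Client actions taken from the status table (names are illustrative)."""
    SUCCESS = "success"                  # 200 / 201
    FIX_REQUEST = "fix_request"          # 400, 422
    FIX_CREDENTIALS = "fix_credentials"  # 401
    CHECK_ACCESS = "check_access"        # 403
    VERIFY_CONTEXT = "verify_context"    # 404
    RECONCILE = "reconcile"              # 409
    BACK_OFF = "back_off"                # 429
    RETRY_BOUNDED = "retry_bounded"      # 500, 503
    UNKNOWN = "unknown"                  # anything not listed in the table


_STATUS_ACTIONS = {
    200: Action.SUCCESS,
    201: Action.SUCCESS,
    400: Action.FIX_REQUEST,
    422: Action.FIX_REQUEST,
    401: Action.FIX_CREDENTIALS,
    403: Action.CHECK_ACCESS,
    404: Action.VERIFY_CONTEXT,
    409: Action.RECONCILE,
    429: Action.BACK_OFF,
    500: Action.RETRY_BOUNDED,
    503: Action.RETRY_BOUNDED,
}


def classify_status(status_code: int) -> Action:
    """Return the table's recommended action; unlisted codes are UNKNOWN."""
    return _STATUS_ACTIONS.get(status_code, Action.UNKNOWN)
```

Keeping the mapping in one table makes it easy to review against the chapter and to test; only `BACK_OFF` and `RETRY_BOUNDED` should ever lead to an automatic retry.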

    3. Reading an error response

      The deployed API includes an ApiErrorResponse model with top-level error and meta objects. Clients should preserve the entire safe response for diagnostics, while avoiding logs that expose credentials or prohibited payment data.

      {
        "error": {
          "code": "EXAMPLE_CODE",
          "message": "Human-readable explanation",
          "details": {"field": "example"}
        },
        "meta": {
          "request_id": "req_example",
          "time": "2026-06-16T15:00:00Z"
        }
      }


Illustrative structure

The sample above demonstrates a practical parsing pattern. Treat the actual response contract returned by the route as the source of truth and tolerate additional documented metadata.
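A tolerant parser for a body shaped like the sample might look like the sketch below. It assumes only the `error` and `meta` keys shown in the illustrative structure; the `ParsedError` class and `parse_error_body` function are hypothetical names, and the real contract of each route takes precedence.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedError:
    """Fields from the illustrative sample; every one may be absent."""
    code: Optional[str] = None
    message: Optional[str] = None
    details: dict = field(default_factory=dict)
    request_id: Optional[str] = None
    time: Optional[str] = None


def parse_error_body(raw: str) -> ParsedError:
    """Parse an error body, ignoring extra keys and surviving non-JSON input."""
    try:
        doc = json.loads(raw)
    except (TypeError, ValueError):
        # Proxies and gateways may return HTML or an empty body.
        return ParsedError()
    if not isinstance(doc, dict):
        return ParsedError()
    error = doc.get("error") if isinstance(doc.get("error"), dict) else {}
    meta = doc.get("meta") if isinstance(doc.get("meta"), dict) else {}
    details = error.get("details")
    return ParsedError(
        code=error.get("code"),
        message=error.get("message"),
        details=details if isinstance(details, dict) else {},
        request_id=meta.get("request_id"),
        time=meta.get("time"),
    )
```

Returning an empty `ParsedError` instead of raising keeps the status code as the primary classification signal even when the body cannot be read.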


What to capture


Value | Why it matters
HTTP method and path | Identifies the failing operation without logging secrets.
Status code | Primary classification for client behavior.
Safe error code and message | Supports remediation and alert grouping.
Server request ID | Connects merchant logs to Zahlen audit and support records.
Client correlation IDs | Links event, batch, decision, outcome, and job flows.
Attempt count and elapsed time | Shows retry behavior and retry-storm risk.
Environment and service version | Helps identify wrong-host and contract-version problems.
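One way to capture these values without leaking secrets is to build each log record from an explicit allowlist. The sketch below is illustrative only: the function name, the record field names, and the `/v1/example` path in the usage note are hypothetical and not part of the Zahlen API.

```python
from typing import Mapping, Optional


def build_failure_record(
    *,
    method: str,
    path: str,
    status: int,
    error_code: Optional[str],
    error_message: Optional[str],
    request_id: Optional[str],
    correlation_ids: Mapping[str, str],
    attempt: int,
    elapsed_seconds: float,
    environment: str,
    service_version: Optional[str] = None,
) -> dict:
    """Assemble an allowlisted record; headers and request bodies are never included."""
    return {
        "method": method.upper(),
        # Drop the query string in case it carries data that should not be logged.
        "path": path.split("?", 1)[0],
        "status": status,
        "error_code": error_code,
        "error_message": error_message,
        "request_id": request_id,
        "correlation_ids": dict(correlation_ids),
        "attempt": attempt,
        "elapsed_seconds": round(elapsed_seconds, 3),
        "environment": environment,
        "service_version": service_version,
    }
```

For example, `build_failure_record(method="post", path="/v1/example?x=1", status=503, ...)` yields a record whose `path` is `/v1/example`. Because the function accepts no header or body argument, an API key cannot reach the log through it by accident.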


What not to log

    4. Validation errors

      Key Zahlen request models use strict validation and forbid unknown top-level properties. This prevents misspelled or unsupported fields from being silently ignored. A validation failure is a client defect or contract mismatch, not a transient network event.

      Common causes


      Cause | Example | Correction
      Missing required field | Payment event without event_id. | Add the required field before sending.
      Wrong data type | attempt_number sent as an object. | Serialize the documented integer type.
      Out-of-range value | attempt_number below its minimum. | Validate bounds in the client model.
      Empty collection | Payment-events request with no events. | Send at least one event.
      Oversized collection | More than 10,000 payment events or more than 500 legacy batch decisions. | Split into valid batches and respect quotas.
      Forbidden extra field | Misspelled property such as eventid. | Correct the field name; do not expect it to be ignored.
      Invalid URL or string length | Webhook callback URL outside schema constraints. | Validate before submission.


      Example: invalid payment event


      {
        "events": [{
          "eventid": "evt_0001",
          "attempt_number": 0
        }]
      }

This request has two problems: eventid is not the required event_id field, and attempt_number violates the payment-event minimum of 1. Because unknown fields are forbidden, the misspelled field is not silently accepted.

Client-side validation pattern

  1. Build request objects from strict typed models.

  2. Validate required fields, types, bounds, and collection sizes before network transmission.

  3. Serialize once and contract-test the exact JSON shape.

  4. On HTTP 422, map returned field errors to developer-visible diagnostics.

  5. Do not retry until the payload is corrected.
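The pattern above can be sketched with a strict dataclass that rejects the same defects before any network call. This is a minimal illustration that covers only `event_id` and `attempt_number`, the two fields named in this chapter; the real payment-event schema has more fields, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_EVENTS = 10_000  # payment-event batch limit stated in this chapter
_ALLOWED_FIELDS = {"event_id", "attempt_number"}  # illustrative subset only


@dataclass(frozen=True)
class PaymentEvent:
    event_id: str
    attempt_number: int

    def __post_init__(self):
        if not isinstance(self.event_id, str) or not self.event_id:
            raise ValueError("event_id is required and must be a non-empty string")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.attempt_number, bool) or not isinstance(self.attempt_number, int):
            raise ValueError("attempt_number must be an integer")
        if self.attempt_number < 1:
            raise ValueError("attempt_number must be at least 1")


def build_events_payload(raw_events: list) -> dict:
    """Validate every event locally and return the JSON-ready request body."""
    if not 1 <= len(raw_events) <= MAX_EVENTS:
        raise ValueError(f"events must contain between 1 and {MAX_EVENTS} items")
    events = []
    for index, raw in enumerate(raw_events):
        unknown = set(raw) - _ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"events[{index}]: unknown fields {sorted(unknown)}")
        missing = _ALLOWED_FIELDS - set(raw)
        if missing:
            raise ValueError(f"events[{index}]: missing fields {sorted(missing)}")
        event = PaymentEvent(**raw)
        events.append({"event_id": event.event_id, "attempt_number": event.attempt_number})
    return {"events": events}
```

Passing the invalid example from this section (`eventid`, `attempt_number` of 0) raises `ValueError` on the unknown field before anything is sent, which mirrors the server's strict behavior.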

    5. Authentication failures

        Merchant-facing routes authenticate with the X-API-Key header. Zahlen derives merchant, tenant, and actor context from the key. A client must not attempt to replace missing authentication by supplying tenant_id in JSON, query strings, or forms.

        X-API-Key: zk_live_REPLACE_ME


        Diagnostic sequence for HTTP 401

  1. Confirm the X-API-Key header is present and spelled exactly.

  2. Confirm the secret was loaded from the intended secret manager or environment variable.

  3. Confirm the base URL belongs to the same environment as the key.

  4. Confirm the key is active and has not been revoked or expired.

  5. Check whether whitespace, quotes, line breaks, or proxy configuration altered the header.

  6. Use the key identifier or safe fingerprint to review activity without exposing the secret.

  7. Rotate immediately if compromise is suspected.
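Two of the checks above lend themselves to code: detecting a key that was altered by whitespace, quotes, or line breaks, and referring to a key in logs without exposing it. The sketch below is an assumption-laden illustration. The function names are hypothetical, and the truncated SHA-256 fingerprint is a local logging convention, not a Zahlen key identifier.

```python
import hashlib


def clean_api_key(raw: str) -> str:
    """Reject a key that looks altered instead of silently repairing it."""
    if not raw:
        raise ValueError("API key is missing")
    if any(ch.isspace() for ch in raw) or any(ch in raw for ch in "\"'"):
        raise ValueError("API key contains whitespace, quotes, or line breaks")
    return raw


def key_fingerprint(api_key: str) -> str:
    """Short one-way fingerprint for logs (local convention, not a Zahlen identifier)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:12]


def auth_headers(raw_key: str) -> dict:
    """Build the X-API-Key header from a validated key."""
    return {"X-API-Key": clean_api_key(raw_key)}
```

Failing loudly on a malformed key at startup is usually easier to diagnose than a stream of HTTP 401 responses at runtime.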


    Fail closed

    When a key cannot be resolved to a valid tenant context, the request must be denied. Never fall back to a default production tenant.


    401 versus 403


    Response | Interpretation | Example response action
    401 Unauthorized | The caller is not successfully authenticated. | Repair or rotate the credential.
    403 Forbidden | The caller is authenticated but lacks permission for the route or capability. | Check plan, role, endpoint authorization, and contract.


    Security response to suspected compromise

    Reconciliation sequence for HTTP 409

  1. Locate the original request body and idempotency key.

  2. Compare the current body byte-for-byte or field-for-field with the original logical operation.

  3. Read the existing resource if the API exposes it.

  4. Continue from the known durable result rather than creating a duplicate.

  5. Escalate if the original result cannot be reconciled safely.
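The comparison step above (original versus current request body) can be sketched as a canonical digest plus a field-level diff. This is a local diagnostic aid under stated assumptions: the function names are hypothetical, and nothing here implies how Zahlen itself compares idempotent requests.

```python
import hashlib
import json

_MISSING = object()  # distinguishes an absent field from an explicit null


def request_digest(body: dict) -> str:
    """Digest of the canonical JSON form, so key order and spacing do not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_fields(original: dict, current: dict) -> list:
    """Top-level field names whose values differ between the two bodies."""
    keys = set(original) | set(current)
    return sorted(
        key for key in keys
        if original.get(key, _MISSING) != current.get(key, _MISSING)
    )
```

Storing `request_digest(body)` next to each idempotency key at send time makes a later mismatch easy to detect, and `diff_fields` shows which fields changed between the two logical requests.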

    6. Safe retry strategy

        Automatic retry behavior should be explicit for each operation. Retry budgets must be bounded by maximum attempts, maximum elapsed time, and circuit-breaker rules.


        Operation / response | Automatic retry? | Required safeguards
        GET resource + transient 5xx/503 | Usually | Exponential backoff, jitter, maximum attempts.
        POST retry decision + transient failure | Yes, carefully | Reuse the same Idempotency-Key and identical logical request.
        POST retry outcome + transient failure | Yes, carefully | Reuse stable identifiers and idempotency where supported.
        POST payment-event batch + uncertain result | Only with safeguards | Use stable event IDs; first attempt to read the resulting resource or reconcile ingestion.
        HTTP 400 or 422 | No | Fix the request before resubmission.
        HTTP 401 | No | Repair authentication or rotate the key.
        HTTP 403 | No | Resolve authorization or plan restrictions.
        HTTP 404 | Usually no | Verify identifier, ownership, environment, and asynchronous timing.
        HTTP 409 | No blind retry | Reconcile the original idempotent operation.
        HTTP 429 | Yes, later | Honor Retry-After; use bounded backoff and jitter.
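A retry budget bounded by maximum attempts and maximum elapsed time, as this section requires, can be sketched as a small helper. The `RetryBudget` class and its parameters are hypothetical; the injectable clock exists only to make the behavior testable.

```python
import time


class RetryBudget:
    """Bounds retries by attempt count and by total elapsed time."""

    def __init__(self, max_attempts: int, max_elapsed_seconds: float, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.max_elapsed_seconds = max_elapsed_seconds
        self._clock = clock
        self._started = clock()
        self.attempts = 0

    def record_attempt(self) -> None:
        """Call once for every request actually sent."""
        self.attempts += 1

    def allows_retry(self, next_delay_seconds: float = 0.0) -> bool:
        """True only if another attempt fits both the attempt and time limits."""
        if self.attempts >= self.max_attempts:
            return False
        elapsed = self._clock() - self._started
        return elapsed + next_delay_seconds <= self.max_elapsed_seconds
```

Checking the planned delay against the remaining time avoids sleeping through the end of the budget only to give up afterwards.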


        Exponential backoff with jitter


        delay = min(max_delay, base_delay * (2 ** retry_number))

        delay = delay * random.uniform(0.75, 1.25)
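As a worked illustration of the two lines above, the sketch below wraps them in a function. The defaults of 0.5 seconds for `base_delay` and 30 seconds for `max_delay` are assumptions borrowed from the Python example in this chapter, and `backoff_delay` is a hypothetical name.

```python
import random


def backoff_delay(retry_number: int, base_delay: float = 0.5, max_delay: float = 30.0) -> float:
    """Capped exponential delay with +/-25% jitter, following the formula above."""
    delay = min(max_delay, base_delay * (2 ** retry_number))
    return delay * random.uniform(0.75, 1.25)


# Un-jittered delays for retries 0..6: 0.5, 1, 2, 4, 8, 16, then capped at 30.
# Jitter is applied after the cap, so the largest possible sleep is 30 * 1.25 = 37.5 seconds.
```

If the cap must be a hard upper bound, apply `min(max_delay, ...)` again after the jitter step.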

    7. Implementation examples

      Python error mapper


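      import random
      import time

      import requests

      # Statuses that may be retried automatically; everything else needs a fix first.
      RETRYABLE = {429, 500, 503}


      def request_with_policy(method, url, *, headers, json=None, max_attempts=4):
          """Send a request, retrying only retryable statuses with bounded backoff.

          For POST calls, pass a stable Idempotency-Key in headers so that every
          retry replays the same logical request.
          """
          for attempt in range(max_attempts):
              response = requests.request(
                  method, url, headers=headers, json=json, timeout=20
              )
              if response.ok:
                  return response

              request_id = response.headers.get("X-Request-ID")
              if response.status_code not in RETRYABLE:
                  raise RuntimeError(
                      f"Zahlen HTTP {response.status_code}; request_id={request_id}; "
                      f"body={response.text[:1000]}"
                  )

              # Out of attempts: surface the last retryable failure as an HTTPError.
              if attempt == max_attempts - 1:
                  response.raise_for_status()

              retry_after = response.headers.get("Retry-After")
              if retry_after and retry_after.isdigit():
                  # Honor the server-provided delay (in seconds) without shortening it.
                  delay = float(retry_after)
              else:
                  delay = min(30.0, 0.5 * (2 ** attempt))
                  delay *= random.uniform(0.75, 1.25)
              time.sleep(delay)

          # Reached only when max_attempts is zero or negative.
          raise RuntimeError("max_attempts must be at least 1")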


Production note

Use structured logs and redact response content according to your data policy. The example truncates the body but does not replace a proper allowlist-based logging design.

JavaScript status handling


// Illustrative wrapper: await and return require an enclosing async function.
async function callZahlen(url, options) {
  const response = await fetch(url, options);
  const requestId = response.headers.get("x-request-id");

  if (response.ok) {
    return await response.json();
  }

  const body = await response.text();
  switch (response.status) {
    case 401:
      throw new Error(`Authentication failed; request_id=${requestId}`);
    case 403:
      throw new Error(`Not permitted; request_id=${requestId}`);
    case 422:
      throw new Error(`Validation failed; request_id=${requestId}; ${body}`);
    case 429:
      throw new Error(
        `Throttled; retry-after=${response.headers.get("retry-after")}; request_id=${requestId}`
      );
    default:
      throw new Error(`Zahlen HTTP ${response.status}; request_id=${requestId}`);
  }
}

    8. Monitoring and alerting

      Error handling is incomplete until failures are observable. Aggregate by tenant, route, status class, error code, key identifier, and deployment version while protecting sensitive information.


      Signal | What it may indicate | Suggested response
      401 rate | Revoked, expired, missing, or misconfigured keys. | Check recent deployments and key activity.
      403 rate | Plan or authorization mismatch. | Review capability and endpoint policy.
      422 rate | Client release or schema drift. | Inspect field errors and contract tests.
      429 rate | Capacity pressure, quota exhaustion, or request loop. | Throttle clients and review usage.
      5xx / 503 rate | Runtime or dependency degradation. | Open incident and use circuit breakers.
      Idempotent replay rate | Expected retry behavior or unstable client network. | Verify that replays are returning the same durable result.
      Outcome-reporting lag | Broken recovery learning loop. | Check retry-outcome clients and queues.


      Alert design

      • Alert on sustained rates or error-budget impact, not every individual validation error.

      • Use separate alerts for authentication spikes, throttling, server failures, and outcome-reporting lag.

      • Include safe correlation identifiers and links to internal dashboards, never complete secrets.

      • Suppress duplicate alerts during a known incident while preserving metrics and logs.
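The first bullet, alerting on sustained rates rather than individual errors, can be sketched as a check over consecutive measurement windows. The class name, the thresholds in the usage note, and the window model are all assumptions for illustration; a real deployment would normally express this rule in its monitoring system.

```python
from collections import deque


class SustainedRateAlert:
    """Fires only when the error rate exceeds a threshold for N consecutive windows."""

    def __init__(self, threshold: float, windows_required: int, min_requests: int = 1):
        self.threshold = threshold
        self.min_requests = min_requests
        # Keeps only the most recent windows_required results.
        self._recent = deque(maxlen=windows_required)

    def observe_window(self, errors: int, total: int) -> bool:
        """Record one window's counts; return True if the alert should fire."""
        breached = (
            total > 0
            and total >= self.min_requests
            and (errors / total) > self.threshold
        )
        self._recent.append(breached)
        return len(self._recent) == self._recent.maxlen and all(self._recent)
```

With `SustainedRateAlert(threshold=0.05, windows_required=3, min_requests=20)`, a single burst of 422 responses in one window does not page anyone, while three consecutive windows above 5% do.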


        Support package

        When escalating to Zahlen support, provide environment, UTC time window, method, path, status, safe error code, request ID, and relevant durable identifiers. Do not provide the full API key or prohibited payment data.

    9. Troubleshooting playbooks

      Validation failures after a client deployment

      1. Compare the new serialized JSON with the previous known-good payload.

      2. Check renamed, missing, nullable, and forbidden fields.

      3. Validate collection sizes and numeric bounds.

      4. Run contract tests against the current discovery schema.

      5. Roll back or correct the client; do not add blind retry logic.

        Authentication failures across all routes

      1. Check base URL and environment.

      2. Check secret injection and header construction.

      3. Check key status, rotation timing, and revocation records.

      4. Verify the system clock and proxy/header forwarding behavior where relevant.

      5. Rotate if compromise or accidental disclosure is possible.

        Repeated 429 responses

      1. Stop immediate retries and honor Retry-After when provided.

      2. Measure traffic by service, route, and key identifier.

      3. Look for a retry loop or duplicated worker deployment.

      4. Check plan assignment, quota configuration, and current usage.

      5. Increase capacity or quota only after explaining the traffic pattern.

        Repeated 5xx or 503 responses

      1. Enable bounded backoff and a circuit breaker.

      2. Preserve idempotency keys and request correlation.

      3. Check Zahlen health and version endpoints when reachable.

      4. Stop before exceeding the retry budget.

      5. Escalate with a safe support package and UTC timestamps.
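The playbook above calls for a circuit breaker. A minimal consecutive-failure breaker might look like the sketch below; the class name, the default threshold of 5 failures, and the 60-second cool-down are assumptions, and the half-open state is simplified (it does not limit how many trial requests pass once the cool-down ends).

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures and allows trial calls after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_seconds: float = 60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open state).
        return self._clock() - self._opened_at >= self.reset_seconds

    def record_success(self) -> None:
        """A success closes the circuit and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a 5xx/503 or transport failure; open or re-open at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Call `allow_request()` before each attempt and record the result afterwards. While the circuit is open, skip the call entirely rather than queueing retries, so that recovery does not begin with a retry storm.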


      Final rule

      A successful recovery client is conservative: it validates before sending, authenticates securely, retries only when safe, preserves durable identifiers, and never converts transport uncertainty into an extra payment attempt.


    10. Production readiness checklist

      • Every API call has a documented status-handling policy.

      • Validation is performed locally with strict typed request models.

      • Unknown fields fail tests before reaching production.

      • API keys are never logged and 401 handling stops automatic retries.

      • 409 handling reconciles the original idempotent operation.

      • 429 handling honors Retry-After and uses bounded jittered backoff.

      • POST retries reuse the same stable idempotency key where supported.