ZAHLEN

API USER GUIDE


Chapter 9

Investigation Runs API

Listing runs • Run status • Run detail retrieval


For merchants, developers, and integration engineers

Source baseline: zahlen_deploy_0616A.tar.gz | Version 1.0 | June 2026

Administrative visibility • Tenant-safe processing • Durable operational traceability


Commercial workflow context

Payment Event → Retry Decision → Retry Outcome → Investigation Run → Reporting


Learning objectives

By the end of this chapter, you should be able to list tenant-scoped investigation runs, read processing status, retrieve run details, and connect each run to the wider Zahlen evidence pipeline.

Investigation runs are durable administrative records that track how uploaded or API-ingested payment evidence moves through validation, normalization, analysis, and downstream population. They provide the operational bridge between a merchant submission and the reporting, Recovery Intelligence, issuer monitoring, and governance views that follow.

    1. Where investigation runs fit


      | Stage | Primary identifier | What it tells you |
      |---|---|---|
      | Payment-event ingestion | payment_event_batch_id / batch_id | Which merchant evidence was accepted for processing. |
      | Background processing | upload_job_id / job_id | Which durable processing job owns the work. |
      | Investigation run | run or job resource | Whether processing completed and what downstream evidence was produced. |
      | Reporting and monitoring | durable record IDs | How the completed evidence appears in Recovery Truth, issuer health, timelines, and reports. |


      Administrative boundary

      Investigation-run routes are under /v1/admin/investigation-runs. Do not assume that a merchant X-API-Key grants access. These routes require an approved administrative context, such as an authenticated operator session or enterprise administrative authorization.


      The merchant-facing API returns an upload_job_id during payment-event ingestion. Store that identifier even when your application does not have direct access to the administrative routes. It gives support and operations teams a stable correlation point.

    2. Confirmed route family


      | Method | Path | Purpose |
      |---|---|---|
      | GET | /v1/admin/investigation-runs | List investigation runs visible to the authenticated tenant context. |
      | GET | /v1/admin/investigation-runs/{job_id} | Retrieve detailed information for one run. |
      | GET | /v1/admin/investigation-runs/{job_id}/status | Retrieve the current processing status for one run. |
      | GET | /v1/admin/investigation-runs/readiness | Evaluate whether the investigation-run subsystem is ready and properly connected. |
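The four routes above can be built from one prefix. The following is a minimal sketch, not part of any Zahlen SDK: the helper names are hypothetical, and the only facts taken from this chapter are the paths themselves. Percent-encoding the job ID keeps an unexpected character from changing which route is called.

```python
from urllib.parse import quote

# Route prefix as documented in this chapter.
ADMIN_PREFIX = "/v1/admin/investigation-runs"


def run_list_path() -> str:
    """Path for listing tenant-scoped runs."""
    return ADMIN_PREFIX


def run_detail_path(job_id: str) -> str:
    """Path for one run's detail; the job ID is percent-encoded."""
    return f"{ADMIN_PREFIX}/{quote(job_id, safe='')}"


def run_status_path(job_id: str) -> str:
    """Path for one run's current processing status."""
    return f"{run_detail_path(job_id)}/status"


def readiness_path() -> str:
    """Path for the subsystem readiness check."""
    return f"{ADMIN_PREFIX}/readiness"


if __name__ == "__main__":
    print(run_status_path("ZN-2026-06-16-0001"))
```

Note that none of these helpers accepts a tenant_id: tenant scope comes from the authenticated context, never from the path.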


    3. Authentication model

      Administrative API access is governed separately from the merchant API. The exact credential or session mechanism depends on the deployment and enterprise contract. The important rule is that tenant ownership must come from authenticated context, not from a tenant_id supplied in the URL, query string, or request body.

      • Use the administrative base URL and authentication method supplied by the Zahlen administrator.

      • Never add tenant_id to a request merely to make an empty result return data.

      • Treat a 401 response as an authentication problem and a 403 response as an authorization or role problem.

      • Treat an empty list as a possible valid tenant-scoped result until runtime and population health are checked.


      Fail closed

      If the platform cannot resolve the authenticated tenant or administrative identity, access should be denied. A production system must not fall back to a default tenant.
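The fail-closed rule can be illustrated with a short server-side sketch. Everything here is an assumption made for illustration: the shape of `auth_context` (a mapping produced by an already-verified session or token), the `tenant_id` key inside it, and the exception name. The point is only that a missing tenant raises an error instead of selecting a default.

```python
from typing import Mapping, Optional


class TenantResolutionError(PermissionError):
    """Raised when no tenant can be derived from the authenticated context."""


def resolve_tenant(auth_context: Optional[Mapping[str, str]]) -> str:
    """Return the tenant from a verified auth context, or deny access.

    `auth_context` is a hypothetical mapping built by the authentication
    layer. It must never be populated from the URL, query string, or body.
    """
    tenant = (auth_context or {}).get("tenant_id", "")
    if not isinstance(tenant, str) or not tenant.strip():
        # Fail closed: no fallback tenant, no default.
        raise TenantResolutionError("tenant could not be resolved; access denied")
    return tenant.strip()


if __name__ == "__main__":
    print(resolve_tenant({"tenant_id": "tenant-a"}))
```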

    4. Listing investigation runs

      Use the list route to discover recent runs available to the current administrative tenant. A list response commonly acts as an operational index: it helps operators find a job by date, source, status, or upload identifier before opening the detail resource.

      curl -sS 'https://api.example.com/v1/admin/investigation-runs' \
        -H 'Accept: application/json' \
        -H 'Authorization: Bearer ADMIN_TOKEN_REPLACE_ME' | python -m json.tool


The header above is illustrative. Use the administrative authentication contract supplied by your deployment; do not substitute X-API-Key unless that route is explicitly configured to accept it.

      1. What to capture from a list response


        | Field category | Examples | Client use |
        |---|---|---|
        | Identity | job_id, upload_job_id, run identifier | Open status and detail resources; correlate with ingestion responses. |
        | Ownership | tenant or merchant context | Confirm the record belongs to the authenticated scope. |
        | Lifecycle | status, created_at, started_at, completed_at | Sort recent work and identify stale or unfinished runs. |
        | Volume | total rows, valid rows, invalid rows, error count | Determine whether the run processed the expected evidence. |
        | Source | upload, API ingestion, source label | Trace the run back to the originating integration path. |


      2. List-processing pattern

        1. Request the list using the approved administrative context.

        2. Filter or sort locally only after confirming the server already enforced tenant scope.

        3. Select the run using a durable job identifier, not only a human-readable timestamp.

        4. Open the status resource for active work and the detail resource for completed or failed work.
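Steps 3 and 4 above can be sketched as two small functions over an already-fetched list. This is an illustration under assumptions: the list items are treated as dictionaries with `job_id` and `status` keys (both appear as example fields in this chapter, but the exact response shape is not confirmed here), and the uppercase status strings are placeholders.

```python
from typing import Iterable, Mapping, Optional

# Assumed terminal status strings; confirm against your deployment.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def select_run(runs: Iterable[Mapping], job_id: str) -> Optional[Mapping]:
    """Pick a run by its durable job identifier, not by timestamp."""
    for run in runs:
        if run.get("job_id") == job_id:
            return run
    return None


def next_resource(run: Mapping) -> str:
    """Return 'detail' for finished runs and 'status' for active ones."""
    status = str(run.get("status", "")).upper()
    return "detail" if status in TERMINAL_STATUSES else "status"


if __name__ == "__main__":
    sample = [
        {"job_id": "ZN-2026-06-16-0001", "status": "processing"},
        {"job_id": "ZN-2026-06-16-0002", "status": "completed"},
    ]
    run = select_run(sample, "ZN-2026-06-16-0002")
    print(next_resource(run))
```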

    5. Reading run status

      The status route answers a narrow operational question: what state is this job in now? It is appropriate for polling while a run is still processing and for detecting terminal completion or failure.

      curl -sS \
        'https://api.example.com/v1/admin/investigation-runs/ZN-2026-06-16-0001/status' \
        -H 'Accept: application/json' \
        -H 'Authorization: Bearer ADMIN_TOKEN_REPLACE_ME' | python -m json.tool


      1. Status categories


        | Category | Meaning | Recommended behavior |
        |---|---|---|
        | Queued or pending | Accepted but not actively processing. | Poll at a controlled interval; watch queue age. |
        | Running or processing | Work is active. | Continue polling with increasing intervals; do not submit a duplicate job. |
        | Completed | Primary run processing reached a terminal success state. | Retrieve detail and confirm downstream population. |
        | Failed | The run ended without successful completion. | Read error detail, preserve identifiers, and alert operations. |
        | Unknown or unavailable | The resource is not visible, missing, or status cannot be resolved. | Check tenant context, identifier, authorization, and runtime health. |
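A client can fold raw status strings into the five categories in the table before deciding what to do. The sketch below is hedged: the raw strings (`QUEUED`, `PENDING`, `RUNNING`, `PROCESSING`, `COMPLETED`, `FAILED`) are assumptions inferred from the category names, and the enum and function names are hypothetical. Anything unrecognized falls into the unknown category rather than being treated as success.

```python
from enum import Enum


class RunCategory(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Assumed raw status strings; verify against real responses.
_RAW_TO_CATEGORY = {
    "QUEUED": RunCategory.QUEUED,
    "PENDING": RunCategory.QUEUED,
    "RUNNING": RunCategory.RUNNING,
    "PROCESSING": RunCategory.RUNNING,
    "COMPLETED": RunCategory.COMPLETED,
    "FAILED": RunCategory.FAILED,
}

# Recommended behavior, paraphrased from the status category table.
_ACTION = {
    RunCategory.QUEUED: "Poll at a controlled interval; watch queue age.",
    RunCategory.RUNNING: "Keep polling with increasing intervals; do not resubmit.",
    RunCategory.COMPLETED: "Retrieve detail and confirm downstream population.",
    RunCategory.FAILED: "Read error detail, preserve identifiers, alert operations.",
    RunCategory.UNKNOWN: "Check tenant context, identifier, authorization, runtime health.",
}


def categorize(raw_status) -> RunCategory:
    """Map a raw status value to a category; unrecognized values are UNKNOWN."""
    key = str(raw_status or "").strip().upper()
    return _RAW_TO_CATEGORY.get(key, RunCategory.UNKNOWN)


def recommended_action(raw_status) -> str:
    return _ACTION[categorize(raw_status)]


if __name__ == "__main__":
    print(categorize("processing"), "->", recommended_action("processing"))
```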


        Completed is not the final diagnostic step

        A run can report completion while a downstream bridge or composition layer remains empty. After completion, verify Recovery Truth, radar, issuer health, monitoring events, timelines, cohort memory, and classification persistence when those features are in scope.


      2. Polling guidance

        • Start with a moderate interval rather than polling continuously.

        • Increase the interval for long-running jobs.

        • Stop polling when a terminal state is reached.

        • Apply a maximum polling duration and alert when it is exceeded.

        • Log job_id, request correlation, timestamps, and the final status.
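The interval guidance above can be expressed as a generator that yields wait times. This is a sketch with assumed defaults (5-second start, doubling, 60-second cap, 15-minute budget); none of these values is prescribed by the platform. The generator stops once the total waiting budget is spent, which is the point at which a caller would raise an alert.

```python
from typing import Iterator


def polling_intervals(
    start: float = 5.0,
    factor: float = 2.0,
    cap: float = 60.0,
    max_total: float = 900.0,
) -> Iterator[float]:
    """Yield successive wait times that grow up to `cap`.

    The sum of all yielded waits never exceeds `max_total`.
    """
    elapsed = 0.0
    interval = start
    while elapsed < max_total:
        # Trim the last wait so the budget is not overshot.
        wait = min(interval, max_total - elapsed)
        yield wait
        elapsed += wait
        interval = min(cap, interval * factor)


if __name__ == "__main__":
    print(list(polling_intervals(max_total=100.0)))
```

A caller would sleep for each yielded value between status requests and stop iterating as soon as a terminal state is seen.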

    6. Retrieving run details

      The detail route provides the richer record needed for troubleshooting, reporting correlation, and governance review. Use it after a run completes, fails, or appears inconsistent with the downstream dashboards.

      curl -sS \
        'https://api.example.com/v1/admin/investigation-runs/ZN-2026-06-16-0001' \
        -H 'Accept: application/json' \
        -H 'Authorization: Bearer ADMIN_TOKEN_REPLACE_ME' | python -m json.tool


      1. Detail categories to inspect


        | Category | Questions to answer |
        |---|---|
        | Identity | Does the job ID match the upload_job_id captured during ingestion? |
        | Tenant and merchant scope | Is the record visible only within the authenticated ownership boundary? |
        | Source evidence | Was the run created from the expected API batch or uploaded evidence? |
        | Row accounting | Do total, valid, invalid, and error counts reconcile? |
        | Lifecycle timing | When was the run created, started, and completed? Is the duration plausible? |
        | Errors and warnings | Are failures actionable, repeatable, and tied to specific evidence? |
        | Downstream outputs | Which durable stores and monitoring layers were populated? |


      2. Reconciliation rule

        Do not evaluate one count in isolation. Reconcile the evidence chain: submitted rows → accepted rows → valid and invalid rows → persisted records → downstream monitoring artifacts. A discrepancy may be valid, but it should be explainable.
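One way to apply the reconciliation rule is to compare adjacent counts and collect every mismatch as a finding to explain, not as an automatic error. The field names below (`total_rows`, `valid_rows`, `invalid_rows`, `persisted_records`) are hypothetical, and the second check assumes one persisted record per valid row, which may not hold in every deployment.

```python
from typing import List, Mapping, Optional


def reconcile(counts: Mapping[str, Optional[int]]) -> List[str]:
    """Return human-readable findings for counts that do not line up."""
    total = counts.get("total_rows")
    valid = counts.get("valid_rows")
    invalid = counts.get("invalid_rows")
    if total is None or valid is None or invalid is None:
        return ["row counts missing: cannot reconcile"]

    findings: List[str] = []
    if valid + invalid != total:
        findings.append(f"valid ({valid}) + invalid ({invalid}) != total ({total})")

    # Assumption: each valid row should yield one persisted record.
    persisted = counts.get("persisted_records")
    if persisted is not None and persisted != valid:
        findings.append(f"persisted ({persisted}) != valid ({valid})")
    return findings


if __name__ == "__main__":
    print(reconcile({"total_rows": 10, "valid_rows": 8, "invalid_rows": 1}))
```

An empty result means the counts agree. A non-empty result is a prompt to look for an explanation such as deduplication or a still-running downstream stage.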


        Preserve identifiers

        Store upload_job_id from the merchant ingestion response and the administrative job identifier returned by the run APIs. These IDs are the most reliable way to connect a customer support case to durable processing evidence.
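A small immutable record is often enough to keep these identifiers together. The sketch below is illustrative: `upload_job_id` is the identifier named in this chapter, while `admin_job_id` and `support_case_ref` are hypothetical field names chosen for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RunCorrelation:
    """Identifiers that tie a support case to durable processing evidence."""

    upload_job_id: str                       # returned by merchant ingestion
    admin_job_id: Optional[str] = None       # hypothetical: ID from the run APIs
    support_case_ref: Optional[str] = None   # hypothetical: your own ticket reference


def to_log_line(correlation: RunCorrelation) -> str:
    """Serialize with sorted keys so log lines are stable and searchable."""
    return json.dumps(asdict(correlation), sort_keys=True)


if __name__ == "__main__":
    print(to_log_line(RunCorrelation("UJ-1", admin_job_id="ZN-2026-06-16-0001")))
```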

    7. End-to-end investigation-run workflow


      | Step | Developer or operator action | Expected evidence |
      |---|---|---|
      | 1 | Submit payment events through /v1/payment-events or /v1/payment-events/batch. | batch ID and upload_job_id |
      | 2 | Store the returned identifiers with merchant-side event records. | durable client correlation |
      | 3 | List administrative investigation runs when authorized. | tenant-scoped run index |
      | 4 | Poll the selected run status at a controlled interval. | current lifecycle state |
      | 5 | Retrieve run detail after terminal completion or failure. | row accounting, timestamps, errors, outputs |
      | 6 | Verify downstream population and reporting. | Recovery Truth, monitoring, classification, reports |
      | 7 | Escalate anomalies using IDs, timestamps, and evidence. | auditable incident or support record |


    8. Relationship to the fixed retry schedule

      Investigation runs analyze evidence generated by the merchant payment process. They do not change Zahlen’s canonical retry schedule. Payment attempts remain governed by the fixed sequence: Day 1, Day 2, Day 6, and Day 16. Administrative polling or rerunning a report must never create an additional payment attempt.


      | Operation | May repeat automatically? | Payment effect |
      |---|---|---|
      | GET run list | Yes, with reasonable polling limits. | None |
      | GET run status | Yes, with controlled intervals. | None |
      | GET run detail | Yes. | None |
      | Resubmit payment evidence | Only with documented replay safeguards. | May create duplicate processing evidence if IDs are not stable. |
      | Execute payment retry | Only on the fixed Day 1, Day 2, Day 6, Day 16 schedule. | Creates a real authorization attempt. |
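The fixed sequence can be turned into calendar dates with simple arithmetic. This sketch rests on one loudly stated assumption: Day N falls N − 1 calendar days after Day 1, and the caller supplies the Day 1 date. The chapter does not define how Day 1 is anchored, so confirm that against the retry documentation before relying on it.

```python
from datetime import date, timedelta
from typing import List

# Fixed sequence stated in this chapter.
RETRY_DAYS = (1, 2, 6, 16)


def retry_dates(day_one: date) -> List[date]:
    """Calendar dates for the fixed schedule.

    Assumption: Day N is (N - 1) calendar days after Day 1.
    """
    return [day_one + timedelta(days=n - 1) for n in RETRY_DAYS]


def is_scheduled_retry_day(day_one: date, today: date) -> bool:
    """True only when `today` is one of the four scheduled dates."""
    return today in retry_dates(day_one)


if __name__ == "__main__":
    for d in retry_dates(date(2026, 6, 1)):
        print(d.isoformat())
```

Nothing in the investigation-run routes feeds into this calculation: polling or re-reading a run leaves the schedule untouched.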

    9. Error handling


      | HTTP status | Likely meaning | Recommended response |
      |---|---|---|
      | 200 | Request succeeded. | Parse the resource and persist identifiers or status. |
      | 401 | Administrative authentication is missing or invalid. | Verify the approved credential and environment. |
      | 403 | The identity is authenticated but not allowed to access the route. | Check role, enterprise entitlement, or endpoint policy. |
      | 404 | The run is absent or not visible in the authenticated tenant scope. | Verify job ID and tenant context; do not bypass ownership filters. |
      | 422 | A supplied parameter failed validation. | Correct the request before retrying. |
      | 429 | Administrative rate limit or quota enforcement. | Back off and honor Retry-After when present. |
      | 500/503 | Runtime or dependency failure. | Retry GET operations with bounded backoff; alert if sustained. |
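The table above maps naturally onto a lookup that a client can consult before deciding whether to retry. The action labels and function names below are hypothetical. The status codes and the retry guidance come from the table. Codes not listed in the table are reported as unhandled so they surface instead of being silently retried.

```python
# Action labels are illustrative names, keyed by the codes in the table.
_ACTIONS = {
    200: "parse_and_persist",
    401: "verify_credential_and_environment",
    403: "check_role_or_entitlement",
    404: "verify_job_id_and_tenant_context",
    422: "correct_request",
    429: "back_off_and_honor_retry_after",
    500: "retry_with_bounded_backoff",
    503: "retry_with_bounded_backoff",
}

# Only these codes are worth repeating without changing the request.
_RETRYABLE = {429, 500, 503}


def classify_http_status(code: int) -> str:
    """Return the recommended action label for a status code."""
    return _ACTIONS.get(code, "unhandled")


def is_retryable(code: int) -> bool:
    """True when repeating the same GET may succeed later."""
    return code in _RETRYABLE


if __name__ == "__main__":
    for code in (200, 403, 429, 418):
        print(code, classify_http_status(code), is_retryable(code))
```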


      1. Empty list troubleshooting

        1. Confirm the administrative identity and current environment.

        2. Confirm the expected ingestion request returned an upload_job_id.

        3. Check the investigation-run readiness route.

        4. Check worker, supervisor, queue, and last-cycle health.

        5. Verify tenant resolution before considering any data repair.

        6. Use backfill only as a controlled remediation after the missing bridge is identified.


        Do not bypass tenant isolation

        An empty list can be the correct response for the authenticated tenant. Never remove tenant filters or substitute a default production tenant merely to make data appear.

    10. Example client polling pattern

      The following Python example illustrates a bounded status poll. Replace the illustrative bearer token with the administrative authentication contract for your deployment.

      import os
      import time

      import requests

      BASE_URL = os.environ["ZAHLEN_BASE_URL"]
      ADMIN_TOKEN = os.environ["ZAHLEN_ADMIN_TOKEN"]
      JOB_ID = "ZN-2026-06-16-0001"

      headers = {
          "Accept": "application/json",
          "Authorization": f"Bearer {ADMIN_TOKEN}",
      }

      interval_seconds = 5
      max_interval_seconds = 60
      deadline = time.monotonic() + 15 * 60  # stop polling after 15 minutes

      while time.monotonic() < deadline:
          response = requests.get(
              f"{BASE_URL}/v1/admin/investigation-runs/{JOB_ID}/status",
              headers=headers,
              timeout=20,
          )
          response.raise_for_status()  # any 4xx/5xx response aborts the poll
          payload = response.json()
          status = str(payload.get("status", "")).upper()

          # A terminal state ends the poll.
          if status in {"COMPLETED", "FAILED"}:
              print(payload)
              break

          time.sleep(interval_seconds)
          # Double the wait after each attempt, up to the cap.
          interval_seconds = min(max_interval_seconds, interval_seconds * 2)
      else:
          # The deadline passed without reaching a terminal state.
          raise TimeoutError(f"Investigation run {JOB_ID} did not finish in time")


      1. Production improvements

        • Add jitter so multiple clients do not poll in lockstep.

        • Handle 429 using Retry-After when present.

        • Capture request IDs and response timestamps in logs.

        • Use a circuit breaker for sustained 5xx failures.

        • Retrieve the detail resource after COMPLETED or FAILED.
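Two of the improvements above, jitter and Retry-After handling, can be sketched as a single delay calculation. This is an illustration under assumptions: the function names are hypothetical, only the delta-seconds form of Retry-After is parsed (the HTTP-date form is ignored here), and the 20 percent jitter fraction is an arbitrary example value.

```python
import random
from typing import Callable, Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Parse a Retry-After header given in whole seconds.

    Returns None when the header is absent or not an integer. A plain dict
    is case-sensitive; HTTP client header objects are often case-insensitive.
    """
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        seconds = int(value.strip())
    except ValueError:
        return None  # HTTP-date form is not handled in this sketch
    return float(max(0, seconds))


def next_delay(
    base: float,
    headers: Mapping[str, str],
    rng: Callable[[], float] = random.random,
    jitter_fraction: float = 0.2,
) -> float:
    """Prefer the server's hint; otherwise spread `base` by +/- jitter."""
    hinted = retry_after_seconds(headers)
    if hinted is not None:
        return hinted
    # rng() is in [0, 1), so the multiplier spans 1 - f up to (not including) 1 + f.
    return base * (1 + jitter_fraction * (2 * rng() - 1))


if __name__ == "__main__":
    print(next_delay(10.0, {"Retry-After": "30"}))
    print(next_delay(10.0, {}))
```

Passing `rng` as a parameter keeps the function deterministic under test while still using `random.random` in normal operation.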

    11. Production readiness checklist


Chapter summary

Investigation runs make background processing observable. Use the list route to find tenant-scoped work, the status route to monitor lifecycle state, and the detail route to reconcile evidence, errors, and downstream outputs. Preserve upload_job_id, respect the administrative authorization boundary, and treat completion as the start of downstream verification—not as proof that every reporting layer is populated.


Key terms


| Term | Meaning |
|---|---|
| upload_job_id | Identifier returned during ingestion that correlates merchant evidence with background processing. |
| Investigation run | Durable administrative record of processing and downstream population. |
| Terminal state | A lifecycle state such as COMPLETED or FAILED that ends active polling. |
| Readiness | Operational evaluation of whether investigation-run services and dependencies are available. |
| Population bridge | The service path that converts completed evidence into Recovery Truth and monitoring artifacts. |