ZAHLEN

API USER GUIDE

Chapter 9

Investigation Runs API

Listing runs • Run status • Run detail retrieval

For merchants, developers, and integration engineers

Source baseline: zahlen_deploy_0616A.tar.gz | Version 1.0 | June 2026

Administrative visibility • Tenant-safe processing • Durable operational traceability

Commercial workflow context

Payment Event → Retry Decision → Retry Outcome → Investigation Run → Reporting

‌Chapter 9 — Investigation Runs API

Learning objectives

By the end of this chapter, you should be able to list tenant-scoped investigation runs, read processing status, retrieve run details, and connect each run to the wider Zahlen evidence pipeline.

Investigation runs are durable administrative records that track how uploaded or API-ingested payment evidence moves through validation, normalization, analysis, and downstream population. They provide the operational bridge between a merchant submission and the reporting, Recovery Intelligence, issuer monitoring, and governance views that follow.

‌Where investigation runs fit

Stage	Primary identifier	What it tells you
Payment-event ingestion	payment_event_batch_id / batch_id	Which merchant evidence was accepted for processing.
Background processing	upload_job_id / job_id	Which durable processing job owns the work.
Investigation run	run or job resource	Whether processing completed and what downstream evidence was produced.
Reporting and monitoring	durable record IDs	How the completed evidence appears in Recovery Truth, issuer health, timelines, and reports.

Administrative boundary

Investigation-run routes are under /v1/admin/investigation-runs. Do not assume that a merchant X-API-Key grants access. These routes require an approved administrative context, such as an authenticated operator session or enterprise administrative authorization.

The merchant-facing API returns an upload_job_id during payment-event ingestion. Store that identifier even when your application does not have direct access to the administrative routes. It gives support and operations teams a stable correlation point.

‌Confirmed route family

Method	Path	Purpose
GET	/v1/admin/investigation-runs	List investigation runs visible to the authenticated tenant context.
GET	/v1/admin/investigation-runs/{job_id}	Retrieve detailed information for one run.
GET	/v1/admin/investigation-runs/{job_id}/status	Retrieve the current processing status for one run.
GET	/v1/admin/investigation-runs/readiness	Evaluate whether the investigation-run subsystem is ready and properly connected.
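
The four routes in the table can be assembled with a few small helpers. This is a minimal sketch, not part of any Zahlen client library: the function names are hypothetical, and percent-encoding the job identifier is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

# Route prefix taken from the route table in this chapter.
ADMIN_RUNS_PREFIX = "/v1/admin/investigation-runs"


def list_runs_path() -> str:
    """Path for listing runs visible to the authenticated tenant context."""
    return ADMIN_RUNS_PREFIX


def run_detail_path(job_id: str) -> str:
    """Path for one run's detail resource (helper name is hypothetical)."""
    if not job_id:
        raise ValueError("job_id must be a non-empty string")
    # Assumption: encode the identifier so unexpected characters cannot
    # alter the path structure.
    return f"{ADMIN_RUNS_PREFIX}/{quote(job_id, safe='')}"


def run_status_path(job_id: str) -> str:
    """Path for one run's status resource."""
    return f"{run_detail_path(job_id)}/status"


def readiness_path() -> str:
    """Path for the subsystem readiness check."""
    return f"{ADMIN_RUNS_PREFIX}/readiness"
```

Append the returned path to the administrative base URL supplied for your deployment. Note that none of the helpers accepts a tenant identifier.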

‌Authentication model
Administrative API access is governed separately from the merchant API. The exact credential or session mechanism depends on the deployment and enterprise contract. The important rule is that tenant ownership must come from authenticated context, not from a tenant_id supplied in the URL, query string, or request body.
- Use the administrative base URL and authentication method supplied by the Zahlen administrator.
- Never add tenant_id to a request merely to make an empty result return data.
- Treat a 401 response as an authentication problem and a 403 response as an authorization or role problem.
- Treat an empty list as a potentially valid tenant-scoped result; check runtime and population health before concluding that data is missing.
Fail closed
If the platform cannot resolve the authenticated tenant or administrative identity, access should be denied. A production system must not fall back to a default tenant.
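
The fail-closed rule can be illustrated with a short server-side sketch. Everything here is hypothetical (the AuthContext shape, the AccessDenied exception, and the resolve_tenant function are invented for illustration and do not describe Zahlen's implementation). The point is only that the tenant comes from authenticated context, client-supplied values are ignored, and there is no default.

```python
from dataclasses import dataclass
from typing import Optional


class AccessDenied(Exception):
    """Raised when the tenant or administrative identity cannot be resolved."""


@dataclass(frozen=True)
class AuthContext:
    # Hypothetical shape of an authenticated administrative context.
    principal_id: Optional[str]
    tenant_id: Optional[str]


def resolve_tenant(auth: Optional[AuthContext], request_params: dict) -> str:
    """Return the tenant from authenticated context, or deny access.

    request_params is accepted only to show that it is deliberately ignored:
    a tenant_id in the URL, query string, or body never decides ownership.
    """
    if auth is None or not auth.principal_id or not auth.tenant_id:
        # Fail closed: no fallback to a default tenant.
        raise AccessDenied("tenant or administrative identity unresolved")
    return auth.tenant_id
```

With this shape, a request that carries tenant_id=tenant-b under a session authenticated for tenant-a is still scoped to tenant-a.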
‌Listing investigation runs
Use the list route to discover recent runs available to the current administrative tenant. A list response commonly acts as an operational index: it helps operators find a job by date, source, status, or upload identifier before opening the detail resource.
curl -sS 'https://api.example.com/v1/admin/investigation-runs' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer ADMIN_TOKEN_REPLACE_ME' | python -m json.tool

The header above is illustrative. Use the administrative authentication contract supplied by your deployment; do not substitute X-API-Key unless that route is explicitly configured to accept it.

‌What to capture from a list response

Field category	Examples	Client use
Identity	job_id, upload_job_id, run identifier	Open status and detail resources; correlate with ingestion responses.
Ownership	tenant or merchant context	Confirm the record belongs to the authenticated scope.
Lifecycle	status, created_at, started_at, completed_at	Sort recent work and identify stale or unfinished runs.
Volume	total rows, valid rows, invalid rows, error count	Determine whether the run processed the expected evidence.
Source	upload, API ingestion, source label	Trace the run back to the originating integration path.

‌List-processing pattern
1. Request the list using the approved administrative context.
2. Filter or sort locally only after confirming the server already enforced tenant scope.
3. Select the run using a durable job identifier, not only a human-readable timestamp.
4. Open the status resource for active work and the detail resource for completed or failed work.
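
Steps 3 and 4 can be sketched as follows. This example assumes the list response has already been parsed into a list of dictionaries with job_id and status keys; those field names and the status spellings are assumptions, so confirm them against the payload your deployment returns.

```python
from typing import Iterable, Optional

# Assumed spellings of terminal states; adjust to the deployed contract.
TERMINAL = {"COMPLETED", "FAILED"}


def find_run(runs: Iterable[dict], job_id: str) -> Optional[dict]:
    """Select a run by its durable identifier rather than by timestamp."""
    for run in runs:
        if run.get("job_id") == job_id:
            return run
    return None


def next_resource(run: dict) -> str:
    """Pick the detail route for terminal runs and the status route otherwise."""
    status = str(run.get("status", "")).upper()
    base = f"/v1/admin/investigation-runs/{run['job_id']}"
    if status in TERMINAL:
        return base
    return f"{base}/status"
```

Runs with an unrecognized status are routed to the status resource, which is the cheaper call to make while the state is still being established.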

‌Reading run status
The status route answers a narrow operational question: what state is this job in now? It is appropriate for polling while a run is still processing and for detecting terminal completion or failure.
curl -sS \
'https://api.example.com/v1/admin/investigation-runs/ZN-2026-06-16-0001/status' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer ADMIN_TOKEN_REPLACE_ME' | python -m json.tool

‌Status categories

Category	Meaning	Recommended behavior
Queued or pending	Accepted but not actively processing.	Poll at a controlled interval; watch queue age.
Running or processing	Work is active.	Continue polling with increasing intervals; do not submit a duplicate job.
Completed	Primary run processing reached a terminal success state.	Retrieve detail and confirm downstream population.
Failed	The run ended without successful completion.	Read error detail, preserve identifiers, and alert operations.
Unknown or unavailable	The resource is not visible, missing, or status cannot be resolved.	Check tenant context, identifier, authorization, and runtime health.
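
A client can normalize raw status strings into the categories in the table before deciding what to do next. The sketch below assumes the status arrives as a string and that the spellings shown are the ones in use; both are assumptions, and any value not listed falls into the unknown category.

```python
from enum import Enum


class RunCategory(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Assumed raw spellings, mapped to the categories in the table above.
_ALIASES = {
    "QUEUED": RunCategory.QUEUED,
    "PENDING": RunCategory.QUEUED,
    "RUNNING": RunCategory.RUNNING,
    "PROCESSING": RunCategory.RUNNING,
    "COMPLETED": RunCategory.COMPLETED,
    "FAILED": RunCategory.FAILED,
}


def categorize(raw_status) -> RunCategory:
    """Map a raw status value to a category; unrecognized values are UNKNOWN."""
    key = str(raw_status or "").strip().upper()
    return _ALIASES.get(key, RunCategory.UNKNOWN)


def should_keep_polling(category: RunCategory) -> bool:
    """Only queued or running work justifies another poll."""
    return category in {RunCategory.QUEUED, RunCategory.RUNNING}
```

An unknown category stops polling in this sketch, on the reasoning that tenant context, identifier, authorization, and runtime health should be checked before more requests are sent.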

Completed is not the final diagnostic step

A run can report completion while a downstream bridge or composition layer remains empty. After completion, verify Recovery Truth, radar, issuer health, monitoring events, timelines, cohort memory, and classification persistence when those features are in scope.

‌Polling guidance
- Start with a moderate interval rather than polling continuously.
- Increase the interval for long-running jobs.
- Stop polling when a terminal state is reached.
- Apply a maximum polling duration and alert when it is exceeded.
- Log job_id, request correlation, timestamps, and the final status.

‌Retrieving run details
The detail route provides the richer record needed for troubleshooting, reporting correlation, and governance review. Use it after a run completes, fails, or appears inconsistent with the downstream dashboards.
curl -sS \
'https://api.example.com/v1/admin/investigation-runs/ZN-2026-06-16-0001' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer ADMIN_TOKEN_REPLACE_ME' | python -m json.tool

‌Detail categories to inspect

Category	Questions to answer
Identity	Does the job ID match the upload_job_id captured during ingestion?
Tenant and merchant scope	Is the record visible only within the authenticated ownership boundary?
Source evidence	Was the run created from the expected API batch or uploaded evidence?
Row accounting	Do total, valid, invalid, and error counts reconcile?
Lifecycle timing	When was the run created, started, and completed? Is the duration plausible?
Errors and warnings	Are failures actionable, repeatable, and tied to specific evidence?
Downstream outputs	Which durable stores and monitoring layers were populated?

‌Reconciliation rule

Do not evaluate one count in isolation. Reconcile the evidence chain: submitted rows → accepted rows → valid rows → invalid rows → persisted records → downstream monitoring artifacts. A discrepancy may be valid, but it should be explainable.
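
The reconciliation rule can be expressed as a handful of arithmetic checks. The sketch below is illustrative only: the field names are hypothetical, and the three relations it tests (accepted ≤ submitted, valid + invalid = accepted, persisted ≤ valid) are assumed for the example. Replace them with the relations your deployment actually guarantees.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowCounts:
    # Hypothetical field names for the stages of the evidence chain.
    submitted: int
    accepted: int
    valid: int
    invalid: int
    persisted: int


def reconcile(c: RowCounts) -> List[str]:
    """Return human-readable findings; an empty list means the chain reconciles."""
    findings = []
    if c.accepted > c.submitted:
        findings.append(
            f"accepted ({c.accepted}) exceeds submitted ({c.submitted})"
        )
    if c.valid + c.invalid != c.accepted:
        findings.append(
            f"valid + invalid ({c.valid + c.invalid}) does not equal accepted ({c.accepted})"
        )
    if c.persisted > c.valid:
        findings.append(
            f"persisted ({c.persisted}) exceeds valid ({c.valid})"
        )
    elif c.persisted < c.valid:
        # May be legitimate (for example deduplication) but needs an explanation.
        findings.append(
            f"persisted ({c.persisted}) is below valid ({c.valid}); explain the gap"
        )
    return findings
```

A finding is a prompt for investigation rather than proof of a fault, which matches the rule that a discrepancy may be valid as long as it is explainable.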

Preserve identifiers

Store upload_job_id from the merchant ingestion response and the administrative job identifier returned by the run APIs. These IDs are the most reliable way to connect a customer support case to durable processing evidence.

‌End-to-end investigation-run workflow

Step	Developer or operator action	Expected evidence
1	Submit payment events through /v1/payment-events or /v1/payment-events/batch.	batch ID and upload_job_id
2	Store the returned identifiers with merchant-side event records.	durable client correlation
3	List administrative investigation runs when authorized.	tenant-scoped run index
4	Poll the selected run status at a controlled interval.	current lifecycle state
5	Retrieve run detail after terminal completion or failure.	row accounting, timestamps, errors, outputs
6	Verify downstream population and reporting.	Recovery Truth, monitoring, classification, reports
7	Escalate anomalies using IDs, timestamps, and evidence.	auditable incident or support record

‌Relationship to the fixed retry schedule

Investigation runs analyze evidence generated by the merchant payment process. They do not change Zahlen’s canonical retry schedule. Payment attempts remain governed by the fixed sequence: Day 1, Day 2, Day 6, and Day 16. Administrative polling or rerunning a report must never create an additional payment attempt.

Operation	May repeat automatically?	Payment effect
GET run list	Yes, with reasonable polling limits.	None
GET run status	Yes, with controlled intervals.	None
GET run detail	Yes.	None
Resubmit payment evidence	Only with documented replay safeguards.	May create duplicate processing evidence if IDs are not stable.
Execute payment retry	Only on the fixed Day 1, Day 2, Day 6, Day 16 schedule.	Creates a real authorization attempt.
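
For reference, the fixed sequence can be represented as a constant that client-side checks compare against. This sketch assumes that Day 1 is the anchor calendar date and that Day n falls n − 1 days later; the chapter does not define the anchor, so treat that mapping and the function names as assumptions.

```python
from datetime import date, timedelta
from typing import List

# The fixed sequence stated in this chapter.
RETRY_DAYS = (1, 2, 6, 16)


def is_scheduled_day(day_number: int) -> bool:
    """True only for day numbers on the fixed schedule."""
    return day_number in RETRY_DAYS


def retry_dates(day_one: date) -> List[date]:
    """Calendar dates for the schedule, assuming Day 1 is day_one itself."""
    return [day_one + timedelta(days=d - 1) for d in RETRY_DAYS]
```

Nothing in this helper triggers a payment attempt. It only supports assertions, such as confirming that an observed attempt date falls on the schedule.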

‌Error handling

HTTP status	Likely meaning	Recommended response
200	Request succeeded.	Parse the resource and persist identifiers or status.
401	Administrative authentication is missing or invalid.	Verify the approved credential and environment.
403	The identity is authenticated but not allowed to access the route.	Check role, enterprise entitlement, or endpoint policy.
404	The run is absent or not visible in the authenticated tenant scope.	Verify job ID and tenant context; do not bypass ownership filters.
422	A supplied parameter failed validation.	Correct the request before retrying.
429	Administrative rate limit or quota enforcement.	Back off and honor Retry-After when present.
500/503	Runtime or dependency failure.	Retry GET operations with bounded backoff; alert if sustained.
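
The table above maps naturally onto a small dispatch function. The action labels below are hypothetical names chosen for this sketch. Only the status codes listed in the table are handled, and anything else is reported as unexpected rather than guessed at.

```python
# Action labels are illustrative names, not part of the Zahlen API.
_ACTIONS = {
    200: "ok",
    401: "fix_authentication",
    403: "fix_authorization",
    404: "verify_identifier_and_scope",
    422: "correct_request",
    429: "back_off",
    500: "retry_with_backoff",
    503: "retry_with_backoff",
}

_RETRYABLE = {"back_off", "retry_with_backoff"}


def classify_response(status_code: int) -> str:
    """Translate an HTTP status code into the recommended client action."""
    return _ACTIONS.get(status_code, "unexpected")


def is_retryable(status_code: int) -> bool:
    """True when repeating the same GET after a delay is a reasonable response."""
    return classify_response(status_code) in _RETRYABLE
```

Keeping 401, 403, 404, and 422 out of the retryable set prevents a client from looping on a request that cannot succeed until the credential, role, identifier, or parameters are corrected.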

‌Empty list troubleshooting
1. Confirm the administrative identity and current environment.
2. Confirm the expected ingestion request returned an upload_job_id.
3. Check the investigation-run readiness route.
4. Check worker, supervisor, queue, and last-cycle health.
5. Verify tenant resolution before considering any data repair.
6. Use backfill only as a controlled remediation after the missing bridge is identified.
Do not bypass tenant isolation
An empty list can be the correct response for the authenticated tenant. Never remove tenant filters or substitute a default production tenant merely to make data appear.

‌Example client polling pattern
The following Python example illustrates a bounded status poll. Replace the illustrative bearer token with the administrative authentication contract for your deployment.
import os
import time

import requests

# Base URL and token come from the environment; the job ID is illustrative.
BASE_URL = os.environ["ZAHLEN_BASE_URL"]
ADMIN_TOKEN = os.environ["ZAHLEN_ADMIN_TOKEN"]
JOB_ID = "ZN-2026-06-16-0001"

headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {ADMIN_TOKEN}",
}

interval_seconds = 5       # first wait between polls
max_interval_seconds = 60  # upper bound for the growing interval
deadline = time.monotonic() + 15 * 60  # stop polling after 15 minutes

while time.monotonic() < deadline:
    response = requests.get(
        f"{BASE_URL}/v1/admin/investigation-runs/{JOB_ID}/status",
        headers=headers,
        timeout=20,
    )
    response.raise_for_status()
    payload = response.json()
    status = str(payload.get("status", "")).upper()

    # Stop as soon as a terminal state is reached.
    if status in {"COMPLETED", "FAILED"}:
        print(payload)
        break

    # Wait, then double the interval up to the cap.
    time.sleep(interval_seconds)
    interval_seconds = min(max_interval_seconds, interval_seconds * 2)
else:
    # The loop ended without a break, so the deadline passed first.
    raise TimeoutError(f"Investigation run {JOB_ID} did not finish in time")

‌Production improvements
- Add jitter so multiple clients do not poll in lockstep.
- Handle 429 using Retry-After when present.
- Capture request IDs and response timestamps in logs.
- Use a circuit breaker for sustained 5xx failures.
- Retrieve the detail resource after COMPLETED or FAILED.
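
The first two improvements can be sketched as a delay calculator. This is one possible approach and not a prescribed algorithm: the function names are hypothetical, the jitter scheme (a random delay between 50% and 100% of the capped interval) is a choice made for this example, and only the delay-seconds form of Retry-After is parsed.

```python
import random
from typing import Callable, Mapping, Optional


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds, or None if absent or not an integer.

    The HTTP-date form of the header is ignored in this sketch. A plain dict
    lookup is case-sensitive; the headers object on a requests response is
    case-insensitive.
    """
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if not (value.isascii() and value.isdigit()):
        return None
    return float(value)


def next_delay(
    base_seconds: float,
    max_seconds: float,
    headers: Mapping[str, str],
    rng: Callable[[], float] = random.random,
) -> float:
    """Honor Retry-After when present; otherwise apply a jittered, capped delay."""
    retry_after = parse_retry_after(headers)
    if retry_after is not None:
        return retry_after
    capped = min(base_seconds, max_seconds)
    # Jitter spreads clients across the upper half of the interval.
    return capped * (0.5 + 0.5 * rng())
```

In a polling loop, pass the headers of the most recent response and sleep for the returned value. The rng parameter exists so the jitter can be made deterministic in tests.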

‌Production readiness checklist

- Administrative access is explicitly approved and separated from merchant API-key access.
- Environment base URLs and credentials are not shared across development, staging, and production.
- upload_job_id is stored for every accepted payment-event ingestion request.
- Run listing is tenant-scoped and an empty list is handled as a valid possible result.
- Status polling is bounded, uses increasing intervals, and stops on terminal states.
- Completed runs are followed by downstream population verification when dashboards are expected to contain data.
- Failed runs preserve error evidence, identifiers, and timestamps for support or incident review.
- No administrative poll, retry, or remediation creates a payment attempt outside Day 1, Day 2, Day 6, and Day 16.
- Backfill is used only after identifying the missing population bridge.
- Logs do not expose secrets or prohibited cardholder data.

Chapter summary

Investigation runs make background processing observable. Use the list route to find tenant-scoped work, the status route to monitor lifecycle state, and the detail route to reconcile evidence, errors, and downstream outputs. Preserve upload_job_id, respect the administrative authorization boundary, and treat completion as the start of downstream verification—not as proof that every reporting layer is populated.

‌Key terms

Term	Meaning
upload_job_id	Identifier returned during ingestion that correlates merchant evidence with background processing.
Investigation run	Durable administrative record of processing and downstream population.
Terminal state	A lifecycle state such as COMPLETED or FAILED that ends active polling.
Readiness	Operational evaluation of whether investigation-run services and dependencies are available.
Population bridge	The service path that converts completed evidence into Recovery Truth and monitoring artifacts.