Zahlen Documentation

2.2 - CSV Ingestion Guide

Purpose

This guide explains how operators and implementation teams should prepare CSV files for Zahlen ingestion. It is written for the first operational handoff between merchant data and the Zahlen issuer-intelligence pipeline. The emphasis is on canonical field naming, stable response-code interpretation, replay-safe ingestion, and practical troubleshooting.

Overview

CSV ingestion is the simplest path for bringing merchant payment-event data into Zahlen. The ingestion process accepts uploaded transaction records, normalizes the file into a canonical event shape, validates required fields, computes data-completeness indicators, and prepares the records for downstream issuer-health analysis.

The implementation in src-0527A defines a canonical CSV contract in canonical_csv_schema.py and validation behavior in csv_validation.py. Those modules establish response_code as the primary response-code field, preserve compatibility with older aliases, and normalize supported upload formats into the same internal validated row structure.

The CSV ingestion layer does more than accept files. It converts merchant payment records into deterministic operational evidence that supports recovery observability, issuer health monitoring, replay consistency, telemetry review, and future ecosystem intelligence.

Supported Schema Model

Zahlen uses a canonical CSV schema as the preferred ingestion contract. A canonical schema is the stable field model that the platform expects to analyze consistently over time. When merchant data follows the canonical schema, the platform can interpret payment identity, issuer identity, response-code behavior, retry timing, recovery state, and operational context without relying on ambiguous processor-specific naming.

The minimum canonical upload requires order_id and response_code. The order_id field identifies the payment event or merchant transaction record. The response_code field identifies the payment response or decline condition that Zahlen uses as the canonical signal for issuer analysis.

Although only two fields are strictly required for the canonical upload path, production-quality issuer intelligence requires richer context. Fields such as bin, country, card_brand, attempt_number, retry_day, event_timestamp, recovered, final_success, and payment_status materially improve the quality of recovery analysis and issuer-health interpretation.

Field	Status	Operational meaning
order_id	Required	The stable merchant-side record identifier for the payment event. This field gives Zahlen a deterministic identity anchor for validation, troubleshooting, and replay alignment.
response_code	Required	The canonical payment response field. This field replaces processor-specific naming as the primary signal used for response-code grouping, decline behavior analysis, recovery-rate calculation, and issuer-health alerting.

Recommended Canonical Fields

Recommended fields are not always required for a valid upload, but they substantially improve the quality of operational interpretation. Each recommended field adds context that helps Zahlen distinguish issuer behavior from customer behavior, merchant behavior, regional behavior, or incomplete source data.

Field	Definition and operator value
customer_id	Identifies the customer associated with the transaction. This supports cohort analysis and helps distinguish customer-level recurrence from issuer-level behavior.
subscription_id	Identifies the subscription or recurring billing relationship. This helps Zahlen's recovery analysis remain aligned with subscription lifecycle behavior.
merchant	Names the merchant or merchant environment that produced the event. This helps operators interpret the data source without exposing tenant-private data across network intelligence boundaries.
merchant_id	Provides a stable merchant identifier. This is useful for multi-merchant deployments and tenant-safe aggregation controls.
attempt_number	Identifies which retry attempt generated the event. This allows Zahlen to analyze recovery behavior by attempt sequence rather than treating all attempts as equivalent.
retry_day	Identifies the retry day or recovery window associated with the event. This field supports deterministic retry-window analysis and recovery-curve interpretation.
event_timestamp	Records when the payment event occurred. This field supports timeline analysis, replay ordering, operational freshness checks, and historical comparison.
amount	Records the transaction amount in major currency units. This helps operators understand financial exposure and supports future value-weighted analysis.
currency	Records the three-letter currency code. This helps distinguish regional and currency-specific behavior and supports validation of monetary context.
bank	Identifies the issuing bank or issuer name when available. This improves operator readability and supports issuer-focused reporting.
bin	Identifies the issuer BIN or BIN prefix. This is one of the most important issuer-cohort anchors because many Zahlen investigations group behavior by issuer BIN.
country	Identifies the issuer or card country as a two-letter country code. This helps distinguish local degradation from cross-country instability.
card_brand	Identifies the card brand such as Visa or Mastercard. This helps operators separate issuer behavior from brand-specific or network-specific behavior.
authorization_id	Provides the authorization reference when available. This supports traceability between merchant systems and payment authorization records.
authorization_latency_ms	Records authorization latency in milliseconds. This can help operators identify operational slowness or infrastructure stress beyond approval or decline outcomes.
merchant_category_code	Identifies the merchant category code. This can help explain differences in authorization posture across merchant types or risk categories.
recurring_indicator	Indicates whether the event is part of recurring billing behavior. This is important because recurring authorization posture can differ from one-time payment behavior.
transaction_initiator	Identifies whether the transaction was merchant-initiated, customer-initiated, or otherwise initiated. This helps interpret issuer authorization behavior in subscription contexts.
decision_action	Records the action recommended or taken by the payment decision system. This field is useful when CSV records include Zahlen decision output or downstream operational state.
decision_state	Records the decision state associated with the event. This can contribute to recovery success interpretation when explicit recovered or final_success fields are unavailable.
payment_status	Records the payment outcome state. Zahlen can use approved-style payment_status values as one of the success sources for recovery-rate calculation.
recovered	Records whether the payment eventually recovered. This directly supports recovery-rate calculation and retry effectiveness analysis.
final_success	Records whether the lifecycle ultimately succeeded. This supports final payment success interpretation and recovery observability.
lifecycle_state	Records the subscription or payment lifecycle state. This helps operators interpret whether a record belongs to active recovery, suspension, closure, or another lifecycle phase.
test_scenario	Identifies synthetic, QA, or test-scenario records. This helps operators distinguish live operational data from controlled validation data.
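
The recovery outcome fields above can feed a simple success check. The following is a minimal Python sketch, not the Zahlen implementation: the function names is_success and recovery_rate, the contents of APPROVED_STYLE, and the rule that any one positive source counts as success are assumptions made for illustration. The decision_state field is left out because its accepted values are not listed in this guide.

```python
# Illustrative sketch only; names and rules here are assumptions, not Zahlen's code.
TRUE_VALUES = {"true", "t", "1", "yes", "y"}  # accepted true forms listed in this guide
APPROVED_STYLE = {"approved"}                 # assumed set of approved-style statuses


def is_success(row):
    """Return True when any recovery outcome source marks the row as successful."""
    for field in ("recovered", "final_success"):
        if (row.get(field) or "").strip().lower() in TRUE_VALUES:
            return True
    return (row.get("payment_status") or "").strip().lower() in APPROVED_STYLE


def recovery_rate(rows):
    """Share of rows that count as successful, or None for an empty input."""
    if not rows:
        return None
    return sum(is_success(row) for row in rows) / len(rows)


rows = [
    {"order_id": "ORD-1001", "recovered": "false", "final_success": "false", "payment_status": "declined"},
    {"order_id": "ORD-1002", "recovered": "true", "final_success": "true", "payment_status": "approved"},
]
print(recovery_rate(rows))  # 0.5
```

A file that carries none of these fields gives a check like this nothing to work with, which is why at least one recovery outcome source is recommended when recovery-rate analysis is expected.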

Supported Schema Examples

The preferred ingestion format is the Zahlen canonical CSV. The canonical format uses response_code as the primary response-code field and represents issuer identity through bin, country, bank, and card_brand. The example below is intentionally compact, but it includes enough fields to support meaningful issuer-health analysis.

order_id,response_code,bin,country,bank,card_brand,attempt_number,retry_day,event_timestamp,amount,currency,recovered,final_success,payment_status
ORD-1001,51,414720,US,Example Bank,visa,1,1,2026-05-27T14:10:11+00:00,29.99,USD,false,false,declined
ORD-1002,00,414720,US,Example Bank,visa,2,2,2026-05-28T14:10:11+00:00,29.99,USD,true,true,approved
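
As a quick local check before upload, the example file can be read with Python's standard csv module. This is only a sketch of how an operator might inspect a file; it is not part of the Zahlen ingestion code, and the SAMPLE variable simply holds the rows shown above.

```python
import csv
import io

SAMPLE = """order_id,response_code,bin,country,bank,card_brand,attempt_number,retry_day,event_timestamp,amount,currency,recovered,final_success,payment_status
ORD-1001,51,414720,US,Example Bank,visa,1,1,2026-05-27T14:10:11+00:00,29.99,USD,false,false,declined
ORD-1002,00,414720,US,Example Bank,visa,2,2,2026-05-28T14:10:11+00:00,29.99,USD,true,true,approved
"""

# csv.DictReader yields every value as a string, so "00" keeps its leading zero.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

for row in rows:
    print(row["order_id"], repr(row["response_code"]), row["bin"], row["country"])
```

Reading values as strings avoids the accidental numeric conversion that spreadsheet tools sometimes apply to response codes such as 00.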

The platform also preserves compatibility with legacy and processor-export-style files. Compatibility means that Zahlen can recognize common alternate column names and normalize them into the canonical field model. Compatibility does not change the documentation standard: response_code remains the canonical field name, and processor-specific names should not become the primary language of operator documentation.

Input style	How Zahlen interprets it
Zahlen canonical CSV	This is the preferred schema. It requires order_id and response_code and may include the full issuer, retry, recovery, and lifecycle context described above.
Legacy compatibility CSV	Legacy rows may use token, issuer_bin, decline_code, attempt_number, and event_timestamp. Zahlen maps token to order_id, issuer_bin to bin, and decline_code to response_code.
Decision-output CSV	Rows that include decision_action or decision_state are interpreted as enriched Zahlen decision output. This format can support operational review because it includes decision context in addition to raw payment evidence.
External processor export compatibility	The validation layer can recognize several common export shapes and map their fields into the canonical model. This support is compatibility-oriented; the canonical documentation and operator surfaces should continue to use response_code, bin, country, and card_brand as primary terms.
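
The legacy mapping in the table above can be sketched as a simple column rename. This is an illustration under stated assumptions, not the code in csv_validation.py: the names LEGACY_TO_CANONICAL and legacy_row_to_canonical are hypothetical, and only the three renames documented above are applied.

```python
# Hypothetical helper; the three renames come from the legacy compatibility row above.
LEGACY_TO_CANONICAL = {
    "token": "order_id",
    "issuer_bin": "bin",
    "decline_code": "response_code",
}


def legacy_row_to_canonical(row):
    """Rename legacy columns; other columns pass through unchanged."""
    return {LEGACY_TO_CANONICAL.get(name, name): value for name, value in row.items()}


legacy_row = {
    "token": "ORD-1001",
    "issuer_bin": "414720",
    "decline_code": "51",
    "attempt_number": "1",
    "event_timestamp": "2026-05-27T14:10:11+00:00",
}
canonical_row = legacy_row_to_canonical(legacy_row)
print(canonical_row)
```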

Canonical Field Mapping

Canonical field mapping is the process of translating different upload column names into the stable internal field names that Zahlen uses for analysis. This matters because payment data frequently arrives from different processors, internal tools, exports, or older integration paths.

The csv_validation.py implementation normalizes column names by trimming whitespace, lowercasing values, and treating hyphenated names as space-separated names. It then selects from known candidate names and maps them into the canonical row structure.

Canonical field	Recognized source labels	Why the mapping matters
order_id	order id, merchant reference, merchant_reference, merchant reference id, charge id, id, transaction id, pspreference	This mapping preserves a stable merchant-side event identity even when exports use different transaction reference labels.
response_code	response code, paymentech_code, paymentech code, decline_code, decline code, refusal reason code, failure code, reason code, status	This mapping preserves response_code as the canonical analytical signal while allowing older or external export labels to remain ingestible.
bin	bin, issuer bin, issuer_bin, bin prefix	This mapping identifies the issuer cohort used for issuer-health grouping and investigation routing.
country	country, issuer country, issuer_country, card country, country/region	This mapping supports country-level issuer behavior analysis and localized degradation detection.
bank	bank, issuer name, issuer_name, acquirer	This mapping improves operator readability and helps connect technical issuer cohorts to recognizable issuer names when available.
amount	amount, amount value, gross, value	This mapping preserves transaction value context for financial exposure analysis.
card_brand	card brand, brand, card type	This mapping supports card-brand segmentation and helps distinguish issuer behavior from network or brand-level behavior.
event_timestamp	event timestamp, created, creation date, booking date, processed at	This mapping supports timeline reconstruction, freshness analysis, and replay ordering.
authorization_id	authorization id, authorization code	This mapping preserves authorization traceability when the source system provides a reference.
merchant_category_code	merchant category code, mcc	This mapping supports risk-context interpretation by merchant category.
recurring_indicator	shopper interaction, recurring processing model	This mapping supports subscription-specific interpretation because recurring payments can behave differently from customer-initiated payments.
transaction_initiator	initiated by, initiator	This mapping helps determine whether issuer behavior relates to merchant-initiated or customer-initiated payment posture.
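
The normalization and candidate selection described in this section can be sketched as follows. This is a simplified illustration, not the csv_validation.py implementation: normalize_header, map_headers, and CANDIDATES are hypothetical names, only three canonical fields are shown, and the rule that the first matching label wins is an assumption.

```python
# Subset of the recognized source labels from the table above.
CANDIDATES = {
    "order_id": ["order id", "merchant reference", "merchant_reference",
                 "merchant reference id", "charge id", "id", "transaction id", "pspreference"],
    "response_code": ["response code", "paymentech_code", "paymentech code", "decline_code",
                      "decline code", "refusal reason code", "failure code", "reason code", "status"],
    "bin": ["bin", "issuer bin", "issuer_bin", "bin prefix"],
}


def normalize_header(name):
    """Trim whitespace, lowercase, and treat hyphens as spaces."""
    return name.strip().lower().replace("-", " ")


def map_headers(headers):
    """Map each canonical field to the original header that matches one of its labels."""
    normalized = {normalize_header(header): header for header in headers}
    mapping = {}
    for canonical, labels in CANDIDATES.items():
        # Assumption: the canonical name is tried first, then labels in listed order.
        for label in (canonical, *labels):
            if label in normalized:
                mapping[canonical] = normalized[label]
                break
    return mapping


print(map_headers(["PspReference", "Refusal Reason Code", " Issuer-BIN "]))
```

A canonical field that has no matching header is simply absent from the result, which a caller could then report as a missing header.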

Response Code Conventions

The response_code field is the canonical payment response field in Zahlen. This is an important documentation and product convention. Older field names such as paymentech_code, decline_code, processor_code, or payment_response_code may still be recognized as compatibility aliases, but operator-facing documentation should treat response_code as the primary concept.

A response code is the normalized signal that describes the payment outcome or decline condition. Zahlen uses this signal to group issuer behavior, compute recovery rates, detect response-code-specific degradation, generate alerts, and support investigation drill-downs.

The source code preserves paymentech_code as a legacy compatibility alias in several places so that older artifacts and URLs remain usable. However, src-0527A explicitly treats response_code as canonical in the job route logic. This means new documentation, new UI labels, and new integration guidance should avoid presenting processor-specific code names as the primary vocabulary.

Convention	Definition
Use response_code as the canonical name.	All new CSV guidance should instruct operators and integration teams to provide response_code. This creates stable terminology across ingestion, results, records, alerts, and investigations.
Treat paymentech_code as a compatibility alias.	The platform may continue to recognize paymentech_code in older files or artifacts, but this field should not be used as the primary documentation term.
Preserve the original response value as text.	Response codes may include leading zeroes or non-numeric status values. Treating the field as text prevents accidental normalization that could change the operational meaning.
Interpret response codes in issuer context.	A response code has operational meaning only when evaluated alongside issuer BIN, country, card brand, retry window, and recovery outcome.
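
To make the last two conventions concrete, the sketch below counts response codes per issuer cohort while keeping the code as text. It is an illustration only: cohort_key is a hypothetical helper, the choice of grouping fields is an assumption, and the sample rows (including BIN 520000) are invented for the example.

```python
from collections import Counter

rows = [
    {"response_code": "51", "bin": "414720", "country": "US", "card_brand": "visa"},
    {"response_code": "51", "bin": "414720", "country": "US", "card_brand": "visa"},
    {"response_code": "00", "bin": "414720", "country": "US", "card_brand": "visa"},
    {"response_code": "51", "bin": "520000", "country": "CA", "card_brand": "mastercard"},
]


def cohort_key(row):
    """Group by issuer context plus the response code, kept as a string."""
    return (row["bin"], row["country"], row["card_brand"], row["response_code"])


counts = Counter(cohort_key(row) for row in rows)
for key, count in sorted(counts.items()):
    print(key, count)

# Converting to int would lose the leading zero and merge "00" with "0".
print(str(int("00")) == "00")  # False
```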

Ingestion Troubleshooting

CSV ingestion troubleshooting should begin with the validation layer. The validation layer is designed to identify missing required headers, unsupported headers, invalid values, and unrecognized row formats before downstream issuer analysis begins.

The most important troubleshooting principle is to correct the CSV contract first. If the upload cannot be normalized into the canonical row structure, downstream analysis may be incomplete, misleading, or unavailable.

Validation issue	Definition and recommended correction
missing_required_header	This means the file does not contain the required canonical fields or a complete recognized legacy header set. Add order_id and response_code for the canonical path, or confirm that legacy uploads contain token, issuer_bin, decline_code, attempt_number, and event_timestamp.
unexpected_header	This means the file contains a header outside the allowed canonical, optional, or compatibility fields. Remove the unexpected field or map it to a supported canonical field.
missing_required_value	This means a required field is present but empty in at least one row. Populate the missing value or remove the invalid row before upload.
invalid_bin or invalid_issuer_bin	This means the BIN field contains non-digit characters. BIN values should contain only digits because they are used as issuer-cohort identifiers.
invalid_country	This means the country field is not a two-letter country code. Use two-letter values in the ISO 3166-1 alpha-2 style, such as US, CH, ES, or CA.
invalid_integer	This means a field such as attempt_number or retry_day contains a non-integer value. Replace text, decimals, or blank placeholders with valid integers where the field is provided.
integer_below_minimum	This means an integer field is below the allowed minimum. attempt_number must be at least 1, and retry_day must be zero or greater when provided.
invalid_number	This means amount or authorization_latency_ms contains a value that cannot be parsed as numeric. Use standard numeric formatting without currency symbols.
number_below_minimum	This means amount or latency is below the allowed minimum. Amount and authorization latency should not be negative.
invalid_timestamp	This means event_timestamp is not a valid ISO-like timestamp. Use a timestamp such as 2026-05-27T14:10:11+00:00.
invalid_currency	This means currency is not a three-letter code. Use values such as USD, CHF, CAD, or EUR.
invalid_boolean	This means recovered or final_success contains a value outside accepted boolean forms. Accepted true values include true, t, 1, yes, and y. Accepted false values include false, f, 0, no, and n.
unrecognized_row_format	This means the row does not match canonical, legacy, or recognized compatibility formats. Start with the canonical minimal schema of order_id and response_code, then add optional fields gradually.
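
Several of the row-level checks in the table can be reproduced locally before upload. The sketch below is a simplified approximation, not the csv_validation.py logic: validate_row is a hypothetical function, case-insensitive boolean matching and the treatment of empty optional fields as "not provided" are assumptions, and timestamp parsing uses Python's datetime.fromisoformat, which may accept or reject formats differently from Zahlen.

```python
from datetime import datetime

# Accepted boolean forms as listed in the table above.
TRUE_VALUES = {"true", "t", "1", "yes", "y"}
FALSE_VALUES = {"false", "f", "0", "no", "n"}
INTEGER_MINIMUMS = {"attempt_number": 1, "retry_day": 0}


def validate_row(row):
    """Return (field, issue_code) pairs for one row; an empty list means no issue was found."""
    issues = []

    for field in ("order_id", "response_code"):
        if not (row.get(field) or "").strip():
            issues.append((field, "missing_required_value"))

    bin_value = (row.get("bin") or "").strip()
    if bin_value and not (bin_value.isascii() and bin_value.isdigit()):
        issues.append(("bin", "invalid_bin"))

    country = (row.get("country") or "").strip()
    if country and not (len(country) == 2 and country.isascii() and country.isalpha()):
        issues.append(("country", "invalid_country"))

    for field, minimum in INTEGER_MINIMUMS.items():
        raw = (row.get(field) or "").strip()
        if not raw:
            continue  # assumption: an empty optional field is treated as not provided
        try:
            number = int(raw)
        except ValueError:
            issues.append((field, "invalid_integer"))
            continue
        if number < minimum:
            issues.append((field, "integer_below_minimum"))

    timestamp = (row.get("event_timestamp") or "").strip()
    if timestamp:
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            issues.append(("event_timestamp", "invalid_timestamp"))

    for field in ("recovered", "final_success"):
        value = (row.get(field) or "").strip().lower()
        if value and value not in TRUE_VALUES | FALSE_VALUES:
            issues.append((field, "invalid_boolean"))

    return issues


good_row = {"order_id": "ORD-1001", "response_code": "51", "bin": "414720", "country": "US",
            "attempt_number": "1", "retry_day": "1",
            "event_timestamp": "2026-05-27T14:10:11+00:00", "recovered": "false"}
bad_row = {"order_id": "ORD-2001", "response_code": "", "bin": "41A720", "country": "USA",
           "attempt_number": "0", "retry_day": "x", "event_timestamp": "yesterday",
           "recovered": "maybe"}

print(validate_row(good_row))
for issue in validate_row(bad_row):
    print(issue)
```

Running a check like this on a sample of rows tends to surface contract problems earlier than a full upload would.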

Replay-Safe Ingestion Explanation

Replay-safe ingestion means that uploaded payment records are converted into a stable, reproducible event structure that can support future analysis, investigation, and governance review. The goal is not only to process the file once, but to preserve enough structure for the same evidence to be interpreted consistently later.

The source implementation supports replay safety through canonical field normalization, required identity fields, row numbering, timestamp validation, explicit data-completeness scoring, and stable response-code conventions. Each of these elements reduces ambiguity and improves the platform’s ability to reconstruct analysis from historical data.

Replay-safe element	Why it matters
Canonical field names	Canonical names reduce ambiguity. When response_code, bin, country, and card_brand mean the same thing across uploads, the platform can compare results across time and replay windows.
Stable order identity	order_id gives each record a merchant-side identity anchor. This supports troubleshooting, row-level review, and deterministic reconstruction.
Issuer identity context	bin, country, bank, and card_brand help Zahlen determine whether behavior belongs to an issuer cohort, a region, a brand, or an incomplete record.
Event timestamp	event_timestamp supports timeline ordering and historical comparison. Without timing context, the platform has less ability to distinguish old instability from current instability.
Recovery outcome fields	recovered, final_success, payment_status, and decision_state allow the recovery-rate calculation to determine whether the payment eventually succeeded.
Data completeness score	The validation layer computes a data-completeness score from key optional fields. This helps operators understand whether weak analysis is caused by poor source data rather than weak issuer signals.
Request source	The request_source value helps identify whether a row came from canonical upload, legacy compatibility, or another recognized source format. This improves debugging and operational interpretation.
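
One way to picture the data-completeness score is as the share of key optional fields that a row actually populates. The sketch below assumes exactly that: the field list is taken from the recommended fields named earlier in this guide, while the equal weighting, the 0-to-1 scale, and the name completeness_score are assumptions. The actual scoring in the validation layer may differ.

```python
# Assumed field list and equal weighting; the real scoring rules may differ.
KEY_OPTIONAL_FIELDS = (
    "bin", "country", "card_brand", "attempt_number", "retry_day",
    "event_timestamp", "recovered", "final_success", "payment_status",
)


def completeness_score(row):
    """Fraction of key optional fields that carry a non-empty value."""
    filled = sum(1 for field in KEY_OPTIONAL_FIELDS if (row.get(field) or "").strip())
    return filled / len(KEY_OPTIONAL_FIELDS)


minimal_row = {"order_id": "ORD-1001", "response_code": "51"}
richer_row = {"order_id": "ORD-1001", "response_code": "51",
              "bin": "414720", "country": "US", "card_brand": "visa"}

print(completeness_score(minimal_row))
print(round(completeness_score(richer_row), 2))
```

Under these assumptions, a minimal canonical row is valid but scores zero, which matches the guidance that the smallest valid file is suitable only for early testing.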

Operator Checklist

Before uploading a CSV, the operator or implementation team should confirm that the file uses response_code as the primary response field, includes order_id for stable row identity, preserves issuer identity through bin and country when available, and includes at least one recovery outcome source when recovery-rate analysis is expected.

For production-quality results, the file should include attempt_number, retry_day, event_timestamp, card_brand, recovered, final_success, and payment_status whenever those values are available. These fields convert a basic decline report into useful recovery intelligence.

Recommended Operating Rule

Use the smallest valid canonical file only for early testing. For operational use, provide the richest possible canonical context because issuer intelligence depends on identity, timing, recovery outcome, and issuer-cohort context.

Summary

CSV ingestion is the entry point that turns merchant payment records into Zahlen operational evidence. A well-formed file gives the platform enough context to evaluate issuer behavior, compute recovery patterns, generate alerts, support investigation workflows, and preserve replay-safe operational memory.

The most important documentation principle is that response_code is canonical. Compatibility aliases exist to protect older inputs and external exports, but the product language, operator workflow, and future integration guidance should use the canonical response_code convention.