Zahlen Documentation
2.2 - CSV Ingestion Guide
Purpose: This guide explains how operators and implementation teams should prepare CSV files for Zahlen ingestion. It is written for the first operational handoff between merchant data and the Zahlen issuer-intelligence pipeline. The emphasis is on canonical field naming, stable response-code interpretation, replay-safe ingestion, and practical troubleshooting.
CSV ingestion is the simplest path for bringing merchant payment-event data into Zahlen. The ingestion process accepts uploaded transaction records, normalizes the file into a canonical event shape, validates required fields, computes data-completeness indicators, and prepares the records for downstream issuer-health analysis.
The implementation in src-0527A defines a canonical CSV contract in canonical_csv_schema.py and validation behavior in csv_validation.py. Those modules establish response_code as the primary response-code field, preserve compatibility with older aliases, and normalize supported upload formats into the same internal validated row structure.
The CSV ingestion layer does more than accept files. Its purpose is to convert merchant payment records into deterministic operational evidence that supports recovery observability, issuer health monitoring, replay consistency, telemetry review, and future ecosystem intelligence.
Zahlen uses a canonical CSV schema as the preferred ingestion contract. A canonical schema is the stable field model that the platform expects to analyze consistently over time. When merchant data follows the canonical schema, the platform can interpret payment identity, issuer identity, response-code behavior, retry timing, recovery state, and operational context without relying on ambiguous processor-specific naming.
The minimum canonical upload requires order_id and response_code. The order_id field identifies the payment event or merchant transaction record. The response_code field identifies the payment response or decline condition that Zahlen uses as the canonical signal for issuer analysis.
Although only two fields are strictly required for the canonical upload path, production-quality issuer intelligence requires richer context. Fields such as bin, country, card_brand, attempt_number, retry_day, event_timestamp, recovered, final_success, and payment_status materially improve the quality of recovery analysis and issuer-health interpretation.
Field | Status | Operational meaning
order_id | Required | The stable merchant-side record identifier for the payment event. This field gives Zahlen a deterministic identity anchor for validation, troubleshooting, and replay alignment.
response_code | Required | The canonical payment response field. This field replaces processor-specific naming as the primary signal used for response-code grouping, decline behavior analysis, recovery-rate calculation, and issuer-health alerting.
Recommended fields are not always required for a valid upload, but they substantially improve the quality of operational interpretation. Each recommended field adds context that helps Zahlen distinguish issuer behavior from customer behavior, merchant behavior, regional behavior, or incomplete source data.
Field | Definition and operator value
customer_id | Identifies the customer associated with the transaction. This supports cohort analysis and helps distinguish customer-level recurrence from issuer-level behavior.
subscription_id | Identifies the subscription or recurring billing relationship. This helps Zahlen's recovery analysis remain aligned with subscription lifecycle behavior.
merchant | Names the merchant or merchant environment that produced the event. This helps operators interpret the data source without exposing tenant-private data across network intelligence boundaries.
merchant_id | Provides a stable merchant identifier. This is useful for multi-merchant deployments and tenant-safe aggregation controls.
attempt_number | Identifies which retry attempt generated the event. This allows Zahlen to analyze recovery behavior by attempt sequence rather than treating all attempts as equivalent.
retry_day | Identifies the retry day or recovery window associated with the event. This field supports deterministic retry-window analysis and recovery-curve interpretation.
event_timestamp | Records when the payment event occurred. This field supports timeline analysis, replay ordering, operational freshness checks, and historical comparison.
amount | Records the transaction amount in major currency units. This helps operators understand financial exposure and supports future value-weighted analysis.
currency | Records the three-letter currency code. This helps distinguish regional and currency-specific behavior and supports validation of monetary context.
bank | Identifies the issuing bank or issuer name when available. This improves operator readability and supports issuer-focused reporting.
bin | Identifies the issuer BIN or BIN prefix. This is one of the most important issuer-cohort anchors because many Zahlen investigations group behavior by issuer BIN.
country | Identifies the issuer or card country as a two-letter country code. This helps distinguish local degradation from cross-country instability.
card_brand | Identifies the card brand such as Visa or Mastercard. This helps operators separate issuer behavior from brand-specific or network-specific behavior.
authorization_id | Provides the authorization reference when available. This supports traceability between merchant systems and payment authorization records.
authorization_latency_ms | Records authorization latency in milliseconds. This can help operators identify operational slowness or infrastructure stress beyond approval or decline outcomes.
merchant_category_code | Identifies the merchant category code. This can help explain differences in authorization posture across merchant types or risk categories.
recurring_indicator | Indicates whether the event is part of recurring billing behavior. This is important because recurring authorization posture can differ from one-time payment behavior.
transaction_initiator | Identifies whether the transaction was merchant-initiated, customer-initiated, or otherwise initiated. This helps interpret issuer authorization behavior in subscription contexts.
decision_action | Records the action recommended or taken by the payment decision system. This field is useful when CSV records include Zahlen decision output or downstream operational state.
decision_state | Records the decision state associated with the event. This can contribute to recovery success interpretation when explicit recovered or final_success fields are unavailable.
payment_status | Records the payment outcome state. Zahlen can use approved-style payment_status values as one of the success sources for recovery-rate calculation.
recovered | Records whether the payment eventually recovered. This directly supports recovery-rate calculation and retry effectiveness analysis.
final_success | Records whether the lifecycle ultimately succeeded. This supports final payment success interpretation and recovery observability.
lifecycle_state | Records the subscription or payment lifecycle state. This helps operators interpret whether a record belongs to active recovery, suspension, closure, or another lifecycle phase.
test_scenario | Identifies synthetic, QA, or test-scenario records. This helps operators distinguish live operational data from controlled validation data.
The preferred ingestion format is the Zahlen canonical CSV. The canonical format uses response_code as the primary response-code field and represents issuer identity through bin, country, bank, and card_brand. The example below is intentionally compact, but it includes enough fields to support meaningful issuer-health analysis.
order_id,response_code,bin,country,bank,card_brand,attempt_number,retry_day,event_timestamp,amount,currency,recovered,final_success,payment_status
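As a minimal illustration of the canonical layout, the sketch below parses a small in-memory file with Python's standard csv module and checks for the two required headers. The header row is the canonical example above; the two data rows are invented sample values, and the helper name missing_required_headers is hypothetical rather than part of csv_validation.py.

```python
import csv
import io

# Header row copied from the canonical example; the data rows are invented
# sample values used only for illustration.
SAMPLE_CSV = """\
order_id,response_code,bin,country,bank,card_brand,attempt_number,retry_day,event_timestamp,amount,currency,recovered,final_success,payment_status
ord-1001,05,411111,US,Example Bank,Visa,1,0,2026-05-27T14:10:11+00:00,49.00,USD,false,false,declined
ord-1002,00,555555,CH,Sample Bank,Mastercard,2,3,2026-05-30T09:00:00+00:00,19.50,CHF,true,true,approved
"""

REQUIRED_HEADERS = ("order_id", "response_code")


def missing_required_headers(fieldnames):
    """Return the required canonical headers that are absent from the file."""
    present = set(fieldnames or [])
    return [name for name in REQUIRED_HEADERS if name not in present]


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)  # every value stays a string, including response_code
missing = missing_required_headers(reader.fieldnames)
```

Because csv.DictReader returns every value as a string, a response code such as 05 keeps its leading zero without extra handling.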
The platform also preserves compatibility with legacy and processor-export-style files. Compatibility means that Zahlen can recognize common alternate column names and normalize them into the canonical field model. Compatibility does not change the documentation standard: response_code remains the canonical field name, and processor-specific names should not become the primary language of operator documentation.
Input style | How Zahlen interprets it
Zahlen canonical CSV | This is the preferred schema. It requires order_id and response_code and may include the full issuer, retry, recovery, and lifecycle context described above.
Legacy compatibility CSV | Legacy rows may use token, issuer_bin, decline_code, attempt_number, and event_timestamp. Zahlen maps token to order_id, issuer_bin to bin, and decline_code to response_code.
Decision-output CSV | Rows that include decision_action or decision_state are interpreted as enriched Zahlen decision output. This format can support operational review because it includes decision context in addition to raw payment evidence.
External processor export compatibility | The validation layer can recognize several common export shapes and map their fields into the canonical model. This support is compatibility-oriented; the canonical documentation and operator surfaces should continue to use response_code, bin, country, and card_brand as primary terms.
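The legacy renames described above (token to order_id, issuer_bin to bin, decline_code to response_code) can be sketched as a simple key translation. This is an illustrative sketch, not the csv_validation.py implementation; the function name legacy_row_to_canonical and the sample row values are hypothetical.

```python
# Legacy-to-canonical renames as stated in this guide.
LEGACY_TO_CANONICAL = {
    "token": "order_id",
    "issuer_bin": "bin",
    "decline_code": "response_code",
}


def legacy_row_to_canonical(row):
    """Rename legacy keys to canonical names; all other keys pass through."""
    return {LEGACY_TO_CANONICAL.get(key, key): value for key, value in row.items()}


# Invented legacy row for illustration.
legacy_row = {
    "token": "tok-77",
    "issuer_bin": "411111",
    "decline_code": "05",
    "attempt_number": "1",
    "event_timestamp": "2026-05-27T14:10:11+00:00",
}
canonical_row = legacy_row_to_canonical(legacy_row)
```

Fields that already use canonical names, such as attempt_number and event_timestamp, are left unchanged by this translation.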
Canonical field mapping is the process of translating different upload column names into the stable internal field names that Zahlen uses for analysis. This matters because payment data frequently arrives from different processors, internal tools, exports, or older integration paths.
The csv_validation.py implementation normalizes column names by trimming whitespace, lowercasing them, and treating hyphens as spaces. It then matches each normalized name against a list of known candidate names and maps the match into the canonical row structure.
Canonical field | Recognized source labels | Why the mapping matters
order_id | order id, merchant reference, merchant_reference, merchant reference id, charge id, id, transaction id, pspreference | This mapping preserves a stable merchant-side event identity even when exports use different transaction reference labels.
response_code | response code, paymentech_code, paymentech code, decline_code, decline code, refusal reason code, failure code, reason code, status | This mapping preserves response_code as the canonical analytical signal while allowing older or external export labels to remain ingestible.
bin | bin, issuer bin, issuer_bin, bin prefix | This mapping identifies the issuer cohort used for issuer-health grouping and investigation routing.
country | country, issuer country, issuer_country, card country, country/region | This mapping supports country-level issuer behavior analysis and localized degradation detection.
bank | bank, issuer name, issuer_name, acquirer | This mapping improves operator readability and helps connect technical issuer cohorts to recognizable issuer names when available.
amount | amount, amount value, gross, value | This mapping preserves transaction value context for financial exposure analysis.
card_brand | card brand, brand, card type | This mapping supports card-brand segmentation and helps distinguish issuer behavior from network or brand-level behavior.
event_timestamp | event timestamp, created, creation date, booking date, processed at | This mapping supports timeline reconstruction, freshness analysis, and replay ordering.
authorization_id | authorization id, authorization code | This mapping preserves authorization traceability when the source system provides a reference.
merchant_category_code | merchant category code, mcc | This mapping supports risk-context interpretation by merchant category.
recurring_indicator | shopper interaction, recurring processing model | This mapping supports subscription-specific interpretation because recurring payments can behave differently from customer-initiated payments.
transaction_initiator | initiated by, initiator | This mapping helps determine whether issuer behavior relates to merchant-initiated or customer-initiated payment posture.
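The normalization and candidate lookup described in this section can be sketched as follows. This is a hedged illustration, not the csv_validation.py code: normalize_header and map_headers are hypothetical names, the candidate lists are a subset of the labels in the table, the canonical names are assumed to match themselves, and first-match-wins precedence is an assumption.

```python
def normalize_header(name):
    """Trim whitespace, lowercase, and treat hyphens as spaces."""
    return name.strip().lower().replace("-", " ")


# Subset of the recognized source labels from the mapping table. Including
# each canonical name as its own candidate is an assumption of this sketch.
CANDIDATES = {
    "order_id": ["order_id", "order id", "merchant reference",
                 "merchant_reference", "transaction id", "pspreference"],
    "response_code": ["response_code", "response code", "paymentech_code",
                      "decline_code", "decline code", "refusal reason code"],
    "bin": ["bin", "issuer bin", "issuer_bin", "bin prefix"],
    "country": ["country", "issuer country", "issuer_country",
                "card country", "country/region"],
}


def map_headers(headers):
    """Return {canonical_field: original_header} for recognized headers."""
    normalized = {normalize_header(header): header for header in headers}
    mapping = {}
    for canonical, labels in CANDIDATES.items():
        for label in labels:  # assumed precedence: first listed label wins
            if label in normalized:
                mapping[canonical] = normalized[label]
                break
    return mapping


# Invented processor-export style headers for illustration.
export_headers = [" Merchant-Reference ", "Refusal Reason Code",
                  "Issuer_BIN", "Country/Region", "notes"]
header_mapping = map_headers(export_headers)
```

In this sketch an unrecognized header such as notes is simply left out of the mapping; the validation layer described in this guide reports such headers as unexpected_header instead.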
The response_code field is the canonical payment response field in Zahlen. This is an important documentation and product convention. Older field names such as paymentech_code, decline_code, processor_code, or payment_response_code may still be recognized as compatibility aliases, but operator-facing documentation should treat response_code as the primary concept.
A response code is the normalized signal that describes the payment outcome or decline condition. Zahlen uses this signal to group issuer behavior, compute recovery rates, detect response-code-specific degradation, generate alerts, and support investigation drill-downs.
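To make the grouping step concrete, here is a minimal sketch of a recovery rate computed per (bin, response_code) group. It is not the Zahlen implementation: the function names, the OR-combination of success sources, and the treatment of "approved" as the only approved-style payment_status value are assumptions. The accepted true values follow the boolean forms listed under invalid_boolean in this guide.

```python
from collections import defaultdict

TRUE_VALUES = {"true", "t", "1", "yes", "y"}
APPROVED_STATUSES = {"approved"}  # assumption: the real approved-style set may differ


def row_succeeded(row):
    """Treat a row as successful if any success source says so (assumed rule)."""
    recovered = (row.get("recovered") or "").strip().lower() in TRUE_VALUES
    final_success = (row.get("final_success") or "").strip().lower() in TRUE_VALUES
    approved = (row.get("payment_status") or "").strip().lower() in APPROVED_STATUSES
    return recovered or final_success or approved


def recovery_rate_by_group(rows):
    """Return {(bin, response_code): fraction of rows that succeeded}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for row in rows:
        key = (row.get("bin", ""), row["response_code"])
        totals[key] += 1
        if row_succeeded(row):
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}


# Invented rows for illustration.
sample_rows = [
    {"bin": "411111", "response_code": "05", "recovered": "true"},
    {"bin": "411111", "response_code": "05", "recovered": "false"},
    {"bin": "411111", "response_code": "51", "payment_status": "approved"},
    {"bin": "555555", "response_code": "05", "final_success": "0"},
]
rates = recovery_rate_by_group(sample_rows)
```

Keying on both bin and response_code reflects the guidance that a response code is interpreted in issuer context rather than on its own.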
The source code preserves paymentech_code as a legacy compatibility alias in several places so that older artifacts and URLs remain usable. However, src-0527A explicitly treats response_code as canonical in the job route logic. This means new documentation, new UI labels, and new integration guidance should avoid presenting processor-specific code names as the primary vocabulary.
Convention | Definition
Use response_code as the canonical name. | All new CSV guidance should instruct operators and integration teams to provide response_code. This creates stable terminology across ingestion, results, records, alerts, and investigations.
Treat paymentech_code as a compatibility alias. | The platform may continue to recognize paymentech_code in older files or artifacts, but this field should not be used as the primary documentation term.
Preserve the original response value as text. | Response codes may include leading zeroes or non-numeric status values. Treating the field as text prevents accidental normalization that could change the operational meaning.
Interpret response codes in issuer context. | A response code has operational meaning only when evaluated alongside issuer BIN, country, card brand, retry window, and recovery outcome.
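The text-preservation convention can be demonstrated with a short sketch. The sample values are invented, and survives_integer_round_trip is a hypothetical helper used only to show why numeric casting is unsafe for this field.

```python
import csv
import io

# Invented two-row upload: "05" has a leading zero and the second value
# is a non-numeric status.
RAW = "order_id,response_code\nord-1,05\nord-2,insufficient_funds\n"

rows = list(csv.DictReader(io.StringIO(RAW)))
codes_as_text = [row["response_code"] for row in rows]

# Casting to int and back silently changes "05" into "5".
lossy_code = str(int(codes_as_text[0]))


def survives_integer_round_trip(code):
    """Return True only if casting to int and back leaves the text unchanged."""
    try:
        return str(int(code)) == code
    except ValueError:
        # Non-numeric codes cannot be cast at all.
        return False
```

Tools that infer column types, such as spreadsheets or dataframe loaders, can apply the same lossy cast automatically, so the response_code column should be read explicitly as text when such tools prepare the file.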
CSV ingestion troubleshooting should begin with the validation layer. The validation layer is designed to identify missing required headers, unsupported headers, invalid values, and unrecognized row formats before downstream issuer analysis begins.
The most important troubleshooting principle is to correct the CSV contract first. If the upload cannot be normalized into the canonical row structure, downstream analysis may be incomplete, misleading, or unavailable.
Validation issue | Definition and recommended correction
missing_required_header | This means the file does not contain the required canonical fields or a complete recognized legacy header set. Add order_id and response_code for the canonical path, or confirm that legacy uploads contain token, issuer_bin, decline_code, attempt_number, and event_timestamp.
unexpected_header | This means the file contains a header outside the allowed canonical, optional, or compatibility fields. Remove the unexpected field or map it to a supported canonical field.
missing_required_value | This means a required field is present but empty in at least one row. Populate the missing value or remove the invalid row before upload.
invalid_bin or invalid_issuer_bin | This means the BIN field contains non-digit characters. BIN values should contain only digits because they are used as issuer-cohort identifiers.
invalid_country | This means the country field is not a two-letter country code. Use ISO-style two-letter values such as US, CH, ES, or CA.
invalid_integer | This means a field such as attempt_number or retry_day contains a non-integer value. Replace text, decimals, or blank placeholders with valid integers where the field is provided.
integer_below_minimum | This means a numeric field is below the allowed minimum. attempt_number must be at least 1, and retry_day must be zero or greater when provided.
invalid_number | This means amount or authorization_latency_ms contains a value that cannot be parsed as numeric. Use standard numeric formatting without currency symbols.
number_below_minimum | This means amount or latency is below the allowed minimum. Amount and authorization latency should not be negative.
invalid_timestamp | This means event_timestamp is not a valid ISO-like timestamp. Use a timestamp such as 2026-05-27T14:10:11+00:00.
invalid_currency | This means currency is not a three-letter code. Use values such as USD, CHF, CAD, or EUR.
invalid_boolean | This means recovered or final_success contains a value outside accepted boolean forms. Accepted true values include true, t, 1, yes, and y. Accepted false values include false, f, 0, no, and n.
unrecognized_row_format | This means the row does not match canonical, legacy, or recognized compatibility formats. Start with the canonical minimal schema of order_id and response_code, then add optional fields gradually.
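Teams that want to catch these issues before upload can run a local pre-check. The sketch below reuses the issue codes from the table but is only an approximation of the rules described here, not the csv_validation.py logic: validate_row is a hypothetical name, and details such as accepting lowercase country codes or rejecting non-finite numbers are assumptions.

```python
import math
import re
from datetime import datetime

TRUE_VALUES = {"true", "t", "1", "yes", "y"}
FALSE_VALUES = {"false", "f", "0", "no", "n"}
INTEGER_MINIMUMS = {"attempt_number": 1, "retry_day": 0}
NUMBER_FIELDS = ("amount", "authorization_latency_ms")


def validate_row(row):
    """Return (field, issue_code) pairs for one canonical row of strings."""
    issues = []

    def value(field):
        return (row.get(field) or "").strip()

    for field in ("order_id", "response_code"):
        if not value(field):
            issues.append((field, "missing_required_value"))

    # Optional fields are only checked when a value is provided.
    if value("bin") and not re.fullmatch(r"[0-9]+", value("bin")):
        issues.append(("bin", "invalid_bin"))
    if value("country") and not re.fullmatch(r"[A-Za-z]{2}", value("country")):
        issues.append(("country", "invalid_country"))
    if value("currency") and not re.fullmatch(r"[A-Za-z]{3}", value("currency")):
        issues.append(("currency", "invalid_currency"))

    for field, minimum in INTEGER_MINIMUMS.items():
        text = value(field)
        if not text:
            continue
        if not re.fullmatch(r"-?[0-9]+", text):
            issues.append((field, "invalid_integer"))
        elif int(text) < minimum:
            issues.append((field, "integer_below_minimum"))

    for field in NUMBER_FIELDS:
        text = value(field)
        if not text:
            continue
        try:
            number = float(text)
        except ValueError:
            issues.append((field, "invalid_number"))
            continue
        if not math.isfinite(number):  # assumption: nan/inf are rejected
            issues.append((field, "invalid_number"))
        elif number < 0:
            issues.append((field, "number_below_minimum"))

    if value("event_timestamp"):
        try:
            # Accepts forms such as 2026-05-27T14:10:11+00:00.
            datetime.fromisoformat(value("event_timestamp"))
        except ValueError:
            issues.append(("event_timestamp", "invalid_timestamp"))

    for field in ("recovered", "final_success"):
        text = value(field).lower()
        if text and text not in TRUE_VALUES | FALSE_VALUES:
            issues.append((field, "invalid_boolean"))

    return issues
```

Calling validate_row on each csv.DictReader row and collecting the results with the row number gives a quick local report that mirrors the table, although the authoritative result is always the one returned by the validation layer.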
Replay-safe ingestion means that uploaded payment records are converted into a stable, reproducible event structure that can support future analysis, investigation, and governance review. The goal is not merely to process the file once. The goal is to preserve enough structure for the same evidence to be interpreted consistently later.
The source implementation supports replay safety through canonical field normalization, required identity fields, row numbering, timestamp validation, explicit data-completeness scoring, and stable response-code conventions. Each of these elements reduces ambiguity and improves the platform’s ability to reconstruct analysis from historical data.
Replay-safe element | Why it matters
Canonical field names | Canonical names reduce ambiguity. When response_code, bin, country, and card_brand mean the same thing across uploads, the platform can compare results across time and replay windows.
Stable order identity | order_id gives each record a merchant-side identity anchor. This supports troubleshooting, row-level review, and deterministic reconstruction.
Issuer identity context | bin, country, bank, and card_brand help Zahlen determine whether behavior belongs to an issuer cohort, a region, a brand, or an incomplete record.
Event timestamp | event_timestamp supports timeline ordering and historical comparison. Without timing context, the platform has less ability to distinguish old instability from current instability.
Recovery outcome fields | recovered, final_success, payment_status, and decision_state allow the recovery-rate calculation to determine whether the payment eventually succeeded.
Data completeness score | The validation layer computes a data-completeness score from key optional fields. This helps operators understand whether weak analysis is caused by poor source data rather than weak issuer signals.
Request source | The request_source value helps identify whether a row came from canonical upload, legacy compatibility, or another recognized source format. This improves debugging and operational interpretation.
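This guide does not specify how the data-completeness score is calculated, so the sketch below shows one plausible form: the fraction of key optional fields that are populated in a row. The field list (taken from the context fields named earlier in this guide), the equal weighting, and the function name completeness_score are all assumptions, not the formula used by the validation layer.

```python
# Assumed set of key optional fields; the actual list may differ.
KEY_OPTIONAL_FIELDS = (
    "bin", "country", "card_brand", "attempt_number", "retry_day",
    "event_timestamp", "recovered", "final_success", "payment_status",
)


def completeness_score(row):
    """Return the fraction (0.0 to 1.0) of key optional fields with a value."""
    filled = sum(
        1 for field in KEY_OPTIONAL_FIELDS if (row.get(field) or "").strip()
    )
    return filled / len(KEY_OPTIONAL_FIELDS)


# Invented rows for illustration.
minimal_row = {"order_id": "ord-1", "response_code": "05"}
partial_row = {
    "order_id": "ord-2", "response_code": "51",
    "bin": "411111", "country": "US", "card_brand": "Visa",
}
minimal_score = completeness_score(minimal_row)
partial_score = completeness_score(partial_row)
```

A score near zero on a file that is otherwise valid is a signal to enrich the export before drawing conclusions about issuer behavior from it.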
Before uploading a CSV, the operator or implementation team should confirm that the file uses response_code as the primary response field, includes order_id for stable row identity, preserves issuer identity through bin and country when available, and includes at least one recovery outcome source when recovery-rate analysis is expected.
For production-quality results, the file should include attempt_number, retry_day, event_timestamp, card_brand, recovered, final_success, and payment_status whenever those values are available. These fields convert a basic decline report into useful recovery intelligence.
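The pre-upload checks in this section can be sketched as a small header report. The function name preflight and the shape of the returned dictionary are hypothetical; the field groupings come from the checklist in this section and the recovery outcome fields listed earlier.

```python
REQUIRED = ("order_id", "response_code")
ISSUER_IDENTITY = ("bin", "country")
RECOVERY_SOURCES = ("recovered", "final_success", "payment_status", "decision_state")
PRODUCTION_FIELDS = (
    "attempt_number", "retry_day", "event_timestamp", "card_brand",
    "recovered", "final_success", "payment_status",
)


def preflight(headers):
    """Summarize which checklist items a file's header row satisfies."""
    present = set(headers)
    return {
        "missing_required": [f for f in REQUIRED if f not in present],
        "missing_issuer_identity": [f for f in ISSUER_IDENTITY if f not in present],
        "has_recovery_source": any(f in present for f in RECOVERY_SOURCES),
        "missing_production_fields": [f for f in PRODUCTION_FIELDS if f not in present],
    }


# The smallest valid canonical header row, suitable only for early testing.
report = preflight(["order_id", "response_code"])
```

For the minimal header row, the report shows no missing required fields but flags the absent issuer identity, recovery source, and production-quality fields.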
Recommended operating rule: Use the smallest valid canonical file only for early testing. For operational use, provide the richest possible canonical context because issuer intelligence depends on identity, timing, recovery outcome, and issuer-cohort context.
CSV ingestion is the entry point that turns merchant payment records into Zahlen operational evidence. A well-formed file gives the platform enough context to evaluate issuer behavior, compute recovery patterns, generate alerts, support investigation workflows, and preserve replay-safe operational memory.
The most important documentation principle is that response_code is canonical. Compatibility aliases exist to protect older inputs and external exports, but the product language, operator workflow, and future integration guidance should use the canonical response_code convention.