Zahlen Documentation
6.3 Event Streaming
Phase 6 — API & Integration Documentation
This chapter explains event streaming as the production-grade ingestion and coordination model for high-volume payment intelligence, replay-safe event movement, and federation-aware issuer ecosystem operations.
Event streaming is the integration model used when payment evidence, issuer-health signals, replay records, governance events, and federation coordination events need to move continuously through Zahlen rather than through manual file uploads or individual API requests.
This chapter explains Kafka integration, event envelopes, replay streams, and federation event coordination. Each concept is described as an operational capability, not simply as infrastructure terminology.
The purpose of event streaming is to help Zahlen operate as a durable payment intelligence system. Streaming allows the platform to observe payment behavior, preserve ordered evidence, replay historical windows, coordinate governance state, and support future high-volume issuer network intelligence.
Operator Perspective: Event streaming turns payment intelligence into a continuous evidence pipeline. Instead of waiting for a CSV upload or a scheduled sync, Zahlen can receive and process events as they happen, while preserving ordering, lineage, replay safety, and operational accountability.
Event streaming is the practice of publishing operational events into a durable stream so that downstream services can consume, process, store, replay, and coordinate those events over time.
An event is a structured record that something happened. In Zahlen, an event may represent an authorization attempt, a retry attempt, a recovery outcome, an issuer-health snapshot, an alert, an incident update, a governance decision, a replay result, or a federation trust-domain state change.
A stream is an ordered sequence of events. The stream allows events to be processed continuously while preserving enough structure for replay and audit. This makes streaming different from a simple web request. A request may be handled once. A stream can preserve event history for multiple consumers, replay windows, and governance checks.
Event streaming is especially important for Zahlen because the platform is designed around deterministic payment intelligence. Deterministic intelligence requires stable event identity, durable ordering, replayable evidence, and traceable lineage from raw payment behavior to operational conclusion.
| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Event | A structured record that something occurred in the payment or governance lifecycle. | Events are the raw evidence used to build issuer intelligence. |
| Stream | An ordered sequence of events that can be consumed over time. | Streams preserve operational continuity and support replay. |
| Producer | A system that publishes events into the stream. | Producers feed payment and issuer evidence into Zahlen. |
| Consumer | A system or service that reads events from the stream. | Consumers generate snapshots, alerts, incidents, dashboards, and governance outputs. |
| Topic | A named stream category used to organize related events. | Topics keep payment, issuer, replay, and governance events separated by purpose. |
| Offset | A position marker in a stream. | Offsets allow services to track what has been processed and support replay or recovery. |
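The relationship between producers, consumers, and offsets can be illustrated with a small in-memory sketch. It is not a Kafka client and not Zahlen's implementation: `InMemoryStream`, `Consumer`, and the event fields are hypothetical names chosen for illustration, and a single list stands in for one ordered stream.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStream:
    """Hypothetical append-only log standing in for one ordered stream."""
    events: list = field(default_factory=list)

    def publish(self, event: dict) -> int:
        """Append an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset: int) -> list:
        """Return every event at or after the given offset."""
        return self.events[offset:]


@dataclass
class Consumer:
    """Tracks its own offset, so several consumers can read the same history."""
    name: str
    next_offset: int = 0

    def poll(self, stream: InMemoryStream) -> list:
        batch = stream.read_from(self.next_offset)
        self.next_offset += len(batch)
        return batch


stream = InMemoryStream()
stream.publish({"event_type": "authorization_attempt", "issuer_bin": "411111"})
stream.publish({"event_type": "retry_attempt", "issuer_bin": "411111"})

health = Consumer("issuer-health")
alerts = Consumer("alerting")

first_batch = health.poll(stream)    # the two events published so far
stream.publish({"event_type": "recovery_outcome", "issuer_bin": "411111"})
second_batch = health.poll(stream)   # only the event published since the last poll
alert_batch = alerts.poll(stream)    # full history, read independently
```

Because each consumer keeps its own offset, the alerting consumer still sees the full history after the issuer-health consumer has read it. That is the property that separates a stream from a request handled once.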
Event streaming matters because Zahlen is designed to move beyond static reports into continuous issuer intelligence.
A merchant may process payment events across many countries, issuers, card brands, retry windows, and customer cohorts. A CSV upload can analyze a file. API ingestion can accept structured submissions. Event streaming adds durable, scalable, ordered event movement so that the platform can support high-volume operational monitoring and long-term replay verification.
Streaming also supports separation of responsibilities. One service can publish canonical payment events. Another service can convert those events into issuer-health snapshots. Another can generate alerts. Another can create incidents. Another can run replay verification. Another can evaluate public-safe aggregation thresholds. This separation allows the system to grow without forcing every capability into one synchronous request path.
Strategic Interpretation: Event streaming is the infrastructure path that allows Zahlen to become a production-grade payment intelligence platform. It supports scale, replay, durability, operational resilience, and eventually federation-aware ecosystem intelligence.
Kafka integration refers to using Apache Kafka or a Kafka-compatible event bus as the durable streaming backbone for Zahlen events.
Kafka is a distributed event-streaming platform. It allows producers to publish events to topics, consumers to read those events, and the system to preserve event order within partitions. In Zahlen, Kafka is a natural fit for high-volume payment intelligence because it supports durable event movement, replayable history, consumer offsets, and independent processing services.
Kafka integration should be understood as an architectural direction rather than a requirement for first use. Zahlen can begin with CSV ingestion and API ingestion. Kafka becomes important when the platform needs continuous ingestion, high throughput, worker coordination, replay windows, or distributed processing across multiple services.
| Kafka Concept | Definition | Zahlen Interpretation |
| --- | --- | --- |
| Topic | A named stream of related events. | Zahlen can separate payment events, issuer signals, replay results, governance events, and federation events by topic. |
| Partition | A subdivision of a topic used for ordering and scale. | Partitions allow high-volume processing while preserving order within a key such as issuer or tenant. |
| Producer | A system that writes events to Kafka. | Payment systems, ingestion services, or internal services can publish canonical events. |
| Consumer | A service that reads events from Kafka. | Issuer-health, alerting, incident, replay, and network services can process events independently. |
| Consumer group | A set of consumers sharing processing work. | Consumer groups help Zahlen scale processing without duplicating work. |
| Offset | A consumer position in a topic partition. | Offsets support watermarking, replay, lag tracking, and recovery after failure. |
Kafka topic design is the discipline of organizing event streams by operational purpose.
A topic should represent a stable category of evidence. Payment event topics should not be mixed casually with governance decision topics. Replay topics should preserve replay-specific context. Federation topics should carry trust-domain and coordination semantics. Clear topic design helps operators, engineers, and governance reviewers understand where evidence originates and how it moves through the platform.
| Recommended Topic Category | Purpose | Operational Use |
| --- | --- | --- |
| payment.events | Carries canonical payment, authorization, retry, and recovery events. | Feeds issuer cognition and recovery observability. |
| issuer.health.events | Carries derived issuer-health events and snapshots. | Feeds monitoring, alerting, dashboards, and investigations. |
| issuer.alerts | Carries alert-worthy issuer behavior changes. | Feeds incident creation, action queues, and supervisor surfaces. |
| replay.events | Carries replay inputs, outputs, validation results, and divergence signals. | Feeds replay verification and governance auditing. |
| governance.events | Carries governance confidence, approval, escalation, and policy events. | Feeds compliance-oriented review and operational accountability. |
| federation.events | Carries trust-domain state, quarantine decisions, and cross-domain coordination events. | Feeds tenant-safe ecosystem intelligence and federation governance. |
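One way to keep topic categories from being mixed casually is to route by `event_type` through an explicit table. The sketch below uses the topic names recommended above. The event type names on the left are illustrative assumptions, apart from `federation_quarantine_applied`, which appears in the federation event list later in this chapter.

```python
# Topic names follow the recommended categories; the event_type keys are
# illustrative assumptions, not a confirmed Zahlen catalogue.
TOPIC_BY_EVENT_TYPE = {
    "authorization_attempt": "payment.events",
    "retry_attempt": "payment.events",
    "recovery_outcome": "payment.events",
    "issuer_health_snapshot": "issuer.health.events",
    "issuer_alert": "issuer.alerts",
    "replay_validation": "replay.events",
    "replay_divergence": "replay.events",
    "governance_decision": "governance.events",
    "federation_quarantine_applied": "federation.events",
}


def topic_for(event_type: str) -> str:
    """Return the topic for an event type, refusing unknown types outright."""
    try:
        return TOPIC_BY_EVENT_TYPE[event_type]
    except KeyError:
        raise ValueError(f"no topic registered for event_type={event_type!r}") from None
```

Raising on an unknown type, rather than falling back to a default topic, keeps unclassified evidence out of streams that reviewers rely on.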
Partitioning is the method used to divide a topic into ordered segments that can be processed in parallel. Ordering means preserving the sequence of events within a relevant key.
Ordering is important in Zahlen because payment intelligence often depends on lifecycle sequence. A retry attempt should be understood in relation to the initial failure, later recovery outcome, and downstream issuer-health signal. A replay validation result should be understood after the replay input it evaluated. A quarantine event should be understood in relation to the trust-domain signal that triggered it.
A partition key determines which events are ordered together. Depending on the use case, a partition key may be tenant_id, merchant_id, issuer_bin, issuer cohort identity, event lineage id, or trust_domain_id. The correct key depends on which sequence must remain deterministic.
Why Ordering Matters: If events are processed out of meaningful order, Zahlen may produce misleading recovery curves, incorrect incident timing, unstable replay outputs, or weak governance lineage. Streaming scale must not destroy operational meaning.
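A minimal sketch of key-based partition assignment follows. It assumes `issuer_bin` is the partition key and uses SHA-256 as a stand-in hash. Kafka's Java client hashes keys with murmur2 by default, so the partition numbers produced here would not match a real cluster. The point is only that a stable hash sends every event for one key to one partition, which preserves its lifecycle order.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition using a stable hash (stand-in for murmur2)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical lifecycle events, interleaved across two issuers.
events = [
    {"issuer_bin": "411111", "step": "initial_failure"},
    {"issuer_bin": "550000", "step": "initial_failure"},
    {"issuer_bin": "411111", "step": "retry_attempt"},
    {"issuer_bin": "411111", "step": "recovery_outcome"},
]

partitions = {}
for event in events:
    index = partition_for(event["issuer_bin"], num_partitions=6)
    partitions.setdefault(index, []).append(event)


def steps_for(issuer_bin: str) -> list:
    """Read back one issuer's events from its partition, in arrival order."""
    index = partition_for(issuer_bin, num_partitions=6)
    return [e["step"] for e in partitions[index] if e["issuer_bin"] == issuer_bin]
```

Changing the partition count changes the mapping, so a key's history can end up split across partitions. That is one reason the choice of key and partition count deserves deliberate design.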
An event envelope is the standardized wrapper around an event payload. It contains metadata that explains what the event is, where it came from, how it should be processed, and how it can be traced later.
The envelope is different from the payload. The payload contains the business-specific evidence, such as response_code, issuer_bin, retry_day, recovery outcome, or governance decision. The envelope contains the operational metadata needed for routing, replay, auditing, and federation safety.
In Zahlen, event envelopes are important because they preserve consistency across ingestion channels. A payment event received through API ingestion and a payment event consumed from Kafka should both carry enough metadata to support tenant isolation, replay safety, lineage continuity, and downstream interpretation.
| Envelope Field | Definition | Why It Matters |
| --- | --- | --- |
| event_id | A unique identifier for the event. | Supports idempotency, duplicate detection, replay, and lineage. |
| event_type | The category of event being carried. | Allows consumers to route and interpret the event correctly. |
| schema_version | The version of the event structure. | Protects compatibility as event definitions evolve. |
| occurred_at | The source timestamp when the event occurred. | Supports timeline reconstruction and lifecycle ordering. |
| published_at | The timestamp when the event was published to the stream. | Helps measure lag and stream health. |
| tenant_id | The tenant or merchant boundary associated with the event. | Protects tenant isolation and access control. |
| correlation_id | An identifier linking related events in a workflow. | Supports tracing from payment event to alert, incident, and action. |
| causation_id | An identifier showing which prior event caused this event. | Supports evidence lineage and auditability. |
| source_system | The system that produced the event. | Supports trust assessment and troubleshooting. |
| trust_domain_id | The trust-domain boundary associated with the event, when applicable. | Supports federation governance and quarantine decisions. |
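The envelope fields listed above can be sketched as a small immutable record. The field names come from the table. Everything else is an assumption made for illustration: the `EventEnvelope` class, the `wrap` helper, the use of UUIDs for identifiers, ISO 8601 strings for timestamps, and the default `schema_version` of "1".

```python
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EventEnvelope:
    """Envelope fields from the table, plus the wrapped business payload."""
    event_id: str
    event_type: str
    schema_version: str
    occurred_at: str       # source timestamp, ISO 8601
    published_at: str      # set when the event enters the stream
    tenant_id: str
    correlation_id: str
    source_system: str
    payload: dict
    causation_id: Optional[str] = None      # None for the first event in a chain
    trust_domain_id: Optional[str] = None   # only when federation applies


def wrap(event_type: str, payload: dict, *, tenant_id: str, source_system: str,
         occurred_at: str, schema_version: str = "1",
         correlation_id: Optional[str] = None,
         causation_id: Optional[str] = None,
         trust_domain_id: Optional[str] = None) -> EventEnvelope:
    """Wrap a payload in routing, replay, and lineage metadata."""
    return EventEnvelope(
        event_id=str(uuid.uuid4()),
        event_type=event_type,
        schema_version=schema_version,
        occurred_at=occurred_at,
        published_at=datetime.now(timezone.utc).isoformat(),
        tenant_id=tenant_id,
        correlation_id=correlation_id or str(uuid.uuid4()),
        source_system=source_system,
        payload=payload,
        causation_id=causation_id,
        trust_domain_id=trust_domain_id,
    )


failure = wrap("authorization_attempt",
               {"issuer_bin": "411111", "response_code": "51"},
               tenant_id="tenant-a", source_system="api-ingestion",
               occurred_at="2024-01-01T10:00:00+00:00")

# The retry shares the workflow's correlation_id and names the failure as its cause.
retry = wrap("retry_attempt",
             {"issuer_bin": "411111", "retry_day": 1},
             tenant_id="tenant-a", source_system="api-ingestion",
             occurred_at="2024-01-02T10:00:00+00:00",
             correlation_id=failure.correlation_id,
             causation_id=failure.event_id)

serialized = asdict(retry)  # plain dict, ready for JSON encoding
```

Using the same envelope for API-ingested and stream-consumed events is what lets downstream consumers treat both channels identically.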
Payload design defines the business content inside the event envelope.
For a payment event, the payload should include issuer identity, payment outcome, response-code context, retry lifecycle context, and recovery result if available. For an issuer-health event, the payload may include ASR, retry recovery rate, decline entropy, issuer stability, fraud pressure, and confidence. For a governance event, the payload may include confidence scoring, evidence reasoning, replay status, approval state, or quarantine status.
Payloads should be expressive enough to support analysis but should not violate tenant isolation. Raw customer data should not be placed into federation or public-safe topics. Sensitive fields should remain inside appropriate tenant boundaries.
| Payload Type | Typical Content | Primary Consumers |
| --- | --- | --- |
| Payment event payload | issuer_bin, issuer_country, card_brand, response_code, retry_day, recovered, authorization_status. | Issuer cognition, recovery curves, health snapshots. |
| Issuer-health payload | ASR, retry recovery rate, entropy, stability, fraud pressure, confidence, evidence counts. | Monitoring, alerts, dashboards, investigations. |
| Replay payload | replay window, input evidence hash, output hash, validation result, divergence reason. | Replay verification and governance auditing. |
| Governance payload | confidence score, recommendation, explanation, approval status, audit marker. | Supervisor dashboards and governance operations. |
| Federation payload | trust-domain state, threshold status, quarantine reason, aggregation eligibility. | Network intelligence and public-safe aggregation controls. |
Schema versioning is the practice of labeling event structures so that producers and consumers can understand which fields and meanings apply to each event.
Schema versioning matters because event streams are durable. A consumer may read events that were produced weeks, months, or years earlier. If the meaning of a field changes without versioning, historical replay can become unreliable.
Within Zahlen, schema_version should be treated as part of replay safety. If a replay service reconstructs historical issuer behavior, it must know which schema version was used when the event was produced. A field added later should not be assumed to exist in older events. A field whose meaning changed should be handled through explicit compatibility logic.
Governance Interpretation: Schema versioning prevents historical evidence from being silently reinterpreted. It protects replay safety, auditability, and long-term issuer reputation continuity.
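Explicit compatibility logic is often written as an "upcaster" that normalizes older payloads before a replay service interprets them. The sketch below is hypothetical throughout. It assumes a version "1" payment payload that predates the `retry_day` field and a version "2" payload that includes it. Neither version history is documented for Zahlen.

```python
def upcast_payment_payload(schema_version: str, payload: dict) -> dict:
    """Normalize a historical payment payload to the assumed version "2" shape."""
    if schema_version == "2":
        return dict(payload)
    if schema_version == "1":
        upgraded = dict(payload)
        # Assumed change: version 1 had no retry_day. Record it as unknown
        # (None) rather than inventing a value such as 0.
        upgraded.setdefault("retry_day", None)
        return upgraded
    # Unknown versions are rejected so evidence is never silently reinterpreted.
    raise ValueError(f"unsupported schema_version: {schema_version!r}")


old_event = upcast_payment_payload("1", {"issuer_bin": "411111", "response_code": "51"})
new_event = upcast_payment_payload("2", {"issuer_bin": "411111", "response_code": "51", "retry_day": 3})
```

Marking the missing field as unknown keeps a replay honest about what the historical event actually recorded.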
Replay streams are event streams used to reconstruct, verify, or audit historical event processing.
Replay is central to Zahlen’s governance model because the platform must be able to prove how an operational conclusion was produced. A replay stream may contain historical payment events, reconstructed issuer-health events, validation results, divergence detections, evidence digests, or replay audit records.
A replay stream should be separated from normal live processing when necessary. Live streams drive current operational state. Replay streams reconstruct historical state or validate whether deterministic logic still produces expected results. Separating these functions prevents replay activity from accidentally contaminating live operational intelligence.
| Replay Stream Concept | Definition | Why It Matters |
| --- | --- | --- |
| Replay input stream | The events selected for historical reconstruction. | Defines what evidence is being replayed. |
| Replay output stream | The conclusions produced by replay processing. | Shows what the system reconstructed from the evidence. |
| Replay validation event | An event stating whether replay output matched expectations. | Supports governance review and deterministic confidence. |
| Replay divergence event | An event indicating replay produced an unexpected difference. | Triggers investigation before relying on the replayed conclusion. |
| Replay audit event | A governance record describing replay scope, evidence, and result. | Supports compliance-oriented accountability. |
A replay window is the historical event range selected for replay. A watermark is a durable progress marker that records how far a stream or processor has advanced.
Replay windows allow operators and governance services to reconstruct a bounded period of evidence. This may include a specific issuer, country, card brand, retry cohort, incident window, or governance epoch. Watermarks help ensure the system can resume processing after failure, avoid reprocessing the same events unintentionally, and verify that processing has advanced as expected.
In Zahlen, watermarks are important because several architectural layers depend on incremental processing. Issuer monitoring, Radar processing, replay verification, governance auditing, and network aggregation may all need to know which events are new, which events were processed, and which events remain pending.
| Control | Definition | Operational Importance |
| --- | --- | --- |
| Replay window | A bounded range of events selected for deterministic reconstruction. | Prevents replay from becoming ambiguous or unbounded. |
| Watermark | A persisted progress marker for stream processing. | Supports recovery, lag tracking, and incremental processing. |
| Lag | The distance between latest available event and latest processed event. | Indicates whether processing is current or falling behind. |
| Checkpoint | A saved processing state used for resume or audit. | Protects processing continuity after failure. |
| Evidence digest | A stable hash or summary of replay evidence. | Helps verify replay consistency without exposing raw data. |
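The controls above can be sketched against a plain list standing in for a stream. `Watermark`, `process_pending`, `lag`, and `replay_window` are illustrative names, and the watermark is held in memory here although a real processor would persist it durably. The window helper assumes every `occurred_at` value is a UTC ISO 8601 string in one fixed format.

```python
from dataclasses import dataclass


@dataclass
class Watermark:
    """Progress marker for one processor; persisted durably in a real system."""
    processor: str
    last_processed_offset: int = -1   # -1 means nothing processed yet


def process_pending(log: list, watermark: Watermark, handle) -> int:
    """Handle events past the watermark, advancing it after each success."""
    processed = 0
    for offset in range(watermark.last_processed_offset + 1, len(log)):
        handle(log[offset])
        watermark.last_processed_offset = offset
        processed += 1
    return processed


def lag(log: list, watermark: Watermark) -> int:
    """Number of events published but not yet processed."""
    return (len(log) - 1) - watermark.last_processed_offset


def replay_window(log: list, start: str, end: str) -> list:
    """Select events with start <= occurred_at < end.

    Relies on the fixed-format assumption above, so that string comparison
    matches chronological order.
    """
    return [e for e in log if start <= e["occurred_at"] < end]


log = [
    {"event_id": "e1", "occurred_at": "2024-03-01T09:00:00Z"},
    {"event_id": "e2", "occurred_at": "2024-03-01T12:00:00Z"},
    {"event_id": "e3", "occurred_at": "2024-03-02T08:30:00Z"},
]

seen = []
watermark = Watermark("issuer-monitoring")
process_pending(log, watermark, lambda e: seen.append(e["event_id"]))

log.append({"event_id": "e4", "occurred_at": "2024-03-02T10:00:00Z"})
pending = lag(log, watermark)   # one event has arrived since the last run
window = replay_window(log, "2024-03-01T00:00:00Z", "2024-03-02T00:00:00Z")
```

Advancing the watermark only after the handler returns means a crash mid-batch leads to reprocessing rather than skipped events. That is why stable `event_id` values and idempotent handlers matter.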
Replay divergence occurs when replayed event streams produce a different operational conclusion than expected.
In a streaming system, divergence can be caused by missing events, reordered events, schema-version incompatibility, changed evaluation logic, incomplete checkpoints, duplicate processing, or consumer state drift. Because streaming systems are distributed, replay divergence must be treated as an operational signal rather than a simple code defect.
Zahlen should use replay divergence events to notify operators or governance workflows when historical conclusions require review. A replay-divergent issuer degradation finding should not be treated with the same confidence as a replay-consistent finding.
Operator Interpretation: Replay divergence means the platform may not be reconstructing the same conclusion from the same evidence. Operators should review lineage, schema version, event ordering, and consumer state before using the result for governance or escalation.
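Divergence detection can be sketched as a comparison of evidence digests. The example assumes conclusions are JSON-serializable records and that a digest of the original output was stored when the conclusion was first produced. The function names and the shape of the emitted events are hypothetical.

```python
import hashlib
import json


def evidence_digest(records: list) -> str:
    """Hash records in a canonical form so equal content gives an equal digest."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_replay(expected_digest: str, replay_output: list) -> dict:
    """Emit a validation or divergence event by comparing output digests."""
    actual = evidence_digest(replay_output)
    if actual == expected_digest:
        return {"event_type": "replay_validation", "matched": True,
                "output_digest": actual}
    return {"event_type": "replay_divergence", "matched": False,
            "expected_digest": expected_digest, "output_digest": actual}


original = [{"issuer_bin": "411111", "status": "degraded", "confidence": 0.82}]
recorded = evidence_digest(original)

# Same conclusion with fields in a different order: still a match.
same = verify_replay(recorded, [{"confidence": 0.82, "status": "degraded", "issuer_bin": "411111"}])

# A different conclusion from the replay: divergence.
changed = verify_replay(recorded, [{"issuer_bin": "411111", "status": "healthy", "confidence": 0.82}])
```

Record order inside the list is deliberately significant in this sketch, so a reordered output would also register as divergence.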
Federation event coordination is the process of using events to synchronize trust-domain state, quarantine decisions, aggregation eligibility, governance approvals, and public-safe intelligence readiness across federation boundaries.
Federation is the architecture that allows multiple domains to contribute to broader ecosystem intelligence without allowing raw tenant data to cross protected boundaries. Event coordination is necessary because trust decisions, quarantine states, replay validation, and aggregation thresholds may change over time.
A federation event should carry enough context to explain what changed, which trust domain was affected, why the change occurred, and whether the signal is eligible to participate in broader network intelligence.
| Federation Event | Definition | Operational Meaning |
| --- | --- | --- |
| trust_domain_registered | A trust domain became known to the federation layer. | The domain can now be evaluated for eligibility and governance state. |
| trust_domain_health_changed | A trust domain changed health or integrity status. | Operators may need to review whether signals remain trustworthy. |
| federation_quarantine_applied | A domain or signal was isolated from broader intelligence use. | Prevents unsafe evidence from contaminating network intelligence. |
| federation_quarantine_released | A quarantined domain or signal was restored after review. | Allows signals to re-enter permitted workflows. |
| aggregation_threshold_met | A cohort signal satisfied minimum crowd or evidence thresholds. | The signal may become eligible for broader or public-safe interpretation. |
| public_safe_signal_published | A signal was approved for public-safe exposure. | The output passed aggregation, privacy, and governance controls. |
A trust domain is a governed boundary that defines where evidence comes from, how it is trusted, and whether it can participate in cross-domain intelligence.
In event streaming, trust-domain identity should travel with events that may influence federation or network intelligence. This does not mean raw data crosses domains. It means events carry governance-safe metadata that helps the platform determine whether the signal is eligible, quarantined, trusted, or restricted.
Trust-domain metadata helps prevent unsafe mixing of production and replay evidence, tenant-private and public-safe evidence, validated and unvalidated signals, or healthy and quarantined domains.
Governance Requirement: Federation event streams must preserve tenant isolation. Raw merchant, customer, and payment data should not be placed into cross-domain topics. Only aggregated, anonymized, threshold-compliant issuer signals should be eligible for broader federation use.
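A threshold check of this kind might look like the sketch below. The minimum of five contributing tenants, the input fields (`attempts`, `approved`), and the `approval_rate` output are all assumptions chosen for illustration, since this chapter does not specify Zahlen's thresholds or metric definitions. The function expects signals for a single issuer.

```python
from typing import Optional


def aggregate_for_federation(tenant_signals: list, min_tenants: int = 5) -> Optional[dict]:
    """Aggregate per-tenant signals for one issuer, or return None if ineligible.

    The output carries no tenant identifiers, only the number of contributors.
    """
    tenants = {s["tenant_id"] for s in tenant_signals}
    if len(tenants) < min_tenants:
        return None   # below the assumed crowd threshold: nothing leaves the boundary
    attempts = sum(s["attempts"] for s in tenant_signals)
    approved = sum(s["approved"] for s in tenant_signals)
    if attempts == 0:
        return None
    return {
        "event_type": "aggregation_threshold_met",
        "issuer_bin": tenant_signals[0]["issuer_bin"],
        "contributing_tenants": len(tenants),
        "approval_rate": approved / attempts,
    }


signals = [
    {"tenant_id": f"tenant-{i}", "issuer_bin": "411111", "attempts": 100, "approved": 90}
    for i in range(5)
]
eligible = aggregate_for_federation(signals)
too_few = aggregate_for_federation(signals[:4])
```

Returning nothing below the threshold, rather than a partial aggregate, avoids publishing a signal that could be traced back to a small set of tenants.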
Event durability is the ability of the platform to preserve event evidence reliably over time.
Durability matters because Zahlen uses events as evidence. If event history is lost, replay safety weakens. If offsets are lost, processors may duplicate or skip work. If event payloads are corrupted, issuer intelligence may become unreliable. If governance events disappear, auditability suffers.
In a Kafka-based architecture, durability is supported by topic retention, replication, partitioning, offset management, backups, and monitoring. In Zahlen’s operational model, durability should also include evidence digests, replay audit records, event lineage, and governance health checks.
| Durability Control | Definition | Why It Matters |
| --- | --- | --- |
| Replication | Events are copied across brokers or storage nodes. | Protects against infrastructure failure. |
| Retention | Events are kept for a defined period or policy. | Supports replay and historical analysis. |
| Offset storage | Consumer progress is saved durably. | Allows processing to resume after interruption. |
| Evidence digest | A stable hash or summary of important evidence. | Supports tamper detection and replay verification. |
| Durability audit | A periodic check that event history and processing state remain intact. | Supports operational survivability and governance confidence. |
Streaming observability is the ability to monitor whether event streams, producers, consumers, topics, offsets, watermarks, and downstream services are healthy.
Streaming observability matters because event streaming introduces operational dependencies. A payment event may be published successfully but not consumed. A consumer may consume events but fail to generate issuer-health snapshots. A replay processor may fall behind. A federation topic may accumulate quarantined signals. Without observability, operators cannot tell whether intelligence is current.
| Observability Signal | Definition | Operator Interpretation |
| --- | --- | --- |
| Producer success rate | The rate at which producers publish events successfully. | Low success may indicate ingestion or connectivity issues. |
| Consumer lag | How far a consumer is behind the latest event. | High lag may indicate processing pressure or failure. |
| Watermark age | How old the latest processed watermark is. | Old watermarks may mean processing is stale. |
| Dead-letter count | Number of events routed to error handling. | High counts may indicate schema drift or invalid payloads. |
| Replay validation rate | The rate at which replay outputs validate successfully. | Low validation weakens governance confidence. |
| Quarantine count | Number of events or domains isolated from normal use. | High counts may indicate trust or evidence-quality problems. |
A dead-letter queue is a stream or storage location used for events that cannot be processed successfully by normal consumers.
Dead-letter handling is important because events should not simply disappear when processing fails. An invalid event, schema mismatch, unexpected payload, or transient processing failure may still contain important evidence. Routing failed events to a dead-letter path allows operators and engineers to inspect the cause, correct the integration, and decide whether the event can be replayed or reprocessed.
In Zahlen, dead-letter events should preserve the original event envelope, validation error, processing service, failure timestamp, and recommended remediation. This turns a failure into an auditable operational artifact.
Operator Interpretation: A dead-letter event is not just a technical error. It is a signal that some evidence could not enter normal intelligence processing and may require schema, integration, or governance review.
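A dead-letter record that preserves the original envelope, validation error, processing service, failure timestamp, and recommended remediation could be built as follows. The required-field list, the function names, and the remediation text are assumptions. Lists stand in for the normal and dead-letter streams.

```python
from datetime import datetime, timezone

# Assumed minimum set of envelope fields a consumer needs before processing.
REQUIRED_FIELDS = ("event_id", "event_type", "schema_version", "tenant_id")


def validation_errors(envelope: dict) -> list:
    """Return one message per missing or empty required envelope field."""
    return [f"missing required field: {name}"
            for name in REQUIRED_FIELDS if not envelope.get(name)]


def to_dead_letter(envelope: dict, errors: list, service: str) -> dict:
    """Wrap a failed event so the failure itself becomes auditable."""
    return {
        "original_envelope": envelope,   # preserved unchanged for later replay
        "validation_errors": errors,
        "processing_service": service,
        "failed_at": datetime.now(timezone.utc).isoformat(),
        "recommended_remediation": "review producer mapping and schema version, then replay",
    }


def consume(envelope: dict, processed: list, dead_letters: list,
            service: str = "issuer-health") -> None:
    """Process valid events; route invalid ones to the dead-letter path."""
    errors = validation_errors(envelope)
    if errors:
        dead_letters.append(to_dead_letter(envelope, errors, service))
    else:
        processed.append(envelope)


processed, dead_letters = [], []
consume({"event_id": "e1", "event_type": "retry_attempt",
         "schema_version": "1", "tenant_id": "tenant-a"}, processed, dead_letters)
consume({"event_id": "e2", "event_type": "retry_attempt"}, processed, dead_letters)
```

The second event lands in `dead_letters` with two validation errors and its original content intact, so it can be corrected and replayed instead of being lost.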
Tenant isolation is the rule that raw tenant, merchant, customer, and payment-level data must remain inside the correct protected boundary.
Event streaming must enforce tenant isolation because streams can carry high volumes of sensitive operational evidence. A streaming system that does not preserve tenant boundaries can create serious privacy, security, and governance risks.
For Zahlen, tenant identity should be explicit in private operational topics, and cross-tenant or public-safe topics should carry only properly aggregated and anonymized issuer signals. A raw authorization event should not be published into a public or federation-wide topic. A threshold-compliant aggregated issuer health signal may be eligible for broader use if governance controls approve it.
| Boundary | Allowed Event Scope | Restriction |
| --- | --- | --- |
| Tenant-private stream | Raw merchant and payment events for a specific tenant. | Must not be consumed by other tenants or public-safe services without transformation. |
| Internal operational stream | Derived internal events used by Zahlen services. | Must preserve access control and lineage. |
| Federation stream | Aggregated trust-domain or network coordination events. | Must not include raw tenant-private evidence. |
| Public-safe stream | Approved ecosystem signals for external or public visibility. | Must satisfy thresholds, anonymization, and governance review. |
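One way to enforce these boundaries is a publish-time guard that refuses payloads carrying raw fields on shared topics. The sketch is illustrative only. The set of restricted field names is an assumption, and `federation.events` is the only shared topic listed because it is the only cross-domain topic named in this chapter.

```python
# Assumed examples of tenant-private fields; a real list would come from policy.
RAW_FIELDS = frozenset({"tenant_id", "merchant_id", "customer_id", "card_number"})
SHARED_TOPICS = frozenset({"federation.events"})


def check_publishable(topic: str, payload: dict) -> None:
    """Raise if a payload bound for a shared topic contains raw tenant fields."""
    if topic not in SHARED_TOPICS:
        return   # tenant-private and internal topics are governed elsewhere
    leaked = sorted(RAW_FIELDS.intersection(payload))
    if leaked:
        raise PermissionError(f"raw fields not allowed on {topic}: {leaked}")


# Allowed: an aggregated signal with no tenant-level detail.
check_publishable("federation.events",
                  {"issuer_bin": "411111", "contributing_tenants": 7})

# Rejected: a raw authorization event aimed at a federation topic.
try:
    check_publishable("federation.events",
                      {"issuer_bin": "411111", "merchant_id": "m-42"})
    blocked = False
except PermissionError:
    blocked = True
```

A field-name check is a coarse control. It complements, and does not replace, the aggregation thresholds and governance review described above.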
Governance integrity is the ability to preserve explainable, auditable, deterministic reasoning across operational workflows.
Event streaming supports governance integrity by preserving evidence flow. When events are enveloped, versioned, ordered, and durably retained, the platform can reconstruct how a payment event became an issuer signal, how an issuer signal became an alert, how an alert became an incident, and how an incident produced an operator recommendation.
This flow is especially important in enterprise environments. Supervisors and compliance reviewers may need to know why the platform recommended investigation, why a signal was quarantined, why a replay result diverged, or why a public-safe signal was published.
Compliance Interpretation: Event streaming gives Zahlen a durable audit spine. When designed correctly, the stream is not only an ingestion mechanism. It is a governance record of how operational intelligence was produced.
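The path from a payment event to an incident can be reconstructed by following `causation_id` links backwards. The sketch assumes events are available in a dictionary keyed by `event_id`. The event types and identifiers shown are hypothetical.

```python
def trace_lineage(event_id: str, events_by_id: dict) -> list:
    """Walk causation_id links back to the root and return the chain oldest first."""
    chain = []
    seen = set()
    current = event_id
    while current is not None and current not in seen:
        seen.add(current)                      # guards against accidental cycles
        event = events_by_id.get(current)
        if event is None:
            break                              # missing evidence: lineage is incomplete
        chain.append(event)
        current = event.get("causation_id")
    return list(reversed(chain))


events_by_id = {
    "p1": {"event_id": "p1", "event_type": "authorization_attempt", "causation_id": None},
    "s1": {"event_id": "s1", "event_type": "issuer_health_snapshot", "causation_id": "p1"},
    "a1": {"event_id": "a1", "event_type": "issuer_alert", "causation_id": "s1"},
    "i1": {"event_id": "i1", "event_type": "incident_update", "causation_id": "a1"},
}

lineage = [e["event_type"] for e in trace_lineage("i1", events_by_id)]
```

A chain that stops before reaching an event with no `causation_id` indicates missing evidence, which is itself worth surfacing to reviewers.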
The recommended implementation path should be incremental. Zahlen should not move every workflow to streaming at once. The safest path is to preserve current CSV and API ingestion while introducing streaming around well-defined event envelopes, durable topics, replay processing, and operational health visibility.
| Implementation Step | Purpose | Operator Evidence |
| --- | --- | --- |
| Define canonical envelopes | Standardize event metadata for routing, replay, and lineage. | Events carry event_id, type, schema version, timestamps, tenant context, and correlation fields. |
| Introduce payment event topics | Move canonical payment and retry events into durable streams. | Issuer-health services can consume consistent payment evidence. |
| Add issuer-health event topics | Publish derived issuer-health events and alerts. | Monitoring, dashboards, and investigations can subscribe to derived signals. |
| Add replay streams | Separate replay input, output, validation, and divergence events. | Replay verification becomes operationally visible. |
| Add federation coordination topics | Publish trust-domain and quarantine coordination events. | Network intelligence can remain tenant-safe and governance-aware. |
| Add streaming health dashboards | Expose lag, watermarks, dead-letter counts, and replay validation rates. | Operators can see whether the streaming spine is healthy. |
Event streaming troubleshooting is the process of diagnosing failures in event publication, consumption, ordering, replay, schema compatibility, and downstream processing.
Operators should interpret streaming issues through an evidence-quality lens. A producer outage means events may be missing. Consumer lag means intelligence may be stale. Schema mismatch means events may not be interpretable. Replay divergence means historical conclusions may not reconstruct. Quarantine spikes may indicate trust-domain or aggregation issues.
| Symptom | Likely Cause | Recommended Fix |
| --- | --- | --- |
| Producer failures increase | Source system cannot publish events reliably. | Check credentials, network connectivity, topic availability, and producer error logs. |
| Consumer lag grows | Downstream processors cannot keep up. | Review consumer health, worker capacity, partition assignment, and processing errors. |
| Dead-letter events increase | Events fail validation or schema parsing. | Review schema versions, required fields, and canonical mappings. |
| Replay divergence appears | Replay output differs from expected results. | Review event ordering, missing events, schema compatibility, and evaluation logic changes. |
| Watermark stops advancing | Processor progress is stalled. | Check worker status, offsets, database writes, and downstream service errors. |
| Federation quarantine spikes | Signals are failing trust-domain, threshold, or policy checks. | Review aggregation thresholds, replay safety, tenant isolation, and trust-domain integrity. |
Event streaming is the production-grade ingestion and coordination model for Zahlen. It allows payment events, issuer-health signals, replay records, governance decisions, and federation coordination events to move continuously through the platform.
Kafka integration provides the durable event backbone. Event envelopes preserve metadata for routing, replay, lineage, and auditability. Replay streams allow historical reconstruction and validation. Federation event coordination allows trust-domain state, quarantine decisions, aggregation eligibility, and public-safe intelligence readiness to be managed safely.
When implemented correctly, event streaming gives Zahlen the infrastructure foundation for high-volume issuer cognition, replay-safe governance, operational survivability, and tenant-safe ecosystem intelligence.