
Real-Time and Streaming Data

Real-time and streaming data systems process information continuously as it arrives rather than accumulating records for periodic batch processing. These architectures treat data as an unbounded sequence of events flowing through the system, enabling immediate visibility into operations, rapid response to changing conditions, and feedback loops that close within seconds rather than hours. For mission-driven organisations, streaming capabilities support use cases ranging from real-time beneficiary feedback during distributions to early warning systems that detect emerging crises before they escalate.

The fundamental distinction between batch and streaming lies in when processing occurs relative to data arrival. Batch systems collect data over a period, then process the accumulated set as a unit. Streaming systems process each record as it arrives, maintaining continuous computation over the data flow. This difference in timing propagates through every aspect of system design: storage structures, processing patterns, failure handling, and resource allocation all differ substantially between the two paradigms.

Event
An immutable record of something that happened at a specific point in time. Events capture facts about the world: a beneficiary registered, a distribution occurred, a sensor reading was taken, a user clicked a button. Events are never updated; new events supersede old ones.
Stream
An unbounded, ordered sequence of events. Unlike a batch dataset with defined boundaries, a stream has no end. Processing must handle data continuously without waiting for a complete dataset.
Event streaming platform
Infrastructure that captures, stores, and distributes events between producers and consumers. Provides durable, ordered, replayable event logs that multiple applications can read independently.
Stream processing
Computation performed continuously over event streams, producing results incrementally as data arrives. Includes filtering, transformation, aggregation, and pattern detection.

Event streaming fundamentals

Event streaming platforms provide the infrastructure for capturing, storing, and distributing events across systems. The core abstraction is the topic, a named, ordered, append-only log of events that multiple applications can write to and read from independently. Topics provide durable storage that retains events for configurable periods, enabling consumers to read at their own pace and replay historical data when needed.

Producers are applications that write events to topics. A mobile data collection app submitting survey responses, a web application logging user interactions, and a sensor transmitting readings all act as producers. Each producer appends events to the end of the topic log, and the platform assigns each event a sequential offset number that uniquely identifies its position. Producers need not know which applications will consume their events; they publish to topics, and consumers subscribe independently.

Consumers are applications that read events from topics. A consumer maintains a position in the log, reading events sequentially from its current offset. Multiple consumers can read the same topic independently, each tracking its own position. A real-time dashboard, an analytics pipeline, and an alerting system can all consume the same event stream without interfering with each other. When a consumer processes an event and advances its offset, the platform records this progress, enabling the consumer to resume from its last position after restarts.

EVENT STREAMING PLATFORM

  PRODUCERS               TOPICS
  +---------------+
  | Mobile App    +--+    +---------------------------+
  | (surveys)     |  |    | survey-responses          |
  +---------------+  +--->| [0][1][2][3][4][5]...     |
                     |    +-------------+-------------+
  +---------------+  |                  |
  | Web App       +--+       +----------+----------+
  | (feedback)    |          |                     |
  +---------------+          v                     v
                        +----------+         +----------+
  +---------------+     | Consumer |         | Consumer |
  | IoT Sensors   +--+  | Group A  |         | Group B  |
  | (readings)    |  |  | offset:3 |         | offset:5 |
  +---------------+  |  +----+-----+         +----+-----+
                     |       |                    |
  +------------+     |       v                    v
  | alerts     |<----+  +-----------+        +-----------+
  | topic      |        | Analytics |        | Alerting  |
  +------------+        | Pipeline  |        | System    |
                        +-----+-----+        +-----------+
  CONSUMERS                   |
  +---------------+           |
  | Dashboard     |<----------+
  | (real-time)   |
  +---------------+

Figure 1: Event streaming platform architecture showing producers, topics, and independent consumers
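The append-only log and per-consumer offsets can be sketched in a few lines of Python. This is a toy in-memory model, not a client for any real platform; `Topic` and `Consumer` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """A toy append-only log: producers append, consumers track offsets."""
    events: list = field(default_factory=list)

    def append(self, event) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset assigned to this event

class Consumer:
    """Each consumer keeps its own position; readers never interfere."""
    def __init__(self, topic: Topic):
        self.topic = topic
        self.offset = 0  # resume point; persisted by the platform in reality

    def poll(self) -> list:
        batch = self.topic.events[self.offset:]
        self.offset = len(self.topic.events)
        return batch

topic = Topic()
for e in ["registered", "updated", "distributed"]:
    topic.append(e)

dashboard = Consumer(topic)
analytics = Consumer(topic)
print(dashboard.poll())   # all three events so far
topic.append("closed")
print(dashboard.poll())   # only the new event
print(analytics.poll())   # independently reads all four from offset 0
```

The key property to notice is that appending never disturbs existing readers: each consumer's offset is its own state, which is why a dashboard, a pipeline, and an alerting system can share one topic.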

Partitions divide topics into parallel segments for scalability. Each partition is an independent, ordered log that can reside on a different server. When a producer writes an event, a partition key determines which partition receives it. Events with the same key always go to the same partition, preserving their relative order. A topic tracking beneficiary interactions might use beneficiary ID as the partition key, ensuring all events for a given beneficiary remain ordered while distributing load across partitions.

The partition count determines the maximum parallelism for consuming a topic. With 12 partitions, up to 12 consumer instances can process the topic simultaneously, each reading from a subset of partitions. Consumer groups coordinate this parallel consumption: consumers in the same group divide partitions among themselves, while consumers in different groups each receive all events independently. A topic with 12 partitions consumed by a group of 4 instances assigns 3 partitions to each instance. If an instance fails, the group rebalances, redistributing its partitions to surviving members.
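A minimal sketch of key-based partitioning and round-robin group assignment, assuming an MD5-based hash for stability (real platforms use their own partitioners):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash: the same key always lands on the same partition,
    # so per-key ordering is preserved
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def assign(partitions: int, consumers: list) -> dict:
    # Round-robin partition assignment within a single consumer group
    mapping = {c: [] for c in consumers}
    for p in range(partitions):
        mapping[consumers[p % len(consumers)]].append(p)
    return mapping

# All events for one beneficiary route to a single partition
assert partition_for("BEN-001", 12) == partition_for("BEN-001", 12)

print(assign(12, ["c1", "c2", "c3", "c4"]))  # 3 partitions per instance
```

Rebalancing after a failure amounts to re-running the assignment over the surviving members, which is why a group of 3 would then receive 4 partitions each.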

Retention policies control how long the platform stores events. Time-based retention deletes events older than a specified duration; a 7-day retention means consumers can replay the past week’s data but not earlier. Size-based retention deletes oldest events when the topic exceeds a storage threshold. Compacted topics retain only the most recent event for each key, providing a snapshot of current state rather than full history. A topic tracking beneficiary registration status might use compaction to maintain current status while discarding intermediate updates.
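Log compaction can be illustrated with a small pure function that keeps only the latest value per key, a simplified model of what compacted topics do in the background:

```python
def compact(log: list) -> list:
    """Keep only the most recent event per key, preserving log order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    # Re-emit the surviving events in their original offset order
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (offset, value) in survivors]

log = [("BEN-001", "pending"), ("BEN-002", "pending"),
       ("BEN-001", "verified"), ("BEN-001", "active")]
print(compact(log))  # [('BEN-002', 'pending'), ('BEN-001', 'active')]
```

The compacted log answers "what is each beneficiary's current status?" without retaining the intermediate `pending` and `verified` events.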

Stream processing patterns

Stream processing transforms event streams through operations that execute continuously as data arrives. The simplest operations process each event independently: filtering discards events that fail to match criteria, mapping transforms event structure or content, and enrichment adds data from external sources. A stream of survey submissions might filter to include only complete responses, map to extract relevant fields, and enrich with geographic data based on location codes.

Aggregation computes summary statistics over groups of events. Unlike batch aggregation that processes a complete dataset, stream aggregation must produce results incrementally as events arrive. A running count of registrations by location updates with each new registration event, providing current totals without waiting for a batch window to complete. Aggregations maintain state between events, storing intermediate results that each new event updates.
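The filter-map-enrich-aggregate chain might look like the following sketch, where `GEO` is a hypothetical lookup table and `counts` is the state the aggregation carries between events:

```python
from collections import defaultdict

GEO = {"LOC-1": "North", "LOC-2": "South"}  # hypothetical enrichment table

counts = defaultdict(int)  # aggregation state, updated per event

def process(event: dict):
    """Filter -> map -> enrich -> aggregate, one event at a time."""
    if not event.get("complete"):                        # filter: drop partials
        return None
    record = {"location": event["loc"]}                  # map: keep what matters
    record["region"] = GEO.get(event["loc"], "unknown")  # enrich from lookup
    counts[record["location"]] += 1                      # incremental aggregate
    return record

for e in [{"loc": "LOC-1", "complete": True},
          {"loc": "LOC-1", "complete": False},
          {"loc": "LOC-2", "complete": True},
          {"loc": "LOC-1", "complete": True}]:
    process(e)

print(dict(counts))  # {'LOC-1': 2, 'LOC-2': 1}
```

Unlike a batch job, the totals in `counts` are correct after every single event, which is what makes them usable on a live dashboard.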

Windowing bounds aggregations to finite time intervals, enabling computations like “registrations per hour” or “average wait time over the past 15 minutes”. Windows impose structure on unbounded streams, grouping events for aggregate computation.

WINDOWING PATTERNS

 TUMBLING WINDOWS (fixed, non-overlapping)

 Time: --0----1----2----3----4----5----6----7----8----9-->
         [ Window 1    ][ Window 2    ][ Window 3    ]
          events: a,b,c  events: d,e   events: f,g,h

 Each event belongs to exactly one window

 SLIDING WINDOWS (fixed size, continuous movement)

 Time: --0----1----2----3----4----5----6----7----8----9-->
         [ Window at t=3 ]
              [ Window at t=4 ]
                   [ Window at t=5 ]

 Windows overlap; events appear in multiple windows

 SESSION WINDOWS (activity-based, variable size)

 Time: --0----1----2----3----4----5----6----7----8----9-->
 Events:  a b c              d e f
         [Session 1: a,b,c] [Session 2: d,e,f]
                   gap > threshold

 Windows defined by activity gaps, not fixed intervals

Figure 2: Window types for stream aggregation showing tumbling, sliding, and session patterns

Tumbling windows divide time into fixed, non-overlapping intervals. A 1-hour tumbling window groups events by the hour they occurred, producing one aggregate result per hour. Each event belongs to exactly one window. Tumbling windows suit metrics like hourly registration counts or daily distribution totals where distinct time periods require separate calculations.

Sliding windows maintain a fixed-size window that moves continuously with time. A 1-hour sliding window with 5-minute slide produces a new result every 5 minutes, each covering the preceding hour. Events appear in multiple overlapping windows. Sliding windows enable continuous monitoring of recent activity without waiting for period boundaries.

Session windows group events by activity rather than fixed time intervals. A session window with a 30-minute gap timeout groups all events from a user session together, closing the window after 30 minutes of inactivity. Session windows capture behavioural units where the duration varies based on user engagement patterns.
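Window assignment reduces to simple arithmetic. The sketch below assumes half-open windows `[start, end)` over numeric timestamps; the function names are illustrative:

```python
def tumbling_window(ts: float, size: float) -> tuple:
    """Each timestamp maps to exactly one fixed, non-overlapping window."""
    start = (ts // size) * size
    return (start, start + size)

def sliding_windows(ts: float, size: float, slide: float) -> list:
    """A timestamp falls into every window whose span covers it."""
    windows = []
    # Earliest window start on a slide boundary that still covers ts
    start = max(0.0, ((ts - size) // slide + 1) * slide)
    while start <= ts:
        windows.append((start, start + size))
        start += slide
    return windows

# An event at t=7.5 with 3-unit tumbling windows lands in [6, 9)
assert tumbling_window(7.5, 3.0) == (6.0, 9.0)

# With 3-unit windows sliding by 1, the same event appears in 3 windows
print(sliding_windows(7.5, 3.0, 1.0))  # [(5.0, 8.0), (6.0, 9.0), (7.0, 10.0)]
```

Session windows need stateful gap tracking rather than arithmetic, which is why frameworks implement them differently from the two fixed-window types.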

Joins combine events from multiple streams based on matching criteria. A stream of distribution events joined with a stream of beneficiary updates produces enriched distribution records containing current beneficiary information. Stream joins present temporal challenges absent in batch processing: when joining events from two streams, how long should the system wait for a matching event to arrive? Join windows define this waiting period, and events without matches within the window produce either null results or no output depending on join semantics.

Delivery semantics and ordering

Stream processing systems must handle failures while maintaining correct results. Three delivery guarantee levels define the trade-offs between performance, complexity, and correctness.

At-most-once delivery processes each event zero or one times. The system acknowledges receipt before processing completes; if processing fails after acknowledgement, the event is lost. This approach provides the highest throughput and lowest latency but tolerates data loss. Sensor telemetry where occasional missing readings are acceptable might use at-most-once delivery.

At-least-once delivery processes each event one or more times. The system acknowledges only after processing completes; if failure occurs before acknowledgement, the event is reprocessed. This approach prevents data loss but may produce duplicates. A counter incrementing on each event will overcount if events replay after failure. At-least-once suits operations that are naturally idempotent or where duplicates are less harmful than losses.

Exactly-once delivery processes each event precisely once, even across failures. Achieving this guarantee requires coordination between the streaming platform and processing logic, typically through transactional writes that atomically commit both processing results and consumer offset advances. Exactly-once provides correct results but adds latency and complexity. Financial calculations, deduplication, and operations where both loss and duplication cause problems require exactly-once semantics.

Ordering guarantees specify when events maintain their relative sequence. Most platforms guarantee ordering within a partition but not across partitions. Events for the same key (routed to the same partition) arrive in the order produced, but events across different keys may interleave arbitrarily. Applications requiring global ordering must use a single partition, sacrificing parallelism. Applications requiring ordering only within an entity (all events for one beneficiary) can use entity ID as the partition key, achieving both ordering and parallelism.

Event time versus processing time creates additional ordering complexity. Event time is when the event occurred in the real world; processing time is when the system processes it. Network delays, system outages, and batch uploads cause events to arrive out of event-time order. A mobile app submitting queued events after connectivity restoration produces events with event times hours or days before their processing times. Windowing by event time (usually desired for correct results) must handle late arrivals; windowing by processing time (simpler but less meaningful) reflects system behaviour rather than real-world timing.

Watermarks track progress through event time, indicating that no events with earlier timestamps should arrive. A watermark at 14:30 signals that all events before 14:30 have been processed, allowing windows ending before 14:30 to close and emit results. Watermarks balance latency against completeness: aggressive watermarks close windows quickly but may exclude late events; conservative watermarks wait longer, including more late arrivals but delaying results.
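One simple watermark strategy is bounded lateness: the watermark trails the maximum event time seen by a fixed allowance. A sketch, with `WatermarkTracker` as an illustrative name rather than any framework's API:

```python
class WatermarkTracker:
    """Bounded-lateness watermark: max event time seen minus allowed delay."""
    def __init__(self, allowed_lateness: float):
        self.lateness = allowed_lateness
        self.max_event_time = float("-inf")

    def observe(self, event_time: float) -> float:
        self.max_event_time = max(self.max_event_time, event_time)
        return self.watermark()

    def watermark(self) -> float:
        return self.max_event_time - self.lateness

    def window_closed(self, window_end: float) -> bool:
        # A window may emit once the watermark passes its end
        return self.watermark() >= window_end

wm = WatermarkTracker(allowed_lateness=5.0)
for t in [10.0, 12.0, 11.0, 18.0]:  # out-of-order arrivals
    wm.observe(t)
print(wm.watermark())           # 13.0
print(wm.window_closed(12.0))   # True: window ending at 12 can emit
print(wm.window_closed(15.0))   # False: still waiting for stragglers
```

Shrinking `allowed_lateness` is the aggressive choice (faster results, more excluded late events); growing it is the conservative one.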

State management

Stream processing operations that span multiple events require state: aggregations accumulate counts and sums, joins buffer events awaiting matches, and pattern detection tracks partial matches across event sequences. Managing this state introduces challenges absent in stateless processing.

Local state stores processing state on the machine running the computation. Each processing instance maintains its own state for the partitions it handles. Local state provides fast access but creates recovery challenges: if an instance fails, its state is lost. Stream processing frameworks address this through checkpointing, periodically saving state snapshots to durable storage. Upon restart, the instance restores from the latest checkpoint and replays events since that checkpoint to rebuild current state.

Checkpointing frequency balances recovery time against overhead. Frequent checkpoints reduce replay time after failure but consume resources during normal operation. A 1-minute checkpoint interval means recovery replays at most 1 minute of events; a 10-minute interval reduces checkpoint overhead but extends recovery time. For processing 10,000 events per second, a 10-minute checkpoint interval requires replaying up to 6 million events on recovery.
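Checkpoint-and-replay recovery can be sketched with a toy counting job; `CountingJob` is illustrative, not any framework's API, and the "durable" checkpoint is just a tuple here:

```python
import copy

class CountingJob:
    """Stateful job with periodic checkpoints and replay-based recovery."""
    def __init__(self, checkpoint_every: int):
        self.state = {"count": 0}
        self.checkpoint_every = checkpoint_every
        self.checkpoint = (0, {"count": 0})  # (offset, state snapshot)

    def run(self, log, start_offset=0, state=None):
        if state is not None:
            self.state = copy.deepcopy(state)
        for offset in range(start_offset, len(log)):
            self.state["count"] += log[offset]
            if (offset + 1) % self.checkpoint_every == 0:
                # Snapshot state plus the offset it corresponds to
                self.checkpoint = (offset + 1, copy.deepcopy(self.state))

    def recover(self, log):
        # Restore the snapshot, then replay only events since the checkpoint
        offset, snapshot = self.checkpoint
        self.run(log, start_offset=offset, state=snapshot)

log = [1] * 10
job = CountingJob(checkpoint_every=4)
job.run(log)
assert job.checkpoint == (8, {"count": 8})  # last snapshot taken at offset 8

job.state = {"count": -999}  # simulate a crash that loses local state
job.recover(log)             # replays offsets 8-9 only
print(job.state["count"])    # 10
```

The recovery cost is visible in the replay loop: a larger `checkpoint_every` means more events between the last snapshot and the failure point, which is the trade-off the 1-minute versus 10-minute example above describes.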

State backends determine where local state resides. In-memory backends provide fastest access but limit state size to available RAM. Disk-backed backends (such as RocksDB) support larger state by storing data on local SSD with memory caching for hot data. Distributed state backends store state in external systems like Redis, adding network latency but enabling state sharing and simpler scaling.

Rescaling changes the number of processing instances, requiring state redistribution. When scaling from 4 to 8 instances, each original instance’s state must split across two new instances. Key-based partitioning enables this: state for each key migrates to whichever instance now handles that key’s partition. The streaming framework coordinates this migration during rebalancing, pausing processing briefly while state transfers complete.

Architecture patterns

Two architectural patterns dominate the design of systems combining batch and stream processing: Lambda and Kappa architectures. Each addresses the challenge of providing both historical analysis and real-time views while managing the complexity inherent in streaming systems.

LAMBDA ARCHITECTURE

                 +------------------+
                 |   Data Source    |
                 +--------+---------+
                          |
           +--------------+-------------+
           |                            |
           v                            v
+----------+-----------+     +----------+-----------+
|     BATCH LAYER      |     |     SPEED LAYER      |
|                      |     |                      |
| +------------------+ |     | +------------------+ |
| |  Raw Data Store  | |     | | Stream Processor | |
| |   (data lake)    | |     | |   (real-time)    | |
| +--------+---------+ |     | +--------+---------+ |
|          |           |     |          |           |
|          v           |     |          v           |
| +--------+---------+ |     | +--------+---------+ |
| | Batch Processing | |     | | Real-time Views  | |
| |    (periodic)    | |     | |  (incremental)   | |
| +------------------+ |     | +------------------+ |
+----------+-----------+     +----------+-----------+
           |                            |
           v                            v
+----------+-----------+     +----------+-----------+
|     Batch Views      |     |   Real-time Views    |
| (complete, delayed)  |     | (approximate, fast)  |
+----------+-----------+     +----------+-----------+
           |                            |
           +--------------+-------------+
                          |
                          v
                 +--------+--------+
                 |  SERVING LAYER  |
                 |  Merge batch +  |
                 |   real-time     |
                 +-----------------+
KAPPA ARCHITECTURE

        +------------------+
        |   Data Source    |
        +--------+---------+
                 |
                 v
   +-------------+-------------+
   |    EVENT LOG (immutable)  |
   |    Retains full history   |
   +-------------+-------------+
                 |
                 v
   +-------------+-------------+
   |     STREAM PROCESSING     |
   |                           |
   |  Same logic for:          |
   |  - Real-time processing   |
   |  - Historical reprocessing|
   +-------------+-------------+
                 |
                 v
   +-------------+-------------+
   |       SERVING LAYER       |
   |   Single source of truth  |
   +---------------------------+

Figure 3: Lambda architecture with dual processing paths compared to Kappa architecture with unified stream processing

Lambda architecture maintains parallel batch and streaming paths. The batch layer stores all raw data and periodically recomputes complete, accurate results. The speed layer processes real-time data to provide approximate, immediate results. The serving layer merges batch and speed layer outputs, using batch results as the authoritative baseline and speed layer results to fill the gap since the last batch run. When a new batch computation completes, it replaces the speed layer’s approximations for that period.
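The serving-layer merge can be sketched as batch totals plus speed-layer deltas accumulated since the last batch run; the view shapes here are illustrative:

```python
def serve(batch_view: dict, speed_view: dict) -> dict:
    """Serving layer: batch totals are authoritative; the speed layer
    fills only the gap since the last batch run."""
    merged = dict(batch_view)
    for key, delta in speed_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged

batch_view = {"north": 1200, "south": 950}  # complete through midnight
speed_view = {"north": 14, "east": 3}       # events since midnight
print(serve(batch_view, speed_view))
# {'north': 1214, 'south': 950, 'east': 3}
```

When the next batch run completes, `batch_view` absorbs those deltas and `speed_view` resets, which is the hand-off the paragraph above describes.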

Lambda architecture provides accurate historical results through batch processing while enabling real-time visibility through stream processing. The trade-off is operational complexity: two separate codebases must produce compatible results, and the merging logic in the serving layer adds another component to maintain. Organisations adopt Lambda when batch processing provides capabilities (complex joins, machine learning training) difficult to replicate in streaming, or when existing batch infrastructure requires preservation.

Kappa architecture eliminates the batch layer, using stream processing for both real-time and historical analysis. The event log retains full history, enabling reprocessing of past data through the same streaming logic used for real-time processing. When business logic changes or errors require correction, a new stream processor reads from the log’s beginning, rebuilding results from scratch. The new processor runs alongside the existing one until it catches up, then takes over serving queries.

Kappa architecture reduces operational complexity through a single processing path but requires stream processing capable of handling the full workload, including historical reprocessing. The event log must retain sufficient history for reprocessing needs, potentially requiring substantial storage. Organisations adopt Kappa when stream processing frameworks can handle all required computations and when the operational simplicity of a single codebase outweighs Lambda’s flexibility.

Most mission-driven organisations with streaming needs begin with Kappa-style architectures, adding batch processing only when specific requirements exceed streaming capabilities. A real-time feedback system that also produces monthly reports can process both through streaming, aggregating continuous updates for real-time dashboards while writing to storage that batch reporting queries later consume.

Event-driven architecture

Event-driven architecture structures applications around the production, detection, and consumption of events rather than request-response interactions. Components communicate by publishing events that other components consume, enabling loose coupling where producers need not know their consumers. This pattern extends beyond stream processing infrastructure to application design itself.

In an event-driven system, a beneficiary registration triggers an event that multiple components consume independently. The case management system creates a case record, the notification service sends a confirmation message, the analytics pipeline updates dashboards, and the sync service queues data for field office replication. Adding a new consumer requires no changes to the registration service; the new component subscribes to the existing event stream.

Command Query Responsibility Segregation (CQRS) separates read and write operations into distinct models. Commands modify state and produce events; queries read from views optimised for specific access patterns. A case management system might accept commands through a write model that validates and persists changes, emitting events that update multiple query-optimised views: one for case workers showing their active cases, another for supervisors showing team workloads, another for reports aggregating outcomes.

Event sourcing stores state as a sequence of events rather than current values. Instead of storing a beneficiary record with current address, the system stores events: BeneficiaryRegistered, AddressUpdated, AddressUpdated. Current state derives by replaying events in order. Event sourcing provides complete audit history, enables temporal queries (what was the address on a given date), and supports rebuilding state with modified logic. The trade-off is complexity: reading current state requires aggregating events, and the event log grows indefinitely.
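Event sourcing reduces to folding a reducer over the event history. A sketch, using the event types from the example above (the event shapes are assumptions for illustration):

```python
def apply(state: dict, event: dict) -> dict:
    """Pure reducer: current state is a fold over the event history."""
    kind = event["type"]
    if kind == "BeneficiaryRegistered":
        return {"id": event["id"], "address": event["address"]}
    if kind == "AddressUpdated":
        return {**state, "address": event["address"]}
    return state  # unknown event types leave state unchanged

history = [
    {"type": "BeneficiaryRegistered", "id": "BEN-001", "address": "Camp A"},
    {"type": "AddressUpdated", "address": "Camp B"},
    {"type": "AddressUpdated", "address": "Camp C"},
]

state = {}
for event in history:
    state = apply(state, event)
print(state)  # {'id': 'BEN-001', 'address': 'Camp C'}

# Temporal query: replay only the prefix of history up to a point in time
as_of = {}
for event in history[:2]:
    as_of = apply(as_of, event)
print(as_of["address"])  # Camp B
```

The prefix replay is what makes "what was the address on a given date" answerable; the cost, as noted, is that every read of current state is a fold unless snapshots are cached.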

Real-time analytics

Real-time analytics provides immediate visibility into operational data through continuously updated metrics, dashboards, and alerts. Unlike batch analytics that reflects hours-old or days-old data, real-time analytics shows current state and emerging trends as they develop.

REAL-TIME ANALYTICS PIPELINE

 DATA SOURCES
 +------------+  +------------+  +------------+  +------------+
 |   Mobile   |  |    Web     |  |  Partner   |  |    IoT     |
 | Collection |  |  Feedback  |  |    APIs    |  |  Sensors   |
 +-----+------+  +-----+------+  +-----+------+  +-----+------+
       |               |               |               |
       +---------------+-------+-------+---------------+
                               |
                               v
 +-----------------------------+------------------------------+
 |          EVENT STREAMING PLATFORM (Kafka/Pulsar)           |
 |  - Ingestion topics (raw events)                           |
 |  - Processed topics (enriched, aggregated)                 |
 +-----------------------------+------------------------------+
                               |
                  +------------+------------+
                  |                         |
                  v                         v
        +---------+----------+   +---------+----------+
        |       Stream       |   |       Stream       |
        |     Processing     |   |     Processing     |
        |      (Flink)       |   |      (ksqlDB)      |
        |                    |   |                    |
        |  - Filter          |   |  - SQL over        |
        |  - Enrich          |   |    streams         |
        |  - Aggregate       |   |  - Joins           |
        +---------+----------+   +---------+----------+
                  |                         |
                  +------------+------------+
                               |
              +----------------+----------------+
              |                |                |
              v                v                v
        +-----+-----+    +-----+-----+    +-----+-----+
        |   Time-   |    | Key-Value |    |  Search   |
        |  Series   |    |   Store   |    |   Index   |
        |    DB     |    |  (Redis)  |    | (Elastic) |
        +-----+-----+    +-----+-----+    +-----+-----+
              |                |                |
              +----------------+----------------+
                               |
                               v
              +----------------+-----------------+
              |        VISUALISATION LAYER       |
              |  +-------+ +--------+ +-------+  |
              |  |Grafana| |Superset| |Custom |  |
              |  |       | |        | |Dash   |  |
              |  +-------+ +--------+ +-------+  |
              +----------------------------------+

Figure 4: Real-time analytics pipeline from data sources through processing to visualisation

Metrics computation aggregates events into quantitative measures updated in real time. Registration counts, distribution volumes, response times, and error rates all derive from aggregating event streams. Stream processing computes these aggregations continuously, writing results to time-series databases optimised for metric storage and retrieval. Grafana, connected to Prometheus or InfluxDB, renders live dashboards showing current values and recent trends.

Alerting triggers notifications when metrics cross thresholds or patterns indicate problems. A real-time pipeline detecting registration velocity dropping below expected levels can alert field coordinators within minutes rather than waiting for daily reports to surface the issue. Effective alerting requires careful threshold calibration: too sensitive produces alert fatigue; too conservative misses genuine issues.

Real-time search indexes events for immediate querying. Elasticsearch or similar search engines consume event streams, enabling operators to search recent activity within seconds of occurrence. A beneficiary service point can immediately check whether a person registered earlier that day, even before batch systems synchronise.

Dashboards for humanitarian response benefit particularly from real-time visibility. During active distributions, live monitoring reveals queue lengths, processing rates, and emerging bottlenecks. Feedback systems that surface beneficiary concerns in real time enable rapid response to distribution problems. Early warning systems that aggregate sensor data, mobile reports, and external feeds can detect deteriorating conditions and trigger alerts before situations escalate.

Technology options

Event streaming platforms and stream processing frameworks form the infrastructure for real-time data systems. Selection depends on processing requirements, operational capacity, and existing technology investments.

Event streaming platforms

Apache Kafka dominates event streaming infrastructure with mature tooling, extensive ecosystem, and proven scalability. Kafka stores events in distributed, replicated logs with configurable retention, providing durable messaging that supports replay and multiple independent consumers. Kafka Streams enables stream processing within Kafka’s ecosystem using a library-based approach that deploys with application code rather than requiring a separate cluster. ksqlDB adds SQL semantics for stream processing accessible to analysts without programming expertise. Kafka requires operational investment: cluster management, monitoring, and capacity planning demand dedicated attention.

Self-hosted Kafka suits organisations with Kubernetes expertise and requirements for data sovereignty. A 3-broker cluster on modest hardware handles thousands of events per second with multi-day retention. Managed Kafka offerings from Confluent Cloud, AWS MSK, and Aiven reduce operational burden at increased cost, with pricing based on data throughput and retention.

Apache Pulsar provides similar capabilities with architectural differences that simplify some operational aspects. Pulsar separates storage (BookKeeper) from message serving, enabling independent scaling of compute and storage tiers. Pulsar’s tiered storage automatically moves older data to object storage, reducing hot storage costs for high-retention use cases. Pulsar Functions offer lightweight stream processing embedded in the platform.

Redpanda offers Kafka API compatibility with a single-binary implementation in C++ rather than Java, claiming lower latency and simpler operations. Organisations invested in Kafka tooling but seeking reduced operational complexity may find Redpanda appealing.

Stream processing frameworks

Apache Flink provides the most comprehensive stream processing capabilities, including exactly-once semantics, sophisticated windowing, complex event processing, and state management. Flink processes both bounded (batch) and unbounded (stream) data through the same API, supporting Kappa architectures where the same code handles historical and real-time processing. Flink requires a cluster deployment with job managers and task managers, representing significant operational investment.

Apache Spark Structured Streaming extends Spark’s batch processing model to streaming through micro-batch processing. Rather than true continuous processing, Structured Streaming processes data in small batches, with end-to-end latencies typically no lower than about 100ms, providing near-real-time results with Spark’s familiar DataFrame API. Organisations already using Spark for batch analytics can extend existing skills and infrastructure to streaming use cases.

ksqlDB processes Kafka streams using SQL syntax, making stream processing accessible without programming. Queries like SELECT location, COUNT(*) FROM registrations WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY location create continuously updating results. ksqlDB suits simpler processing requirements and organisations prioritising accessibility over advanced capabilities.

Managed services

Cloud-managed streaming services reduce operational burden for organisations without dedicated platform teams.

AWS Kinesis provides integrated streaming services: Kinesis Data Streams for event ingestion, Kinesis Data Firehose for delivery to storage, and Kinesis Data Analytics for stream processing using SQL or Flink. Tight integration with AWS services simplifies architecture for AWS-committed organisations.

Azure Event Hubs offers Kafka-compatible APIs alongside native protocols, enabling Kafka applications to migrate with minimal changes. Azure Stream Analytics provides SQL-based stream processing.

Google Cloud Dataflow implements the Apache Beam model, supporting both batch and stream processing with automatic scaling. Beam’s portability layer enables code to run on multiple execution engines.

For mission-driven organisations, managed services reduce operational demands at the cost of vendor dependency and potential data sovereignty concerns. A Kinesis-based pipeline processing beneficiary data stores that data in AWS infrastructure subject to US jurisdiction, regardless of configured region.

Implementation considerations

Assessing real-time requirements

Not all data requires real-time processing. Real-time architecture adds complexity that batch processing avoids: state management, exactly-once semantics, operational monitoring, and specialised skills all increase system cost. Before adopting streaming, organisations should identify use cases where the latency reduction provides genuine value.

Strong candidates for real-time processing include operational dashboards where decisions depend on current data, alerting systems where delay reduces response effectiveness, feedback loops where immediate visibility enables rapid correction, and integration scenarios where downstream systems require continuous updates. Distribution monitoring during active operations, beneficiary feedback during service delivery, and early warning systems for emerging crises all benefit from real-time visibility.

Weak candidates include historical reporting where next-day data suffices, analytics where aggregation obscures real-time variations, and systems where consumers cannot act on real-time information. Monthly donor reports, annual outcome analyses, and strategic planning dashboards rarely require real-time data.

Starting with limited capacity

Organisations with constrained IT capacity can adopt streaming incrementally. A minimal viable streaming architecture uses managed services to reduce operational demands:

A Kafka-compatible managed service (Confluent Cloud free tier, Aiven free tier, or Redpanda Serverless) provides event streaming without cluster management. Producer applications write events using standard Kafka clients; consumer applications read and process independently. ksqlDB or Kafka Streams handle processing without separate infrastructure. A time-series database (InfluxDB Cloud free tier) stores aggregated metrics, and Grafana Cloud visualises dashboards.

This architecture handles moderate event volumes (tens of thousands per day) with minimal operational investment. As requirements grow, organisations can add stream processing capacity, migrate to self-hosted infrastructure for cost or sovereignty reasons, or expand to more sophisticated processing patterns.

Field deployment constraints

Real-time systems in field contexts face connectivity and resource constraints. Field sites with intermittent connectivity cannot maintain continuous streams to centralised platforms. Instead, field systems buffer events locally during disconnection, forwarding accumulated events when connectivity resumes. This store-and-forward pattern provides local real-time visibility while eventual synchronisation populates central dashboards.
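The store-and-forward pattern is a small amount of buffering logic. A sketch, with an injected `send` callable standing in for the upstream connection and an in-memory buffer standing in for a durable on-disk queue:

```python
class StoreAndForward:
    """Buffer events locally while offline; flush when connectivity returns."""
    def __init__(self, send):
        self.send = send    # callable that ships one event upstream
        self.buffer = []    # durable on-disk queue in a real deployment
        self.online = False

    def emit(self, event):
        if self.online:
            self.send(event)       # connected: forward immediately
        else:
            self.buffer.append(event)  # offline: keep locally

    def connectivity_restored(self):
        self.online = True
        while self.buffer:         # forward backlog in original order
            self.send(self.buffer.pop(0))

delivered = []
node = StoreAndForward(send=delivered.append)
node.emit({"queue_len": 42})   # offline: buffered locally
node.emit({"queue_len": 57})
node.connectivity_restored()   # flushes both, in order
print(delivered)
```

Local dashboards can read the buffer directly for real-time visibility at the site, while the flush-on-reconnect step is what eventually populates central dashboards.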

Edge processing performs time-sensitive analytics locally rather than depending on centralised infrastructure. A field distribution site runs local stream processing to monitor queue status and alert on anomalies, forwarding summarised metrics to headquarters rather than raw events. This approach reduces bandwidth requirements and enables local real-time visibility independent of connectivity.

Cost considerations

Streaming infrastructure incurs costs that scale with data volume and processing complexity differently than batch systems.

Event streaming platform costs derive from storage retention, throughput capacity, and cluster resources. A Kafka cluster retaining 7 days of 100GB daily ingestion stores 700GB per replica; extending to 30-day retention roughly quadruples storage costs. Managed services typically charge per GB ingested plus storage, with costs increasing linearly with volume.

Stream processing costs depend on computational resources deployed continuously rather than intermittently. A Flink cluster processing streams 24/7 consumes resources constantly, unlike batch jobs that spin up periodically. Exactly-once processing requires additional resources for transactional coordination and checkpointing.

For cost-effective streaming, organisations should limit retention to genuine requirements, use tiered storage moving older data to cheaper storage, right-size processing resources based on actual load, and consider batch processing for analyses that tolerate latency.
