API Integration Patterns
API integration patterns provide proven architectural solutions for connecting systems that exchange data or trigger actions across boundaries. These patterns address fundamental challenges in distributed systems: handling network unreliability, managing data consistency across systems, and designing for graceful degradation when components fail. Each pattern represents a trade-off between complexity, consistency, latency, and resilience that organisations must evaluate against their specific requirements.
Problem context
System integration becomes necessary when organisations operate multiple applications that must share data or coordinate actions. A case management system must synchronise beneficiary records with a payment platform. A mobile data collection application must transmit submissions to a central database. A donor management system must trigger notifications in a communications platform when contributions arrive.
These integration requirements share common challenges regardless of the specific systems involved. Networks fail unpredictably, introducing latency spikes, connection drops, and packet loss. Systems operate at different speeds, creating mismatches between producers that generate data quickly and consumers that process it slowly. Data formats differ between systems, requiring transformation logic. Authentication and authorisation must span trust boundaries. Failures in one system should not cascade into failures across the entire integration.
Integration patterns apply when two or more systems must exchange information and at least one system exposes a programmatic interface. The patterns do not apply when systems can share a database directly (though this approach carries its own risks), when manual data entry suffices for the volume and frequency of exchanges, or when a commercial integration platform already provides the specific connector needed.
Solution
Integration patterns divide into three families based on the communication model: request-response patterns where a caller waits for an immediate reply, event-driven patterns where producers emit events without waiting for specific consumers, and synchronisation patterns where systems maintain consistent copies of shared data.
Request-response patterns
Request-response integration follows a simple model: a client sends a request to a server, waits for processing, and receives a response. This pattern suits operations requiring immediate confirmation, such as validating a beneficiary’s identity against a registry or submitting a payment instruction that must succeed or fail atomically.
REST (Representational State Transfer) organises APIs around resources identified by URLs. Each resource supports standard operations mapped to HTTP methods: GET retrieves a resource, POST creates a new resource, PUT replaces a resource entirely, PATCH modifies specific fields, and DELETE removes a resource. REST APIs communicate state through representations, typically JSON documents, with hypermedia links enabling clients to discover related resources and available actions.
+------------------+                          +------------------+
|                  |   GET /beneficiaries     |                  |
|  Client          +------------------------->+   API Server     |
|  Application     |       /12345             |                  |
|                  |                          |                  |
|                  |       200 OK             |                  |
|                  |<-------------------------+                  |
|                  |   {"id": "12345",        |                  |
|                  |    "name": "..."}        |                  |
+------------------+                          +------------------+

Figure 1: REST request-response for resource retrieval
A well-designed REST API uses HTTP status codes consistently: 200 for successful retrieval, 201 for successful creation with Location header indicating the new resource, 204 for successful deletion with no content returned, 400 for client errors in request format, 401 for authentication failures, 403 for authorisation failures, 404 for resources that do not exist, 409 for conflicts with current state, and 429 for rate limit violations.
GraphQL provides an alternative query language that allows clients to specify exactly which fields they need in a single request. Rather than multiple REST calls to assemble related data, a GraphQL query can traverse relationships and return nested structures. This approach reduces network round trips, particularly valuable over high-latency connections, but shifts complexity to the server which must resolve arbitrary query shapes efficiently.
query BeneficiaryWithCases {
  beneficiary(id: "12345") {
    name
    registrationDate
    cases(status: OPEN) {
      id
      type
      assignedWorker {
        name
        email
      }
    }
  }
}

This single GraphQL query replaces what would require three REST calls: one to fetch the beneficiary, one to fetch their open cases, and one to fetch worker details for each case. The trade-off is server complexity and the need for careful query cost analysis to prevent expensive operations.
Event-driven patterns
Event-driven integration decouples producers from consumers by introducing an intermediary that stores and routes events. Producers emit events describing what happened without knowledge of which systems will consume them. Consumers subscribe to event types they care about and process events asynchronously.
Publish-subscribe distributes events from one producer to multiple consumers through topics or channels. When a new beneficiary registers, the registration system publishes an event to a “beneficiary.registered” topic. The case management system, payment system, and reporting warehouse each subscribe to this topic and receive a copy of every event independently.
+------------------+      +------------------+      +------------------+
|                  |      |                  |      |                  |
|  Registration    +----->+  Message Broker  +----->+  Case Management |
|  System          |      |                  |      |  System          |
|                  |      |  beneficiary.    |      |                  |
+------------------+      |  registered      |      +------------------+
                          |  topic           |
                          |                  |      +------------------+
                          |                  +----->+                  |
                          |                  |      |  Payment         |
                          +--------+---------+      |  Platform        |
                                   |                +------------------+
                                   v
                          +------------------+
                          |  Reporting       |
                          |  Warehouse       |
                          +------------------+

Figure 2: Publish-subscribe distributing events to multiple consumers
Event payloads should contain sufficient information for consumers to act without additional lookups. A beneficiary.registered event should include the beneficiary’s core attributes (identifier, name, registration date, programme) rather than just an identifier that forces consumers to call back to the registration system. This approach, called event-carried state transfer, reduces coupling and improves resilience when the source system becomes unavailable.
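A minimal sketch of what such an event-carried payload might look like; the field names and values here are illustrative assumptions, not a prescribed schema:

```python
# Hypothetical beneficiary.registered event illustrating event-carried
# state transfer: the payload carries the core attributes consumers
# need, so they never have to call back to the registration system.
event = {
    "event_type": "beneficiary.registered",
    "event_id": "evt_001",                 # unique id, supports deduplication
    "occurred_at": "2024-03-01T10:05:00Z",
    "data": {                              # core attributes carried in the event
        "beneficiary_id": "12345",
        "name": "...",
        "registration_date": "2024-03-01",
        "programme": "cash-transfer",
    },
}
```

A consumer such as the payment platform can act on `event["data"]` directly, even while the registration system is down.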
Event streaming extends publish-subscribe with persistent, ordered logs that consumers can replay. Apache Kafka and similar platforms store events durably for a configurable retention period (commonly 7 to 30 days), assign each event an offset within its partition, and allow consumers to read from any offset. A new consumer can start from the beginning of the log to build initial state, then continue processing new events as they arrive.
Streaming suits use cases requiring event replay for recovery, audit, or late-joining consumers. A reporting system brought online six months after the case management system can replay the complete history of case events to reconstruct current state. This capability enables event sourcing architectures where the event log itself becomes the authoritative data source.
Data synchronisation patterns
Synchronisation patterns maintain consistent data copies across systems that cannot share a single database. These patterns address the challenge of keeping distributed data aligned while handling conflicts, failures, and systems operating at different speeds.
Full synchronisation extracts complete datasets from the source, transforms them as needed, and loads them into the target, replacing previous data entirely. This brute-force approach guarantees consistency at the cost of efficiency and timeliness. A full sync of 100,000 beneficiary records takes substantially longer than synchronising only the 50 records that changed since the last run.
Full sync remains appropriate when the dataset is small (under 10,000 records synchronising in under 5 minutes), when identifying changes reliably is impossible because the source lacks modification timestamps, or when accumulated errors from incremental sync require periodic correction. Many organisations run daily full syncs overnight as a safety net even when incremental sync handles routine updates.
Delta synchronisation transfers only records that changed since the previous sync, identified through modification timestamps, sequence numbers, or explicit change flags. This approach requires the source system to track when records change, typically through an updated_at timestamp that the database updates automatically on every write.
+------------------------------------------------------------------+
|                        DELTA SYNC PROCESS                        |
+------------------------------------------------------------------+
|                                                                  |
|  Source System                          Target System            |
|  +----------------+                     +----------------+       |
|  | beneficiaries  |                     | beneficiaries  |       |
|  |                |                     |                |       |
|  | id | updated   |   SELECT WHERE      | id | updated   |       |
|  |----|---------- |   updated_at >      |----|---------- |       |
|  | 1  | 10:05     |   last_sync_time    | 1  | 09:00     |       |
|  | 2  | 09:00     |                     | 2  | 09:00     |       |
|  | 3  | 10:15     +-------------------->| 3  | 10:15     |       |
|  | 4  | 08:30     |   Records 1 & 3     | 4  | 08:30     |       |
|  +----------------+   transferred       +----------------+       |
|                                                                  |
|  last_sync_time: 09:30         new last_sync_time: 10:15         |
|                                                                  |
+------------------------------------------------------------------+

Figure 3: Delta synchronisation using modification timestamps
Delta sync introduces edge cases that full sync avoids. Clock skew between servers can cause records to appear unchanged when they have actually been modified. Transactions in progress when the sync runs may commit after the high-water mark is recorded, leaving those changes unsynchronised until the next run. Deletions require special handling since deleted records no longer exist to compare; soft deletes with a deleted_at timestamp solve this but require target systems to honour the deletion flag.
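One common mitigation for clock skew and late-committing transactions is to re-read a short overlap window before the high-water mark and rely on idempotent upserts in the target to absorb the resulting duplicates. A minimal sketch, assuming records carry an `updated_at` timestamp (the function and parameter names are illustrative):

```python
from datetime import datetime, timedelta


def select_changed(records, last_sync_time, overlap=timedelta(minutes=5)):
    """Select records changed since the last sync, with an overlap window.

    The overlap re-reads a few minutes before the previous high-water
    mark to tolerate clock skew and transactions that committed after
    the mark was recorded; the target must upsert idempotently so the
    re-read duplicates are harmless.
    """
    cutoff = last_sync_time - overlap
    changed = [r for r in records if r["updated_at"] > cutoff]
    # New high-water mark: the latest modification seen in this run.
    new_mark = max((r["updated_at"] for r in changed), default=last_sync_time)
    return changed, new_mark
```

With the data from Figure 3 (last sync at 09:30), this selects records 1 and 3 and advances the mark to 10:15.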
Change data capture (CDC) reads database transaction logs to identify changes rather than polling tables with timestamp queries. When a source database commits a transaction, the change appears in its write-ahead log or binary log, which CDC tools monitor continuously. This approach captures every change including deletions, maintains exact ordering, and imposes minimal overhead on the source database.
CDC requires access to database internals unavailable in SaaS platforms and introduces operational complexity around log retention and connector maintenance. Tools like Debezium (open source) provide CDC connectors for PostgreSQL, MySQL, SQL Server, and MongoDB that emit changes to Kafka topics. The resulting event stream enables both real-time replication and historical replay.
Offline-capable patterns
Field operations frequently involve intermittent connectivity where devices must function without network access for hours or days, then reconcile accumulated changes when connectivity returns. Offline-capable patterns address this constraint through local storage and asynchronous transmission.
Store-and-forward queues outbound data locally when the network is unavailable, then transmits queued items when connectivity resumes. A mobile data collection application stores completed forms in device storage, attempts transmission periodically, and confirms successful delivery before removing local copies. This pattern requires persistent local storage, transmission retry logic, and idempotent receivers that handle duplicate submissions gracefully.
+------------------------------------------------------------------+
|                      STORE-AND-FORWARD FLOW                      |
+------------------------------------------------------------------+
|                                                                  |
|  Mobile Device                                                   |
|  +------------------------------------+                          |
|  |                                    |                          |
|  |  +-------------+   +------------+  |     +-----------------+  |
|  |  | Application +-->| Local      |  |     |                 |  |
|  |  |             |   | Queue      +--+--X--+ Server API      |  |
|  |  +-------------+   |            |  |     | (offline)       |  |
|  |                    | - form_001 |  |     +-----------------+  |
|  |                    | - form_002 |  |                          |
|  |                    | - form_003 |  |                          |
|  |                    +------------+  |                          |
|  +------------------------------------+                          |
|                                                                  |
|  - - - - - - - -  Connectivity restored  - - - - - - - - - - -   |
|                                                                  |
|  +------------------------------------+                          |
|  |                                    |                          |
|  |  +-------------+   +------------+  |     +-----------------+  |
|  |  | Application |   | Local      +--+---->+ Server API      |  |
|  |  |             |   | Queue      |  |     | (online)        |  |
|  |  +-------------+   |            |  |     |                 |  |
|  |                    | (draining) |  |     | ACK: form_001   |  |
|  |                    +------------+<-+-----+ ACK: form_002   |  |
|  +------------------------------------+     | ACK: form_003   |  |
|                                             +-----------------+  |
+------------------------------------------------------------------+

Figure 4: Store-and-forward queuing during offline periods
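The queue-and-drain behaviour can be sketched as follows. This is an illustrative outline, not a production implementation: the in-memory list stands in for persistent device storage (such as SQLite), and `send_fn` is an assumed callable that raises `ConnectionError` when the network is down:

```python
class StoreAndForward:
    """Sketch of a store-and-forward outbox for a mobile client."""

    def __init__(self, send_fn):
        self.send_fn = send_fn
        self.queue = []  # stand-in for persistent local storage

    def submit(self, form):
        """Queue a completed form locally, then attempt transmission."""
        self.queue.append(form)
        self.drain()

    def drain(self):
        """Transmit queued items; keep anything that fails for the next attempt."""
        remaining = []
        for form in self.queue:
            try:
                # The receiver must be idempotent: a retry after a lost
                # acknowledgement may deliver the same form twice.
                self.send_fn(form)
            except ConnectionError:
                remaining.append(form)
        self.queue = remaining
```

A real implementation would also persist the queue across application restarts and remove items only after the server confirms delivery.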
Conflict resolution becomes necessary when multiple devices modify the same record while disconnected. Two field workers might update the same beneficiary’s phone number independently. When both devices synchronise, the server must decide which change wins or how to merge them.
Common resolution strategies include last-write-wins (latest timestamp prevails), which risks data loss but requires no human intervention; first-write-wins (first synchronised version prevails), which preserves earlier work but may lose more recent corrections; and manual resolution, which flags conflicts for human review but requires staff capacity. The appropriate strategy depends on data sensitivity and organisational tolerance for automated decisions.
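Last-write-wins, the simplest of these strategies, reduces to a timestamp comparison. A minimal sketch, assuming each version carries an `updated_at` value (the function name is illustrative):

```python
def resolve_last_write_wins(server_version, client_version):
    """Return the version with the later updated_at timestamp.

    Fully automatic, but the older edit is silently discarded, which
    is why this strategy risks data loss on concurrent updates.
    """
    if client_version["updated_at"] > server_version["updated_at"]:
        return client_version
    return server_version
```

First-write-wins is the mirror image (the earlier timestamp prevails), and manual resolution would instead record both versions and flag the record for review.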
Implementation
Error handling and retry
Network failures, timeouts, and transient server errors occur frequently in distributed systems. Robust integrations distinguish transient failures (worth retrying) from permanent failures (requiring different handling) and implement retry strategies that balance recovery with resource consumption.
Exponential backoff spaces retry attempts with increasing delays to avoid overwhelming recovering systems. A sequence of 1 second, 2 seconds, 4 seconds, 8 seconds, and 16 seconds between attempts allows transient congestion to clear while limiting total wait time. Adding jitter (random variation up to 50% of the delay) prevents thundering herd problems where many clients retry simultaneously after an outage.
import random
import time


class TransientError(Exception):
    """Raised for failures worth retrying (timeouts, 5xx responses)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0):
    """
    Retry an operation with exponential backoff and jitter.

    Args:
        operation: Callable that raises on failure
        max_attempts: Maximum retry attempts (default 5)
        base_delay: Initial delay in seconds (default 1.0)

    Returns:
        Result of successful operation

    Raises:
        Last exception if all attempts fail
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            jitter = delay * random.uniform(0.5, 1.0)
            time.sleep(jitter)
    raise RuntimeError("Unreachable")

Transient errors that warrant retry include HTTP 429 (rate limited), 502 (bad gateway), 503 (service unavailable), 504 (gateway timeout), and connection errors. Permanent errors that should not be retried include HTTP 400 (bad request), 401 (authentication failed), 403 (forbidden), and 404 (not found), since repeating the same request will produce the same result.
Circuit breaker prevents cascading failures by stopping requests to a failing service temporarily. The circuit maintains three states: closed (requests flow normally), open (requests fail immediately without attempting the call), and half-open (a single test request determines whether to restore normal flow).
+------------------------------------------------------------------+
|                      CIRCUIT BREAKER STATES                      |
+------------------------------------------------------------------+
|                                                                  |
|            failure_count < threshold                             |
|                  +------+                                        |
|                  |      |                                        |
|                  v      |                                        |
|  +---------------------+-----------------------+                 |
|  |                                             |                 |
|  |                   CLOSED                    |                 |
|  |          (requests flow through)            |                 |
|  |                                             |                 |
|  +---------------------+-----------------------+                 |
|                        |                                         |
|                        | failure_count >= threshold              |
|                        | (e.g., 5 failures in 60 seconds)        |
|                        v                                         |
|  +---------------------+-----------------------+                 |
|  |                                             |                 |
|  |                    OPEN                     |                 |
|  |          (requests fail immediately)        |                 |
|  |                                             |                 |
|  +---------------------+-----------------------+                 |
|                        |                                         |
|                        | timeout expires                         |
|                        | (e.g., 30 seconds)                      |
|                        v                                         |
|  +---------------------+-----------------------+                 |
|  |                                             |                 |
|  |                  HALF-OPEN                  |                 |
|  |           (test single request)             |                 |
|  |                                             |                 |
|  +--------+----------------------------+-------+                 |
|           |                            |                         |
|           | test succeeds              | test fails              |
|           v                            v                         |
|     +-----+------+               +-----+------+                  |
|     |   CLOSED   |               |    OPEN    |                  |
|     +------------+               +------------+                  |
|                                                                  |
+------------------------------------------------------------------+

Figure 5: Circuit breaker state transitions
A threshold of 5 failures within 60 seconds provides reasonable sensitivity without triggering on isolated errors. The open timeout of 30 seconds allows backend systems time to recover before test traffic arrives. These values require tuning based on observed failure patterns and recovery times.
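The state machine in Figure 5 can be sketched as a small wrapper class. This is an illustrative outline using the thresholds discussed above, not a library implementation; the class and method names are assumptions:

```python
import time


class CircuitBreaker:
    """Sketch of a circuit breaker wrapping outbound calls."""

    def __init__(self, failure_threshold=5, open_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.open_timeout = open_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.open_timeout:
                # Open: fail fast without attempting the call.
                raise RuntimeError("circuit open: failing fast")
            # Timeout expired: half-open, allow this one test request.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip to open
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A production version would also track a rolling failure window (for example, 5 failures within 60 seconds rather than 5 cumulative failures) and distinguish transient from permanent errors.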
Idempotency
An operation is idempotent when executing it multiple times produces the same result as executing it once. Idempotency enables safe retries: if a network timeout occurs after the server processed a request but before the client received the response, retrying the request should not create duplicate records or apply the same change twice.
Achieving idempotency requires a mechanism to detect duplicate requests. Idempotency keys are unique identifiers that clients generate and include with each request. The server records which keys it has processed and returns the original response for repeated keys without re-executing the operation.
First request:
  POST /payments
  Idempotency-Key: pay_7f3a2c91
  {"amount": 50000, "beneficiary": "12345"}

Server: Creates payment, stores key, returns 201 Created

Retry (network lost first response):
  POST /payments
  Idempotency-Key: pay_7f3a2c91
  {"amount": 50000, "beneficiary": "12345"}

Server: Finds key in store, returns original 201 Created without creating second payment

Idempotency keys should be UUIDs or similarly unique values generated client-side. Keys typically expire after 24 hours to bound storage requirements. Servers must store the complete response alongside the key since clients expect identical responses on replay.
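The server-side behaviour can be sketched as follows. This is an illustrative outline, assuming an in-memory key store and hypothetical names; a production system would use a shared store with a 24-hour TTL and persist the full response:

```python
import uuid


class PaymentService:
    """Sketch of server-side idempotency-key handling."""

    def __init__(self):
        self.seen = {}  # idempotency key -> stored response

    def create_payment(self, idempotency_key, payload):
        if idempotency_key in self.seen:
            # Replay: return the original response, do not re-execute.
            return self.seen[idempotency_key]
        response = {"status": 201, "payment_id": str(uuid.uuid4()), **payload}
        self.seen[idempotency_key] = response
        return response
```

Retrying `create_payment` with the same key returns the stored response, so a lost acknowledgement never produces a duplicate payment.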
Authentication
Integration authentication differs from user authentication in requiring non-interactive credential exchange. Three approaches dominate: API keys for simple integrations, OAuth 2.0 client credentials for machine-to-machine flows, and service accounts for cloud platform integrations.
API keys provide a simple bearer token that clients include in request headers. The server validates the key against a registry and applies associated permissions. API keys suit internal integrations and low-risk external integrations but lack expiration, rotation support, and fine-grained scoping in most implementations.
GET /api/v2/beneficiaries HTTP/1.1
Host: api.example.org
Authorization: Bearer sk_live_abc123xyz789

OAuth 2.0 client credentials grant enables applications to obtain short-lived access tokens using a client ID and secret. The client authenticates to an authorisation server, receives a token valid for a limited duration (commonly 1 hour), and includes this token in API requests. Token expiration limits exposure from credential compromise, and scopes restrict what operations the token authorises.
+------------------+                          +------------------+
|                  |  POST /oauth/token       |                  |
|  Client App      +------------------------->+   Auth Server    |
|                  |  grant_type=client_      |                  |
|                  |    credentials           |                  |
|                  |  client_id=...           |                  |
|                  |  client_secret=...       |                  |
|                  |                          |                  |
|                  |  {"access_token":        |                  |
|                  |<-------------------------+    "eyJ...",     |
|                  |   "expires_in":          |                  |
|                  |    3600}                 |                  |
+--------+---------+                          +------------------+
         |
         |  GET /api/beneficiaries
         |  Authorization: Bearer eyJ...
         v
+--------+---------+
|                  |
|  Resource Server |
|                  |
+------------------+

Figure 6: OAuth 2.0 client credentials flow for machine-to-machine authentication
Service accounts in cloud platforms (Google Cloud, Azure, AWS) provide identity for applications without human users. The application assumes the service account identity and receives temporary credentials scoped to specific resources. This approach integrates with cloud provider IAM systems and avoids managing separate credential stores.
Rate limiting
APIs enforce rate limits to protect server resources and ensure fair access across clients. Integrations must respect these limits or face request rejection and potential account suspension.
Rate limits express as requests per time window: 100 requests per minute, 1000 requests per hour, or 10,000 requests per day. Some APIs use token bucket algorithms that allow short bursts above the sustained rate. Responses include headers indicating current consumption and reset timing:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 23
X-RateLimit-Reset: 1699574400

When rate limited (HTTP 429), clients should read the Retry-After header indicating how long to wait before retrying. Implementing client-side rate limiting that tracks consumption and pauses requests before hitting limits avoids wasted requests and improves throughput.
import threading
import time


class RateLimiter:
    """
    Client-side rate limiter using a token bucket algorithm.
    """

    def __init__(self, requests_per_second=10.0):
        self.rate = requests_per_second
        self.tokens = requests_per_second
        self.last_update = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until a request token is available."""
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_update
            # Refill tokens for the elapsed time, capped at the bucket size.
            self.tokens = min(self.rate, self.tokens + elapsed * self.rate)
            self.last_update = now

            if self.tokens < 1.0:
                # Sleep just long enough for one full token to accrue.
                sleep_time = (1.0 - self.tokens) / self.rate
                time.sleep(sleep_time)
                self.tokens = 0.0
            else:
                self.tokens -= 1.0

Data transformation
Systems rarely share identical data models, necessitating transformation between source and target formats. Transformation logic should handle missing fields gracefully, validate data types before conversion, and preserve information that might be needed downstream even if the immediate target does not require it.
Mapping documents specify how source fields become target fields:
# Beneficiary mapping: Registration system -> Case management
mappings:
  - source: reg_id
    target: external_id
    transform: string

  - source: full_name
    target: name
    transform: string

  - source: dob
    target: date_of_birth
    transform: date
    format: "%Y-%m-%d"

  - source: mobile
    target: phone_numbers[0].value
    transform: phone
    default_country: "+254"

  - source: reg_date
    target: registration.date
    transform: datetime

  - source: programme_code
    target: registration.programme
    transform: lookup
    lookup_table: programme_codes

Transformation pipelines should log rejected records with reasons to enable investigation without blocking valid records. A batch of 1000 records with 3 transformation failures should process 997 records successfully while capturing details about the 3 failures for review.
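The reject-and-continue behaviour can be sketched as a small batch wrapper. This is an illustrative outline: `transform_record` is an assumed callable that raises `ValueError` for records it cannot convert:

```python
def transform_batch(records, transform_record):
    """Transform a batch, isolating failures instead of aborting.

    Valid records proceed to the target; rejects are captured with a
    reason string so they can be investigated and reprocessed later.
    """
    transformed, rejected = [], []
    for record in records:
        try:
            transformed.append(transform_record(record))
        except ValueError as exc:
            rejected.append({"record": record, "reason": str(exc)})
    return transformed, rejected
```

Logging the `rejected` list (with reasons) rather than raising keeps one malformed record from blocking the other 997.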
Consequences
Adopting API integration patterns introduces trade-offs that organisations must weigh against their specific constraints.
Request-response patterns provide immediate feedback and simple mental models but create tight coupling between systems. When the target system becomes unavailable, the source system must handle failures, potentially blocking user operations. Timeout configuration requires balancing responsiveness (shorter timeouts) against accommodation of slow operations (longer timeouts). A 30-second timeout suits most synchronous operations; batch operations may require several minutes.
Event-driven patterns reduce coupling and improve resilience since producers continue operating when consumers fail. The broker buffers events until consumers recover. However, eventual consistency means data viewed immediately after an event may not reflect the latest state. Debugging distributed flows proves harder than tracing synchronous calls. Message ordering guarantees vary by implementation and configuration, requiring careful attention when order matters.
Synchronisation patterns maintain data copies but divergence remains possible during network partitions or system failures. Full sync wastes bandwidth and processing time on unchanged data. Delta sync requires reliable change tracking and careful handling of edge cases. CDC provides the most accurate change capture but demands database-level access and operational expertise.
All patterns increase system complexity compared to monolithic designs. Each integration point represents a potential failure mode, security surface, and operational burden. Organisations with limited IT capacity should prefer fewer, simpler integrations over comprehensive but complex architectures.
Variants
Minimal integration
For organisations with single-person IT departments or limited development capacity, integration should use managed services and avoid custom code where possible.
Webhook-triggered automation through platforms like Zapier, Make (formerly Integromat), or Power Automate connects applications without code. These platforms handle authentication, transformation, and retry logic through visual interfaces. Costs scale with usage volume but remain manageable for low-throughput integrations (under 1000 operations per month).
Native application connectors provide pre-built integrations between common platforms. Salesforce to Mailchimp, Kobo to Google Sheets, or Xero to receipt scanning services require only configuration, not development. These connectors limit flexibility but dramatically reduce implementation and maintenance effort.
Standard integration
Organisations with development capacity but constrained resources implement integrations using established frameworks and patterns without building infrastructure from scratch.
API gateway services (Kong, Tyk, or cloud provider offerings) handle authentication, rate limiting, logging, and transformation at the edge, allowing backend services to focus on business logic. Open source options provide cost control; managed services reduce operational burden.
Message broker deployment using RabbitMQ or Redis Streams provides publish-subscribe and queueing capabilities with moderate operational overhead. These systems suit organisations processing thousands to hundreds of thousands of messages daily without the operational complexity of Kafka.
Enterprise integration
Organisations with dedicated integration teams and high-volume requirements deploy full event streaming infrastructure.
Kafka deployment with schema registry enables event-driven architecture at scale, supporting millions of events daily with durability, replay, and exactly-once semantics. This infrastructure requires Kubernetes or dedicated cluster management and ongoing operational investment.
Event mesh architectures span multiple clusters, regions, or cloud providers, routing events based on content and subscriber location. This pattern suits multinational organisations with federated IT structures and strict data residency requirements.
Anti-patterns
Point-to-point spaghetti connects each system directly to every other system it needs data from, creating n*(n-1) integration points for n systems. Adding a new system requires integrating with all existing systems. Removing a system requires finding and updating all dependent integrations. Centralised integration through an API gateway, message broker, or integration platform reduces connections to n.
Synchronous chains occur when Service A calls Service B, which calls Service C, which calls Service D, creating a fragile dependency chain where any failure cascades upstream. Latency accumulates across all calls. Breaking chains with asynchronous messaging or caching intermediate results improves resilience.
Chatty interfaces make many small requests where fewer large requests would suffice. Retrieving a beneficiary, then their cases, then each case’s notes individually creates unnecessary network overhead. Batch endpoints or GraphQL queries that return complete data structures reduce round trips.
Ignoring back-pressure sends data faster than receivers can process, eventually overwhelming queues, exhausting memory, or triggering rate limits. Senders should monitor receiver acknowledgements and slow transmission when queues grow. Pull-based consumption where receivers request data when ready inherently respects capacity.
Tight transformation coupling embeds source-specific field names and formats throughout consuming applications. When the source system changes field names, all consumers require updates. Introducing a canonical data model with transformation at integration boundaries isolates consumers from source changes.
Assuming network reliability writes integration code without error handling, retries, or timeout configuration. The network always fails eventually. Every network call requires explicit handling for connection errors, timeouts, and unexpected response codes.
Leaking credentials stores API keys, client secrets, or service account keys in source code, configuration files committed to version control, or environment variables visible in logs. Credentials belong in secrets management systems with rotation capability and audit logging. See Secure Coding Standards for credential handling requirements.