Webhook Configuration
Webhook configuration establishes HTTP callbacks that deliver event notifications between systems in near-real-time. Configure webhooks when you need to trigger actions in response to events occurring in another system, such as receiving notifications when a form submission arrives, a payment completes, or a case status changes. Successful configuration results in reliable event delivery with proper authentication, retry handling, and failure management.
Prerequisites
| Requirement | Detail |
|---|---|
| Network access | Firewall rules permitting inbound HTTPS on port 443 from source IP ranges, or outbound HTTPS to destination endpoints |
| TLS certificate | Valid certificate from public CA for receiving endpoints; self-signed certificates cause delivery failures |
| Authentication credentials | API keys, shared secrets for HMAC, or OAuth client credentials depending on provider requirements |
| Endpoint URL | Publicly accessible HTTPS URL for inbound webhooks; target URL for outbound webhooks |
| Permissions | Administrative access to both sending and receiving systems |
| Logging infrastructure | Centralised logging capable of capturing webhook payloads for debugging |
Verify your endpoint is reachable before beginning configuration:
    # Test endpoint accessibility from an external network
    curl -I https://api.example.org/webhooks/incoming
    # Expected: HTTP/2 200 or 401 (authentication required)

    # Verify TLS certificate validity
    echo | openssl s_client -servername api.example.org -connect api.example.org:443 2>/dev/null | openssl x509 -noout -dates
    # Expected: notAfter date in the future

Configuring outbound webhooks
Outbound webhooks send event notifications from your system to external endpoints. The sending system initiates HTTP POST requests containing event data whenever specified triggers occur.
- Register the destination endpoint in your application’s webhook configuration. Provide the full URL including protocol and path:
    Endpoint URL: https://partner.example.org/api/webhooks/receive

Most platforms require HTTPS endpoints. HTTP endpoints without TLS are rejected by default due to payload exposure risk.
- Select the events that trigger webhook delivery. Limit subscriptions to events the receiving system actually processes:
    # Example event subscription configuration
    webhook:
      url: "https://partner.example.org/api/webhooks/receive"
      events:
        - case.created
        - case.status_changed
        - case.closed
        # Avoid subscribing to high-volume events unless needed
        # - case.viewed (generates excessive traffic)

Each subscribed event type increases delivery volume. A case management system generating 500 cases daily with 4 status changes each produces 2,500 webhook deliveries daily (500 creation events plus 2,000 status-change events) when subscribed to both creation and status events.
- Configure authentication for the outbound request. The receiving system must verify requests originate from your application:
    webhook:
      url: "https://partner.example.org/api/webhooks/receive"
      authentication:
        type: hmac_sha256
        secret: "${WEBHOOK_SECRET}"  # From environment variable
        header: "X-Signature-256"

The HMAC signature is computed over the request body using the shared secret. The receiving system computes the same signature and compares values. Mismatched signatures indicate tampering or misconfiguration.
For systems requiring bearer tokens instead of HMAC:
    webhook:
      authentication:
        type: bearer
        token: "${WEBHOOK_BEARER_TOKEN}"
        header: "Authorization"

- Set retry configuration to handle transient delivery failures:
    webhook:
      retry:
        max_attempts: 5
        initial_delay_seconds: 60
        backoff_multiplier: 2
        max_delay_seconds: 3600

This configuration waits 1, 2, 4, 8, and 16 minutes between successive attempts, so retries occur roughly 1, 3, 7, 15, and 31 minutes after the initial failure. The exponential backoff prevents overwhelming recovering systems. After the fifth retry fails, approximately 31 minutes after the initial failure, the delivery is abandoned and logged.
- Configure timeout values appropriate for the receiving endpoint’s expected response time:
    webhook:
      timeout:
        connect_seconds: 10
        read_seconds: 30

Set read timeout above the receiving system’s processing time. A webhook triggering database operations requiring 15 seconds needs at least 20 seconds read timeout to avoid premature disconnection.
- Enable delivery logging for troubleshooting:
    webhook:
      logging:
        log_payloads: true
        log_responses: true
        retention_days: 30

Payload logging captures the full request body sent to the endpoint. Disable payload logging if webhooks contain sensitive personal data subject to retention restrictions.
- Test the configuration by triggering a test event:
    # Most platforms provide a test delivery function
    curl -X POST https://your-system.example.org/api/webhooks/test \
      -H "Authorization: Bearer ${ADMIN_TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"webhook_id": "wh_abc123", "event_type": "case.created"}'

Verify the test delivery appears in both sending system logs and receiving system logs.
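The HMAC scheme configured in the authentication step can be sketched on the sending side as follows. This is a minimal sketch rather than any particular platform's implementation: the function name `sign_payload`, the placeholder secret, and the `sha256=` prefix on the header value are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Return the signature header value for the exact bytes being sent."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"  # "sha256=" prefix is an assumed convention


# Serialise once and sign those exact bytes; re-serialising later may
# change whitespace or key order and invalidate the signature.
event = {"event_type": "case.created", "event_id": "evt_abc123", "data": {}}
body = json.dumps(event, separators=(",", ":")).encode("utf-8")

headers = {
    "Content-Type": "application/json",
    "X-Signature-256": sign_payload("example-shared-secret", body),
}
```

The request must then be sent with `body` as its raw payload. Signing a dictionary and letting the HTTP client serialise it separately is a common source of signature mismatches.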
Configuring inbound webhooks
Inbound webhooks receive event notifications from external systems. Your application exposes an HTTP endpoint that external systems call when events occur.
- Create the webhook receiver endpoint in your application. The endpoint must accept POST requests and respond within the sender’s timeout window:
    # Flask example
    from flask import Flask, request, jsonify
    import hmac
    import hashlib

    app = Flask(__name__)

    @app.route('/webhooks/receive', methods=['POST'])
    def receive_webhook():
        # Respond quickly - process asynchronously
        # Return 200 before heavy processing
        return jsonify({'received': True}), 200

Return HTTP 200 immediately upon receiving valid requests. Defer processing to background workers. Senders interpret slow responses as failures and retry, causing duplicate deliveries.
- Implement signature verification to authenticate incoming requests. Extract the signature from the header and compute the expected value:
    import hmac
    import hashlib
    import os

    def verify_signature(payload_body, signature_header):
        """Verify HMAC-SHA256 signature over the raw request body (bytes)."""
        secret = os.environ['WEBHOOK_SECRET'].encode('utf-8')
        expected_signature = hmac.new(
            secret,
            payload_body,
            hashlib.sha256
        ).hexdigest()

        # Use constant-time comparison to prevent timing attacks.
        # Comparing bytes avoids a TypeError on non-ASCII header values.
        return hmac.compare_digest(
            f"sha256={expected_signature}".encode('utf-8'),
            signature_header.encode('utf-8')
        )

    @app.route('/webhooks/receive', methods=['POST'])
    def receive_webhook():
        signature = request.headers.get('X-Signature-256')
        if not signature:
            return jsonify({'error': 'Missing signature'}), 401

        # Verify against the raw bytes, not the parsed JSON
        if not verify_signature(request.get_data(), signature):
            return jsonify({'error': 'Invalid signature'}), 401

        # Signature valid - queue for processing
        queue_webhook_processing(request.json)
        return jsonify({'received': True}), 200

The constant-time comparison using hmac.compare_digest() prevents timing attacks where attackers measure response times to deduce valid signature characters.
- Implement idempotency handling to manage duplicate deliveries. Senders retry on timeout or network errors, potentially delivering the same event multiple times:
    from functools import wraps
    import redis

    redis_client = redis.Redis(host='localhost', port=6379, db=0)

    def idempotent_webhook(f):
        @wraps(f)
        def decorated(*args, **kwargs):
            # Extract unique event identifier (header first, then body)
            event_id = request.headers.get('X-Event-ID')
            if not event_id:
                body = request.get_json(silent=True) or {}
                event_id = body.get('event_id')

            if not event_id:
                # No idempotency key - process anyway with warning
                app.logger.warning('Webhook received without event ID')
                return f(*args, **kwargs)

            # Atomically claim the event ID: SET with NX fails if the key
            # already exists, so concurrent duplicates cannot both proceed.
            # The 24-hour TTL releases the claim if the worker crashes.
            cache_key = f"webhook:processed:{event_id}"
            if not redis_client.set(cache_key, 'processing', nx=True, ex=86400):
                app.logger.info(f'Duplicate webhook ignored: {event_id}')
                return jsonify({'received': True, 'duplicate': True}), 200

            try:
                result = f(*args, **kwargs)
                redis_client.setex(cache_key, 604800, 'completed')  # 7 days
                return result
            except Exception:
                # Release the claim so a later retry can be processed
                redis_client.delete(cache_key)
                raise

        return decorated

    @app.route('/webhooks/receive', methods=['POST'])
    @idempotent_webhook
    def receive_webhook():
        # Processing logic here
        return jsonify({'received': True}), 200

The 7-day retention for processed event IDs exceeds typical retry windows. A sender that abandons retries after 24 hours, for example, cannot trigger duplicate processing, because every retry arrives while the event ID is still recorded.
- Configure your web server or load balancer to handle webhook traffic. Webhook endpoints receive bursty traffic when source systems process batches:
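    # Nginx configuration for webhook endpoint
    # Assumes a matching zone is declared in the http context, for example
    # (size and rate here are illustrative values, not recommendations):
    #   limit_req_zone $binary_remote_addr zone=webhooks:10m rate=10r/s;
    location /webhooks/ {
        # Increase timeouts for webhook processing
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;

        # Limit request body size
        client_max_body_size 1m;

        # Rate limiting per source IP
        limit_req zone=webhooks burst=50 nodelay;

        proxy_pass http://webhook_backend;
    }

The burst parameter lets up to 50 requests exceed the zone's configured rate, and nodelay forwards them immediately instead of delaying them; requests beyond the burst are rejected. Webhook senders delivering batched events need burst capacity to avoid spurious failures.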
- Set up a dead letter queue for webhooks that fail processing after receipt:
    import json
    from datetime import datetime

    def queue_webhook_processing(payload):
        try:
            process_webhook(payload)
        except Exception as e:
            # Store failed webhook for manual review
            store_dead_letter(payload, str(e))
            raise

    def store_dead_letter(payload, error):
        dead_letter = {
            'payload': payload,
            'error': error,
            'received_at': datetime.utcnow().isoformat(),
            'retry_count': 0
        }
        redis_client.lpush('webhook:dead_letter', json.dumps(dead_letter))

        # Alert if dead letter queue grows
        queue_length = redis_client.llen('webhook:dead_letter')
        if queue_length > 100:
            send_alert(f'Webhook dead letter queue has {queue_length} items')

The dead letter queue preserves failed webhooks for investigation and manual replay. Without dead letter handling, processing failures result in permanent data loss.
- Expose a health check endpoint for the webhook receiver. Senders often verify endpoint availability before attempting delivery:
    @app.route('/webhooks/health', methods=['GET', 'HEAD'])
    def webhook_health():
        # Check dependencies
        try:
            redis_client.ping()
            return jsonify({'status': 'healthy'}), 200
        except Exception:
            return jsonify({'status': 'unhealthy'}), 503

Some webhook providers disable integrations after consecutive failed health checks. Ensure the health endpoint reflects actual processing capability.
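The acknowledge-then-process pattern recommended in this section can be sketched with a standard-library queue and a worker thread. This is an illustration under assumptions, not a production design: `enqueue_event`, `process_event`, `worker`, and the in-memory lists are hypothetical names, and a real deployment would more likely use a durable broker so that queued events survive restarts.

```python
import queue
import threading

events = queue.Queue(maxsize=1000)  # bounded, so a backlog cannot grow without limit
processed = []      # stand-in for real side effects
dead_letters = []   # stand-in for a dead letter store


def process_event(event):
    """Hypothetical handler; raises to simulate a processing failure."""
    if event.get("event_type") == "bad":
        raise ValueError("unsupported event")
    processed.append(event["event_id"])


def worker():
    while True:
        event = events.get()
        if event is None:          # sentinel: stop the worker
            events.task_done()
            break
        try:
            process_event(event)
        except Exception as exc:   # keep the worker alive on bad events
            dead_letters.append({"payload": event, "error": str(exc)})
        finally:
            events.task_done()


def enqueue_event(event):
    """Called from the request handler; returns the HTTP status to send."""
    try:
        events.put_nowait(event)
        return 200
    except queue.Full:
        return 503  # ask the sender to retry later


thread = threading.Thread(target=worker, daemon=True)
thread.start()

statuses = [
    enqueue_event({"event_id": "evt_1", "event_type": "case.created"}),
    enqueue_event({"event_id": "evt_2", "event_type": "bad"}),
]
events.put(None)
events.join()   # wait until everything queued has been handled
```

Returning 503 when the queue is full hands the backlog back to the sender's retry logic instead of accepting events that cannot be processed.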
Payload validation
Webhook payloads require validation beyond signature verification. Malformed payloads cause processing errors; unexpected payloads may indicate API version changes or misconfiguration.
Implement schema validation for incoming payloads:
    from jsonschema import validate, ValidationError

    WEBHOOK_SCHEMAS = {
        'case.created': {
            'type': 'object',
            'required': ['event_type', 'timestamp', 'data'],
            'properties': {
                'event_type': {'type': 'string', 'const': 'case.created'},
                # 'format' is only enforced when validate() is given a format_checker
                'timestamp': {'type': 'string', 'format': 'date-time'},
                'data': {
                    'type': 'object',
                    'required': ['case_id', 'created_by'],
                    'properties': {
                        'case_id': {'type': 'string', 'pattern': '^case_[a-z0-9]+$'},
                        'created_by': {'type': 'string'},
                        'priority': {'type': 'string', 'enum': ['low', 'medium', 'high', 'critical']}
                    }
                }
            }
        }
    }

    def validate_webhook_payload(payload):
        event_type = payload.get('event_type')
        schema = WEBHOOK_SCHEMAS.get(event_type)

        if not schema:
            raise ValueError(f'Unknown event type: {event_type}')

        # Raises ValidationError if the payload does not match the schema
        validate(instance=payload, schema=schema)

Schema validation catches payload structure changes before they cause downstream errors. When the sending system removes a required field or changes a field's type or allowed values, validation failures provide clear diagnostic information. Newly added fields pass validation unless the schema sets additionalProperties to false.
Retry and failure handling
Webhook delivery fails for transient reasons (network timeouts, temporary service unavailability) and permanent reasons (invalid endpoint, authentication failure). Configure retry behaviour to handle transient failures without overwhelming failing systems.
    +--------------------------------------------------------------------+
    |                       WEBHOOK DELIVERY FLOW                        |
    +--------------------------------------------------------------------+
    |                                                                    |
    |  Initial Delivery                                                  |
    |       |                                                            |
    |       v                                                            |
    |  +----+----+     Success (2xx)      +------------------+           |
    |  | Deliver +----------------------->|     Complete     |           |
    |  +----+----+                        +------------------+           |
    |       |                                                            |
    |       | Failure (timeout, 5xx, network error)                      |
    |       v                                                            |
    |  +----+----+                                                       |
    |  |  Queue  |                                                       |
    |  |  Retry  |                                                       |
    |  +----+----+                                                       |
    |       |                                                            |
    |       v                                                            |
    |  +----+----+     Attempt < Max      +------------------+           |
    |  |  Wait   +----------------------->|  Retry Delivery  +----+      |
    |  | Backoff |                        +------------------+    |      |
    |  +----+----+                                 |              |      |
    |       |                                      | Success      |      |
    |       | Attempt >= Max                       v              |      |
    |       v                                +-----+------+       |      |
    |  +----+----+                           |  Complete  |       |      |
    |  |  Dead   |                           +------------+       |      |
    |  | Letter  |                                                |      |
    |  +---------+                           +-----+------+       |      |
    |                                        |  Failure   +-------+      |
    |                                        +-----+------+              |
    +--------------------------------------------------------------------+

Figure 1: Webhook delivery state transitions with retry and dead letter handling
Configure retry policies based on the failure type:
    webhook:
      retry:
        # Retry on these status codes
        retryable_status_codes:
          - 408  # Request Timeout
          - 429  # Too Many Requests
          - 500  # Internal Server Error
          - 502  # Bad Gateway
          - 503  # Service Unavailable
          - 504  # Gateway Timeout

        # Do not retry on these (permanent failures)
        terminal_status_codes:
          - 400  # Bad Request - payload issue
          - 401  # Unauthorised - credential issue
          - 403  # Forbidden - permission issue
          - 404  # Not Found - endpoint removed
          - 410  # Gone - endpoint permanently removed

        # Retry timing
        schedule:
          - delay_seconds: 60     # Attempt 2 at T+1m
          - delay_seconds: 300    # Attempt 3 at T+6m
          - delay_seconds: 1800   # Attempt 4 at T+36m
          - delay_seconds: 7200   # Attempt 5 at T+2h36m
          - delay_seconds: 21600  # Attempt 6 at T+8h36m

This schedule spaces retry attempts over roughly 8.6 hours (8 hours 36 minutes), providing time for extended outages to resolve while avoiding excessive retry volume.
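The arithmetic behind both retry styles in this guide, exponential backoff and a fixed delay schedule, can be checked with a few lines. The helper names `exponential_delays` and `cumulative_offsets` are hypothetical, and the sketch assumes each delay is measured from the previous failed attempt.

```python
from itertools import accumulate


def exponential_delays(initial, multiplier, max_delay, retries):
    """Delay before each retry, capped at max_delay (all in seconds)."""
    return [min(initial * multiplier ** n, max_delay) for n in range(retries)]


def cumulative_offsets(delays):
    """Seconds after the initial failure at which each retry fires."""
    return list(accumulate(delays))


backoff = exponential_delays(initial=60, multiplier=2, max_delay=3600, retries=5)
fixed = [60, 300, 1800, 7200, 21600]

backoff_offsets = cumulative_offsets(backoff)  # last retry 1860 s (31 min) after failure
fixed_offsets = cumulative_offsets(fixed)      # last retry 30960 s (8 h 36 min) after failure
```

Computing the offsets this way makes it easy to confirm that a schedule outlasts the longest outage the integration is expected to tolerate.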
For high-volume webhook integrations, implement circuit breaker logic to prevent cascade failures:
    from datetime import datetime, timedelta
    from enum import Enum

    class CircuitState(Enum):
        CLOSED = 'closed'        # Normal operation
        OPEN = 'open'            # Failing - reject requests
        HALF_OPEN = 'half_open'  # Testing recovery

    class WebhookCircuitBreaker:
        def __init__(self, failure_threshold=5, recovery_timeout=300):
            self.failure_threshold = failure_threshold
            self.recovery_timeout = recovery_timeout
            self.failure_count = 0
            self.last_failure_time = None
            self.state = CircuitState.CLOSED

        def record_success(self):
            self.failure_count = 0
            self.state = CircuitState.CLOSED

        def record_failure(self):
            self.failure_count += 1
            self.last_failure_time = datetime.utcnow()

            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN

        def can_attempt(self):
            if self.state == CircuitState.CLOSED:
                return True

            if self.state == CircuitState.OPEN:
                # Check if recovery timeout elapsed
                if datetime.utcnow() - self.last_failure_time > timedelta(seconds=self.recovery_timeout):
                    self.state = CircuitState.HALF_OPEN
                    return True
                return False

            # HALF_OPEN: allow attempts until a success closes the circuit
            # or a failure reopens it
            return True

    # Usage per destination endpoint
    circuit_breakers = {}

    def get_circuit_breaker(endpoint_url):
        if endpoint_url not in circuit_breakers:
            circuit_breakers[endpoint_url] = WebhookCircuitBreaker()
        return circuit_breakers[endpoint_url]

The circuit breaker prevents sending system resources from being consumed by requests to unavailable endpoints. After 5 consecutive failures, the circuit opens and rejects delivery attempts for 5 minutes before allowing a test request.
Monitoring and logging
Webhook integrations require monitoring for delivery success rate, latency, and error patterns. Instrument both sending and receiving components.
    import time
    import requests
    from prometheus_client import Counter, Histogram

    # Outbound webhook metrics
    webhook_deliveries = Counter(
        'webhook_deliveries_total',
        'Total webhook delivery attempts',
        ['endpoint', 'event_type', 'status']
    )

    webhook_latency = Histogram(
        'webhook_delivery_seconds',
        'Webhook delivery latency',
        ['endpoint'],
        buckets=[0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0]
    )

    def deliver_webhook(endpoint, event_type, payload):
        start_time = time.time()
        try:
            response = requests.post(endpoint, json=payload, timeout=30)
            # Classify the outcome by status code range
            if response.status_code < 400:
                status = 'success'
            elif response.status_code < 500:
                status = 'client_error'
            else:
                status = 'server_error'
            webhook_deliveries.labels(endpoint=endpoint, event_type=event_type, status=status).inc()
            return response
        except requests.Timeout:
            webhook_deliveries.labels(endpoint=endpoint, event_type=event_type, status='timeout').inc()
            raise
        except requests.ConnectionError:
            webhook_deliveries.labels(endpoint=endpoint, event_type=event_type, status='connection_error').inc()
            raise
        finally:
            # Record latency for successes and failures alike
            webhook_latency.labels(endpoint=endpoint).observe(time.time() - start_time)

Configure alerting thresholds based on expected delivery patterns:
| Metric | Warning threshold | Critical threshold |
|---|---|---|
| Delivery success rate | Below 95% over 15 minutes | Below 80% over 5 minutes |
| Average latency | Above 5 seconds | Above 15 seconds |
| Dead letter queue depth | Above 50 items | Above 200 items |
| Circuit breakers open | Any endpoint open | More than 3 endpoints open |
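As a rough illustration of how the success-rate row in the table might be evaluated, the sketch below classifies two windows of delivery counts. The function names and the idea of passing pre-aggregated counts are assumptions; in practice the alerting system would usually evaluate these thresholds over the 15-minute and 5-minute windows itself.

```python
def success_rate(delivered, attempted):
    """Fraction of successful deliveries; an idle window counts as healthy."""
    return 1.0 if attempted == 0 else delivered / attempted


def classify(rate_15m, rate_5m):
    """Map windowed success rates to a severity using the table's thresholds."""
    if rate_5m < 0.80:      # critical: below 80% over 5 minutes
        return "critical"
    if rate_15m < 0.95:     # warning: below 95% over 15 minutes
        return "warning"
    return "ok"


# Example counts are made up for illustration
severity = classify(
    rate_15m=success_rate(delivered=930, attempted=1000),
    rate_5m=success_rate(delivered=310, attempted=340),
)
```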
Log webhook deliveries with sufficient context for debugging:
    import logging
    import json
    from datetime import datetime

    logger = logging.getLogger('webhooks')

    def log_webhook_delivery(endpoint, event_type, payload, response=None, error=None):
        log_entry = {
            'endpoint': endpoint,
            'event_type': event_type,
            'event_id': payload.get('event_id'),
            'timestamp': datetime.utcnow().isoformat(),
        }

        # Compare with None: a requests.Response is falsy for 4xx/5xx statuses
        if response is not None:
            log_entry['status_code'] = response.status_code
            log_entry['response_time_ms'] = response.elapsed.total_seconds() * 1000
            # Truncate response body to prevent log bloat
            log_entry['response_body'] = response.text[:500] if response.text else None

        if error:
            log_entry['error'] = str(error)
            log_entry['error_type'] = type(error).__name__

        # Log payload only on failure
        if error or (response is not None and response.status_code >= 400):
            log_entry['payload'] = payload

        logger.info(json.dumps(log_entry))

Testing webhook integrations
Test webhook configurations before enabling in production. Testing validates endpoint connectivity, authentication, payload handling, and retry behaviour.
- Use a webhook testing service to inspect payloads during development. Services like webhook.site provide temporary endpoints that display received requests:
    # Configure webhook to temporary test endpoint
    curl -X POST https://your-system.example.org/api/webhooks \
      -H "Authorization: Bearer ${ADMIN_TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://webhook.site/unique-id",
        "events": ["case.created"],
        "enabled": true
      }'

Trigger a test event and verify the payload appears in the testing service interface with correct headers and body structure.
- Test signature verification by sending requests with invalid signatures:
    # Send request with wrong signature
    curl -X POST https://api.example.org/webhooks/receive \
      -H "Content-Type: application/json" \
      -H "X-Signature-256: sha256=invalid_signature_here" \
      -d '{"event_type": "case.created", "data": {}}'

    # Expected response: 401 Unauthorized

Confirm the endpoint rejects requests with missing, malformed, or incorrect signatures.
- Test idempotency by sending duplicate requests:
    # Send same event ID twice
    # VALID_SIGNATURE must be computed over the exact request body sent below
    EVENT_ID="evt_test_$(date +%s)"

    curl -X POST https://api.example.org/webhooks/receive \
      -H "Content-Type: application/json" \
      -H "X-Signature-256: ${VALID_SIGNATURE}" \
      -H "X-Event-ID: ${EVENT_ID}" \
      -d '{"event_type": "case.created", "event_id": "'${EVENT_ID}'", "data": {}}'

    # Send again with same event ID
    curl -X POST https://api.example.org/webhooks/receive \
      -H "Content-Type: application/json" \
      -H "X-Signature-256: ${VALID_SIGNATURE}" \
      -H "X-Event-ID: ${EVENT_ID}" \
      -d '{"event_type": "case.created", "event_id": "'${EVENT_ID}'", "data": {}}'

Verify the second request returns success but does not create duplicate records.
- Test timeout handling by configuring an artificially slow endpoint:
    # Test endpoint that delays response
    import time

    @app.route('/webhooks/slow', methods=['POST'])
    def slow_webhook():
        time.sleep(45)  # Exceeds typical 30-second timeout
        return jsonify({'received': True}), 200

Confirm the sending system times out appropriately and queues for retry.
- Validate retry behaviour by returning error responses:
    # Test endpoint that fails initially then succeeds
    attempt_count = {}

    @app.route('/webhooks/flaky', methods=['POST'])
    def flaky_webhook():
        event_id = request.json.get('event_id')
        attempt_count[event_id] = attempt_count.get(event_id, 0) + 1

        if attempt_count[event_id] < 3:
            return jsonify({'error': 'Temporary failure'}), 503

        return jsonify({'received': True}), 200

Verify the sending system retries and eventually succeeds on the third attempt.
Verification
After configuration, verify the complete integration path.
Confirm outbound webhook delivery:
    # Check recent delivery logs
    curl -X GET "https://your-system.example.org/api/webhooks/wh_abc123/deliveries?limit=10" \
      -H "Authorization: Bearer ${ADMIN_TOKEN}"

    # Expected output shows successful deliveries
    {
      "deliveries": [
        {
          "id": "del_xyz789",
          "event_type": "case.created",
          "status": "delivered",
          "response_code": 200,
          "response_time_ms": 245,
          "delivered_at": "2024-11-15T14:30:00Z"
        }
      ]
    }

Verify inbound webhook processing:
    # Check application logs for received webhooks
    grep "webhook" /var/log/app/application.log | tail -20

    # Expected: entries showing received and processed webhooks
    {"endpoint":"/webhooks/receive","event_type":"case.created","event_id":"evt_abc123","status_code":200,"response_time_ms":45}

Confirm signature verification is enforced:
    # Attempt delivery without signature
    curl -X POST https://api.example.org/webhooks/receive \
      -H "Content-Type: application/json" \
      -d '{"event_type": "test"}'

    # Must return 401, not 200

Verify dead letter queue is operational:
    # Check dead letter queue depth
    redis-cli LLEN webhook:dead_letter
    # Expected: 0 if no failures, or count of failed webhooks

    # Inspect failed webhook if present
    redis-cli LINDEX webhook:dead_letter 0

Troubleshooting
| Symptom | Cause | Resolution |
|---|---|---|
| Deliveries fail with “connection refused” | Endpoint not listening, firewall blocking | Verify endpoint is running: curl -I https://endpoint; check firewall rules permit source IPs |
| Deliveries fail with “certificate verify failed” | Invalid, expired, or self-signed TLS certificate | Install valid certificate from public CA; verify with openssl s_client -connect host:443 |
| Signature verification fails on valid requests | Secret mismatch, encoding issue, or signature computed over wrong content | Verify secrets match exactly including whitespace; confirm signature computed over raw body not parsed JSON |
| Duplicate events processed despite idempotency | Event ID not present in payload, Redis unavailable, or TTL too short | Check payload contains event ID; verify Redis connectivity; extend deduplication TTL |
| Webhook endpoint returns 504 Gateway Timeout | Processing takes longer than proxy timeout | Return 200 immediately, process asynchronously; increase proxy timeout if synchronous processing required |
| Retries not occurring after failure | Status code not in retryable list, circuit breaker open | Check retry configuration includes the status code; verify circuit breaker state |
| High latency on webhook delivery | Slow DNS resolution, TLS handshake overhead, or receiver processing time | Cache DNS; use connection pooling; ensure receiver responds before processing |
| Webhook deliveries succeed but events not processed | Receiver returns 200 before processing, then fails silently | Implement dead letter queue; add processing confirmation logs; monitor queue depth |
| Rate limiting errors (429) from receiver | Delivery volume exceeds receiver capacity | Implement sending-side rate limiting; batch events if receiver supports it; increase receiver capacity |
| Payload too large errors (413) | Event data exceeds receiver’s body size limit | Increase client_max_body_size on receiver; reduce payload size by omitting unnecessary fields |
| Events delivered out of order | Concurrent delivery, retry delays | Include sequence number in payload; implement ordering in receiver if required; accept eventual consistency |
| Memory exhaustion in receiver | Unbounded payload buffering, queue growth | Limit in-flight webhooks; stream large payloads; implement backpressure |
| Webhook secret exposed in logs | Logging configuration includes headers or secrets | Redact sensitive headers in logging; rotate compromised secrets immediately |
| Intermittent authentication failures | Secret rotation during delivery, clock skew for time-based auth | Implement grace period accepting old and new secrets during rotation; synchronise clocks with NTP |