Testing Strategy

Testing strategy determines which types of tests an organisation runs, at what proportions, and under what conditions. The strategy shapes how development teams allocate effort between fast isolated tests and slow comprehensive tests, between automated verification and manual exploration, between testing everything and testing what matters. A well-designed testing strategy produces confidence proportional to risk while remaining executable within available resources.

Unit test
A test that verifies a single function, method, or class in isolation from external dependencies. Unit tests execute in milliseconds and require no network, database, or filesystem access.
Integration test
A test that verifies interactions between components, including database queries, API calls, and message processing. Integration tests require running infrastructure and execute in seconds.
End-to-end test
A test that verifies complete user workflows through the full application stack, including browser rendering, authentication, and external service interactions. End-to-end tests execute in minutes.
Contract test
A test that verifies an API producer and consumer agree on request and response formats without requiring both systems to run simultaneously.
Test coverage
The proportion of code executed during test runs, measured as percentage of lines, branches, or statements. Coverage indicates what code tests exercise, not whether tests verify correct behaviour.
Test fixture
Predefined data and system state required before a test executes. Fixtures include database records, file contents, and mock service responses.
Flaky test
A test that produces inconsistent results across identical runs due to timing dependencies, shared state, or environmental factors.

Test architecture models

Two models dominate testing strategy discussions: the test pyramid and the test trophy. Each model prescribes different proportions of test types based on different assumptions about where defects originate and what feedback loops matter most.

Test pyramid

The test pyramid model, introduced by Mike Cohn, places unit tests at the base, integration tests in the middle, and end-to-end tests at the apex. The shape indicates quantity: many unit tests, fewer integration tests, fewest end-to-end tests.

        +---------+
        |   E2E   |
        |  (5%)   |
    +---+---------+---+
    |   Integration   |
    |      (15%)      |
+---+-----------------+---+
|          Unit           |
|          (80%)          |
+-------------------------+

Figure 1: Test pyramid with recommended proportions

The pyramid optimises for fast feedback. Unit tests run in milliseconds, providing immediate notification when code changes break existing behaviour. A codebase with 500 unit tests completes its test suite in under 10 seconds. The same verification through end-to-end tests would require 30 minutes or more.

The pyramid assumes most defects arise from logic errors within individual functions. Code that correctly handles edge cases, validates inputs, and computes outputs will compose into correct systems. Integration and end-to-end tests then verify that correctly-implemented components connect properly.

This model works well for applications with complex business logic implemented in pure functions: financial calculations, data transformations, algorithmic processing. Each function’s behaviour can be fully specified through input-output pairs.
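The input-output style suits parametrised tests, where each pair becomes one case. A minimal sketch (the conversion function here is illustrative, not from the text):

```python
import pytest

# Hypothetical pure function whose behaviour is fully specified by
# input-output pairs -- the pyramid's sweet spot.
def convert_minor_units(amount_cents: int, rate: float) -> int:
    """Convert an amount in minor currency units at a fixed rate."""
    if amount_cents < 0:
        raise ValueError("amount must be non-negative")
    return round(amount_cents * rate)

@pytest.mark.parametrize("amount,rate,expected", [
    (10000, 1.25, 12500),  # normal case
    (0, 1.25, 0),          # boundary: zero amount
    (10000, 0.0, 0),       # boundary: zero rate
])
def test_convert_minor_units(amount, rate, expected):
    assert convert_minor_units(amount, rate) == expected
```

Each new edge case is one added tuple, so the specification grows without duplicating test scaffolding.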

Test trophy

The test trophy model, developed by Kent C. Dodds, places integration tests at the widest point rather than unit tests. Static analysis forms the base, followed by unit tests, then integration tests at maximum width, then end-to-end tests at the top.

        +---------+
        |   E2E   |
        |  (5%)   |
+-------+---------+-------+
|       Integration       |
|          (50%)          |
+-------+---------+-------+
        |  Unit   |
        |  (20%)  |
    +---+---------+---+
    | Static Analysis |
    |      (25%)      |
    +-----------------+

Figure 2: Test trophy with recommended proportions

The trophy optimises for confidence per test. Integration tests verify that components work together correctly, catching defects that unit tests miss: incorrect database queries, malformed API requests, broken event handlers. A single integration test that creates a user, logs them in, and verifies their dashboard loads provides more confidence than twenty unit tests of individual functions.

The trophy assumes most defects arise from component interactions rather than isolated logic errors. Modern applications compose libraries, frameworks, and services; individual functions contain minimal logic. A React component that renders data from an API involves framework behaviour, HTTP handling, and state management that unit tests cannot verify in isolation.

This model works well for applications dominated by integration code: web applications, mobile apps, API services. Testing the login flow through an actual browser catches CSS issues, JavaScript errors, and API failures that unit tests of individual functions would miss.

Model selection

Neither model applies universally. The appropriate model depends on where an application’s complexity resides.

Applications with complex algorithms benefit from pyramid-style heavy unit testing. A payment processing system that calculates fees, applies discounts, handles currency conversion, and validates transaction limits contains significant isolated logic. Unit tests verify each calculation independently; integration tests confirm the components compose correctly.

Applications with complex integrations benefit from trophy-style heavy integration testing. A case management system that stores data in PostgreSQL, authenticates through OAuth, sends notifications via email and SMS, and generates PDF reports contains minimal isolated logic. Integration tests verify that records persist correctly, authentication flows complete, and notifications deliver.

Most applications contain both algorithmic complexity and integration complexity in different components. A monitoring dashboard might process time-series data through complex aggregation functions (unit-test heavily) while rendering charts through a visualisation library (integration-test heavily). The testing strategy applies different models to different components based on where defects are likely to emerge.

Test type selection

Each test type serves a specific verification purpose. Selecting the appropriate type for each verification goal prevents both gaps (behaviours that no test verifies) and redundancy (behaviours that multiple tests verify identically).

Unit tests

Unit tests verify that individual functions produce correct outputs for given inputs. A function that calculates grant expenditure rates takes a budget amount and expenditure amount, returning a percentage. Unit tests verify the calculation for normal values, boundary values, and error conditions:

import pytest

def test_expenditure_rate_normal():
    assert calculate_expenditure_rate(budget=100000, spent=45000) == 45.0

def test_expenditure_rate_zero_budget():
    assert calculate_expenditure_rate(budget=0, spent=0) == 0.0

def test_expenditure_rate_overspend():
    assert calculate_expenditure_rate(budget=100000, spent=120000) == 120.0

def test_expenditure_rate_negative_raises():
    with pytest.raises(ValueError):
        calculate_expenditure_rate(budget=-1000, spent=500)

Unit tests require no external dependencies. The function under test receives all inputs as parameters and produces outputs as return values. Dependencies on databases, APIs, or filesystems are replaced with test doubles: mocks that record calls, stubs that return predetermined values, or fakes that provide simplified implementations.
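A minimal sketch of the test-double pattern (the function and field names are illustrative, not from the text): `unittest.mock.Mock` can act as both a stub, returning a predetermined value, and a mock, recording calls for later verification.

```python
from unittest.mock import Mock

# Illustrative function under test: it depends on an injected API client,
# so the test needs no real network access.
def expenditure_summary(grant_id: str, api_client) -> str:
    data = api_client.get_grant(grant_id)
    rate = data["spent"] / data["budget"] * 100
    return f"{grant_id}: {rate:.1f}% spent"

def test_expenditure_summary():
    client = Mock()
    # Stub behaviour: return a predetermined response.
    client.get_grant.return_value = {"budget": 100000, "spent": 45000}
    assert expenditure_summary("G-42", client) == "G-42: 45.0% spent"
    # Mock behaviour: verify the call that was recorded.
    client.get_grant.assert_called_once_with("G-42")
```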

Unit tests execute in milliseconds because they perform no I/O. A test suite of 1,000 unit tests completes in 2-5 seconds. This speed enables running the full suite on every file save, providing immediate feedback during development.

The limitation of unit tests is their isolation. A function might produce correct outputs in isolation while failing when integrated with actual dependencies. The unit test for a database query function might verify that the function constructs correct SQL, but the test cannot verify that the SQL executes correctly against a real database schema.

Integration tests

Integration tests verify that components interact correctly with their actual dependencies. A repository class that queries PostgreSQL is tested against a real PostgreSQL database. An API client that calls an external service is tested against the actual API or a contract-verified mock.

@pytest.fixture
def database():
    # Create test database with schema
    engine = create_engine("postgresql://test:test@localhost/test_db")
    Base.metadata.create_all(engine)
    session = Session(engine)
    yield session
    session.close()
    Base.metadata.drop_all(engine)

def test_create_beneficiary(database):
    repo = BeneficiaryRepository(database)
    beneficiary = repo.create(
        name="Test Beneficiary",
        registration_date=date(2024, 1, 15),
        household_size=4,
    )
    retrieved = repo.get_by_id(beneficiary.id)
    assert retrieved.name == "Test Beneficiary"
    assert retrieved.household_size == 4

Integration tests require running infrastructure: databases, message queues, cache servers. Test frameworks manage this infrastructure through fixtures that start containers before tests and stop them after. A PostgreSQL fixture starts a container, applies database migrations, and provides a connection to tests. After tests complete, the fixture destroys the container and all data.

Integration tests execute in seconds because they perform actual I/O operations. A test suite of 200 integration tests might require 2-3 minutes. This duration is acceptable for pre-merge verification but too slow for continuous feedback during development.

Integration tests catch defects that unit tests miss: incorrect SQL syntax, missing database columns, malformed API requests, incompatible message formats. A function that constructs a SQL query might pass unit tests while failing integration tests because the column names do not match the actual schema.

End-to-end tests

End-to-end tests verify complete user workflows through the full application stack. A test logs into the application as a real user would, navigates to a data entry form, submits information, and verifies the resulting state. The test exercises frontend rendering, API communication, database persistence, and background processing.

test('case worker creates and assigns case', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[name="email"]', 'caseworker@example.org');
  await page.fill('[name="password"]', 'test-password');
  await page.click('button[type="submit"]');
  await page.waitForURL('/dashboard');

  await page.click('text=New Case');
  await page.fill('[name="beneficiary_name"]', 'Test Beneficiary');
  await page.selectOption('[name="case_type"]', 'protection');
  await page.fill('[name="description"]', 'Initial assessment required');
  await page.click('button:has-text("Create Case")');
  await expect(page.locator('.case-number')).toBeVisible();

  const caseNumber = await page.locator('.case-number').textContent();
  await page.click('text=Assign');
  await page.selectOption('[name="assignee"]', 'protection-officer@example.org');
  await page.click('button:has-text("Confirm Assignment")');
  await expect(page.locator('.assignee')).toHaveText('Protection Officer');
});

End-to-end tests require the full application running: frontend server, backend API, database, authentication service, and any background workers. Test environments replicate production architecture at smaller scale.

End-to-end tests execute in minutes because they include page rendering, network latency, and human-scale interactions. A test suite of 50 end-to-end tests might require 15-30 minutes. This duration limits end-to-end tests to nightly runs or pre-release verification rather than continuous integration.

End-to-end tests catch defects invisible to other test types: CSS that hides critical buttons, JavaScript errors that break form submission, race conditions between frontend and backend, authentication flows that fail in specific browsers. These tests verify the application as users experience it.

The cost of end-to-end tests extends beyond execution time. These tests are fragile: changes to page layout, element identifiers, or workflow sequences break tests even when functionality remains correct. Maintaining end-to-end tests requires ongoing effort to update selectors, wait conditions, and assertions as the application evolves.

Contract tests

Contract tests verify API compatibility between services without requiring both services to run simultaneously. A consumer service defines its expectations of a provider API: which endpoints it calls, what request formats it sends, what response formats it expects. The provider service verifies that its implementation satisfies these expectations.

# Consumer contract: case-service expects from notification-service
interactions:
  - description: Send case assignment notification
    request:
      method: POST
      path: /api/notifications
      headers:
        Content-Type: application/json
      body:
        type: case_assignment
        recipient_id: "12345"
        case_id: "67890"
        message: "You have been assigned case #67890"
    response:
      status: 201
      body:
        notification_id: "abc-123"
        status: queued

Contract tests execute in seconds because they verify format compatibility rather than actual service behaviour. The provider service runs against the contract definitions, confirming that its endpoints accept the expected request formats and produce the expected response formats.

Contract tests catch defects that emerge when services evolve independently. A notification service that changes its API from recipient_id to user_id would break consumers. Contract tests detect this incompatibility before deployment, even when consumer and provider teams work on different release schedules.
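The provider-side check can be sketched in a few lines. Dedicated tools such as Pact do far more; here the contract is reduced to a mapping of expected field names to types, which is enough to show how a rename is caught.

```python
# Minimal sketch of a provider-side contract check (an assumption for
# illustration, not a full contract-testing tool).
EXPECTED_RESPONSE = {"notification_id": str, "status": str}

def satisfies_contract(payload: dict, contract: dict) -> bool:
    """True if the payload carries every contracted field with the right type."""
    return all(
        field in payload and isinstance(payload[field], expected)
        for field, expected in contract.items()
    )

# A provider that renames notification_id fails the check before deployment:
assert satisfies_contract(
    {"notification_id": "abc-123", "status": "queued"}, EXPECTED_RESPONSE)
assert not satisfies_contract(
    {"id": "abc-123", "status": "queued"}, EXPECTED_RESPONSE)
```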

Environment architecture

Testing strategy includes the environments where tests execute. Different test types require different environment characteristics: isolation, production similarity, data availability, and execution speed.

+------------------------------------------------------------------+
| ENVIRONMENT TOPOLOGY |
+------------------------------------------------------------------+
| |
| DEVELOPER WORKSTATION CI ENVIRONMENT |
| +------------------------+ +---------------------------+ |
| | Unit Tests | | Unit Tests | |
| | - In-memory only | | - Parallel execution | |
| | - <10 seconds | | - Coverage reporting | |
| | | | | |
| | Integration Tests | | Integration Tests | |
| | - Local containers | | - Ephemeral containers | |
| | - Subset of suite | | - Full suite | |
| +------------------------+ +---------------------------+ |
| | |
| v |
| STAGING ENVIRONMENT PRODUCTION ENVIRONMENT |
| +------------------------+ +---------------------------+ |
| | E2E Tests | | Smoke Tests | |
| | - Full stack | | - Critical paths only | |
| | - Synthetic data | | - Post-deployment | |
| | | | | |
| | Performance Tests | | Synthetic Monitoring | |
| | - Production scale | | - Continuous execution | |
| | - Load simulation | | - Real user simulation | |
| +------------------------+ +---------------------------+ |
| |
+------------------------------------------------------------------+

Figure 3: Test environment topology showing test type placement

Developer workstations run unit tests continuously and integration tests on demand. The local environment provides fast feedback during development. Containers for databases and services start quickly and reset between test runs.

CI environments run the complete test suite on every proposed change. Ephemeral environments spin up for each test run, eliminating interference between concurrent test executions. Parallel execution across multiple workers reduces total test time.

Staging environments match production architecture at reduced scale. End-to-end tests verify complete workflows against realistic infrastructure. Performance tests measure response times and throughput under simulated load.

Production environments run smoke tests after each deployment. These tests verify critical paths function correctly: user authentication, primary workflows, essential integrations. Synthetic monitoring continues testing production systems continuously, detecting failures before users report them.
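A post-deployment smoke check can be as simple as probing a short list of critical paths. In this sketch the HTTP call is injected as a function so the checker itself stays testable; the paths are illustrative assumptions.

```python
# Sketch of a post-deployment smoke check; paths are assumptions.
CRITICAL_PATHS = ["/health", "/login", "/api/cases"]

def smoke_check(fetch, paths=CRITICAL_PATHS):
    """Return the critical paths that do not answer with HTTP 200.

    fetch(path) -> status code; it may raise OSError on connection failure.
    """
    failures = []
    for path in paths:
        try:
            if fetch(path) != 200:
                failures.append(path)
        except OSError:
            failures.append(path)
    return failures

# Usage with a stub standing in for a real HTTP client:
statuses = {"/health": 200, "/login": 200, "/api/cases": 503}
assert smoke_check(lambda p: statuses[p]) == ["/api/cases"]
```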

Test data management

Tests require data: database records, file contents, API responses. Test data management ensures tests have the data they need while preventing data-related failures.

Fixture strategies

Factory fixtures generate data programmatically with sensible defaults. A beneficiary factory creates valid beneficiary records with randomised names and registration dates. Tests override specific attributes relevant to the test scenario while accepting defaults for irrelevant attributes.

import random
from datetime import date
from uuid import uuid4

from faker import Faker

fake = Faker()

class BeneficiaryFactory:
    @staticmethod
    def create(
        name: str = None,
        registration_date: date = None,
        household_size: int = None,
        status: str = "active",
    ) -> Beneficiary:
        return Beneficiary(
            id=uuid4(),
            name=name or fake.name(),
            registration_date=registration_date or date.today(),
            household_size=household_size or random.randint(1, 8),
            status=status,
        )

def test_household_benefits_calculation():
    # Only household_size matters for this test
    beneficiary = BeneficiaryFactory.create(household_size=5)
    benefits = calculate_benefits(beneficiary)
    assert benefits.food_allocation == 25  # 5 kg per person

Snapshot fixtures capture production-like data sets for testing. A snapshot contains anonymised records exported from production: realistic field values, representative data distributions, actual edge cases. Snapshots provide more realistic test coverage than synthetic data but require maintenance as schemas evolve.

Minimal fixtures contain the smallest data set that exercises test scenarios. A test for pagination logic needs exactly eleven records: ten for the first page, one for the second page. Minimal fixtures execute faster and fail more clearly than large data sets.
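The eleven-record example, expressed as code (assuming a page size of ten; the `paginate` helper is illustrative):

```python
# Minimal fixture for pagination: ten records fill the first page,
# one record spills onto the second.
def paginate(items, page, page_size=10):
    start = (page - 1) * page_size
    return items[start:start + page_size]

def test_pagination_minimal_fixture():
    records = [f"record-{i}" for i in range(11)]  # ten + one: the minimum
    assert len(paginate(records, page=1)) == 10
    assert paginate(records, page=2) == ["record-10"]
```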

Data isolation

Tests must not interfere with each other through shared data. A test that creates a user named “test@example.org” will conflict with another test expecting to create that same user.

Transaction rollback wraps each test in a database transaction that rolls back after the test completes. Data created during the test exists only for that test’s duration. This approach provides complete isolation with minimal performance cost.

@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

Unique identifiers ensure tests create non-conflicting data. Email addresses include test identifiers: test-{uuid}@example.org. This approach allows tests to run in parallel against shared databases without conflicts.
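A sketch of the unique-identifier approach: every test generates its own address, so parallel runs against a shared database never collide.

```python
import uuid

def unique_test_email(prefix: str = "test") -> str:
    """Generate a collision-free test address, e.g. test-<uuid>@example.org."""
    return f"{prefix}-{uuid.uuid4()}@example.org"
```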

Database cleaning resets database state between tests. Truncation removes all data from tables. Seeding inserts baseline data after cleaning. This approach is slower than transaction rollback but works for tests that commit transactions.

Coverage philosophy

Test coverage measures what proportion of code executes during test runs. Coverage tools track which lines, branches, or statements tests exercise. A codebase with 80% line coverage has tests that execute 80% of its lines at least once.

Coverage indicates what code tests touch, not what behaviour tests verify. A test that calls a function and ignores its return value achieves coverage without verification. A test that verifies return values for normal inputs achieves coverage without verifying error handling.
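The distinction can be made concrete. Both tests below execute every line of the function, so both contribute identically to coverage, but only the second would catch a broken change (the discount function is illustrative):

```python
def apply_discount(price: float, rate: float) -> float:
    return price * (1 - rate)

def test_covered_but_unverified():
    apply_discount(100.0, 0.25)  # 100% coverage, zero verification

def test_covered_and_verified():
    assert apply_discount(100.0, 0.25) == 75.0
```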

Coverage targets

Coverage targets vary by code criticality. Security-sensitive code warrants higher coverage than utility functions. User-facing features warrant higher coverage than administrative tools.

Code category                      Coverage target   Rationale
Authentication and authorisation   95%+              Security failures have severe consequences
Financial calculations             95%+              Errors directly impact beneficiaries and compliance
Data validation                    90%+              Invalid data propagates through entire system
Business logic                     85%+              Core functionality must work correctly
API endpoints                      80%+              External contracts must be reliable
UI components                      70%+              Visual regressions are less severe than data errors
Internal utilities                 60%+              Lower risk, lower investment

Table 1: Coverage targets by code category

Coverage targets function as minimums, not goals. Achieving 85% coverage does not indicate adequate testing if the untested 15% contains critical error handling. Coverage reports identify untested code for review; humans determine whether that code requires tests.

Coverage gaps

Coverage reports highlight gaps: code that no test executes. Some gaps warrant new tests; others reflect appropriate test strategy.

Gaps that warrant tests include error handling paths, boundary conditions, and security controls. If no test verifies that authentication fails for invalid credentials, that gap represents risk.

Gaps that may not warrant tests include logging statements, debug code, and infrastructure configuration. Testing that a function logs an informational message provides minimal value relative to test maintenance cost.

Coverage analysis identifies the gap; engineering judgment determines the response. A gap in payment processing code demands immediate attention. A gap in administrative dashboard rendering might acceptably remain untested.

Risk-based prioritisation

Testing everything equally distributes effort without regard to consequence. Risk-based testing concentrates effort on code where failures cause greatest harm.

Risk factors

User impact measures how failures affect users. A failure in beneficiary registration prevents people from receiving services. A failure in report formatting produces inconvenient but workable output.

Failure frequency measures how often code fails. Complex calculations fail more often than simple lookups. Code handling external inputs fails more often than code with controlled inputs.

Failure detectability measures how quickly failures surface. A calculation error might propagate undetected for weeks before someone notices incorrect totals. A crash on page load surfaces immediately.

Risk combines these factors: high-impact failures that occur frequently and remain undetected demand intensive testing. Low-impact failures that occur rarely and surface immediately tolerate lighter testing.

Prioritisation matrix

                 IMPACT
              Low        High
         +----------+----------+
         |  Medium  |   High   |
  F High |  Testing |  Testing |
  R      |          |          |
  E      |  Unit +  |  Unit +  |
  Q      |  some    |  Int +   |
  U      |  Int     |  E2E     |
  E      +----------+----------+
  N      |   Low    |  Medium  |
  C  Low |  Testing |  Testing |
  Y      |          |          |
         |  Unit    |  Unit +  |
         |  only    |  Int     |
         +----------+----------+

Figure 4: Risk-based test prioritisation matrix

High-impact, high-frequency code receives comprehensive testing across all levels. A case assignment function that staff use hundreds of times daily warrants unit tests for all logic branches, integration tests for database interactions, and end-to-end tests for the complete workflow.

High-impact, low-frequency code receives thorough unit and integration testing with selective end-to-end coverage. A grant closeout function that runs quarterly warrants comprehensive unit and integration tests but might not justify dedicated end-to-end tests.

Low-impact, high-frequency code receives unit and integration testing. A dashboard widget that displays incorrectly causes inconvenience, not harm. Unit tests verify calculations; integration tests verify data retrieval.

Low-impact, low-frequency code receives minimal testing. An administrative utility that reformats export files for a legacy system warrants basic unit tests to catch obvious errors.

Third-party integration testing

Applications depend on external services: payment processors, mapping providers, notification gateways. Testing these integrations requires strategies that balance thoroughness against external service costs and availability.

Test doubles

Mock services simulate external APIs without network calls. Mocks return predetermined responses for expected requests. A mock payment processor returns success for valid card numbers and specific error codes for test card numbers.

@pytest.fixture
def mock_payment_gateway(requests_mock):
    requests_mock.post(
        "https://api.paymentprovider.com/charges",
        json={"id": "ch_123", "status": "succeeded"},
    )
    return requests_mock

def test_process_payment_success(mock_payment_gateway):
    result = payment_service.process(amount=5000, card_token="tok_valid")
    assert result.success is True
    assert result.charge_id == "ch_123"

Mocks execute quickly because they involve no network I/O. A test suite with 200 payment tests completes in seconds rather than the minutes required for actual API calls.

The limitation of mocks is fidelity. Mocks return what developers expect external services to return, not what external services actually return. An API that changes its response format breaks production while tests continue passing against outdated mocks.

Sandbox environments provided by external services enable integration testing against actual service behaviour. A payment processor sandbox accepts test credentials and simulates real processing without moving money. Tests verify actual API communication, authentication, and error handling.

def test_process_payment_sandbox():
    # Uses actual API with sandbox credentials
    result = payment_service.process(
        amount=5000,
        card_token="tok_sandbox_valid",
        api_key=SANDBOX_API_KEY,
    )
    assert result.success is True

Sandbox tests run slower than mocks because they involve network communication. Sandbox availability depends on external service reliability. Rate limits constrain how many sandbox tests can execute.

Contract verification

Contract tests verify that mock behaviour matches actual service behaviour. The test suite maintains contracts defining expected API behaviour. Periodic verification against actual services confirms contracts remain accurate.

+------------------------------------------------------------------+
| CONTRACT VERIFICATION FLOW |
+------------------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Define Contract | | Mock Service | |
| | - Endpoints +------->| - Returns | |
| | - Request format | | contract | |
| | - Response format| | responses | |
| +------------------+ +--------+---------+ |
| | |
| +-----------------------+ |
| | |
| v |
| +------------+-------+ +------------------+ |
| | Unit/Integration | | Contract | |
| | Tests | | Verification | |
| | - Run against mock | | - Run against | |
| | - Fast execution | | actual service | |
| | - CI pipeline | | - Weekly/monthly | |
| +--------------------+ +------------------+ |
| | |
| v |
| +--------+---------+ |
| | Update contracts | |
| | if service | |
| | changed | |
| +------------------+ |
| |
+------------------------------------------------------------------+

Figure 5: Contract verification ensuring mock fidelity

Contract verification runs periodically against actual services: weekly for stable APIs, daily for APIs under active development. Verification failures trigger contract updates and corresponding mock updates.

Regression testing

Regression tests verify that existing functionality continues working after code changes. Every bug fix includes a regression test that fails before the fix and passes after, preventing the bug from recurring.

Regression test selection

Not every test runs on every change. Full test suites grow too large for continuous execution. Test selection strategies run subsets appropriate to each change.

Affected tests execute tests related to modified code. A change to the beneficiary registration module runs beneficiary tests without running unrelated payment tests. Static analysis identifies which tests exercise modified code paths.

Risk-based selection prioritises tests by failure probability and impact. Changes to authentication code run all security tests regardless of direct code relationship. Changes to formatting utilities run only directly affected tests.

Time-based selection runs comprehensive tests periodically regardless of changes. Nightly builds run the full test suite, catching failures that selective runs miss. Weekly builds include performance tests and extended end-to-end scenarios.

Flaky test management

Flaky tests undermine testing confidence. A test that fails intermittently teaches developers to ignore failures and retry until tests pass. Flaky tests demand immediate attention.

Flaky test causes include timing dependencies, shared state, and environmental assumptions. A test that waits one second for an asynchronous operation fails when the operation occasionally takes two seconds. A test that assumes an empty database fails when previous tests leave data. A test that assumes UTC timezone fails when developers run tests in other timezones.

Flaky test responses include quarantine, diagnosis, and repair. Quarantine removes the test from the main suite to prevent blocking unrelated changes. Diagnosis identifies the root cause through logs, timing analysis, and isolation testing. Repair eliminates the flaky condition through explicit waits, unique test data, and environment normalisation.
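The explicit-wait repair can be sketched as a small polling helper (an illustrative utility, not a specific framework's API): the test proceeds as soon as the condition holds and fails with a clear timeout rather than intermittently.

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval)
```

Instead of `time.sleep(1)` before an assertion, a test writes something like `wait_until(lambda: job.status == "done", timeout=10)`, waiting exactly as long as the operation needs.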

Implementation considerations

Testing strategy scales to organisational capacity. Resource-constrained teams prioritise high-impact testing with minimal infrastructure. Established teams implement comprehensive automation with specialised environments.

Minimal capacity contexts

Organisations with single-person IT functions or no dedicated development staff focus testing effort on highest-risk code paths. A pragmatic minimum provides meaningful quality assurance within severe constraints.

The essential tests verify that critical paths function correctly. For a case management application, critical paths include: creating cases, assigning cases, recording case notes, and generating reports. Manual testing of these paths before each release catches obvious regressions. A checklist ensures consistent coverage across testers and releases.

Unit tests for complex calculations provide disproportionate value. A function that calculates benefit amounts based on household composition involves logic that manual testing cannot thoroughly verify. Unit tests execute in seconds and require no infrastructure.

Integration tests for database operations prevent data corruption. A repository function that saves case records should be verified against an actual database, even if only locally. Database constraints catch errors that application code misses.

End-to-end tests become manual acceptance testing. Before each release, a staff member follows documented scenarios through the application, verifying expected behaviour. This approach is slow but sustainable without test automation investment.

Established team contexts

Organisations with dedicated development teams implement comprehensive automated testing with continuous integration. The test pyramid or trophy model guides proportions based on application architecture.

Automated test suites run on every proposed change, blocking merges that break existing functionality. Coverage reporting identifies gaps for review. Flaky test detection quarantines unreliable tests before they erode confidence.

Multiple test environments support different testing needs. Developer workstations run unit tests continuously. CI environments run full suites on every change. Staging environments run end-to-end tests before release.

Test data management includes factories for synthetic data, snapshots of anonymised production data, and automated cleanup between test runs. Data isolation prevents tests from interfering with each other.

Field deployment considerations

Applications deployed in field contexts face connectivity constraints that affect testing strategy. Offline-capable applications require testing without network access. Low-bandwidth deployments require testing under throttled conditions.

Offline functionality tests verify behaviour when network requests fail. A mobile data collection application must save records locally when offline and synchronise when connectivity returns. Tests simulate network failure at various points in workflows.

def test_case_submission_offline(network_simulator):
    network_simulator.disconnect()
    # Submit case while offline
    case = create_case(beneficiary_id="123", notes="Assessment complete")
    result = case_service.submit(case)
    assert result.status == "queued_offline"
    assert local_storage.contains(case.id)
    # Reconnect and verify sync
    network_simulator.connect()
    sync_service.sync_pending()
    assert remote_api.case_exists(case.id)
    assert local_storage.case_synced(case.id)

Bandwidth tests verify acceptable performance under constrained conditions. Response times that feel instant on office broadband become frustrating on 2G mobile connections. Tests measure actual response times under simulated throttling.

Sync conflict tests verify correct handling when offline changes conflict with server changes. Two users editing the same record offline create conflicts on sync. Tests verify that conflict resolution preserves data integrity and notifies users appropriately.
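One common conflict-detection scheme, sketched here as an assumption since the document does not prescribe one, uses version counters: an offline edit records the version it was based on, and a mismatch at sync time flags a conflict instead of silently overwriting.

```python
# Version-based conflict detection (illustrative scheme).
def apply_offline_edit(server_record: dict, edit: dict) -> dict:
    """Apply an offline edit, or flag a conflict if the base version is stale."""
    if edit["base_version"] != server_record["version"]:
        return {**server_record, "conflict": True}
    merged = {**server_record, **edit["changes"]}
    merged["version"] = server_record["version"] + 1
    merged["conflict"] = False
    return merged

record = {"id": "case-1", "notes": "initial", "version": 3}
clean = apply_offline_edit(record, {"base_version": 3, "changes": {"notes": "updated"}})
assert clean["version"] == 4 and clean["conflict"] is False
stale = apply_offline_edit(clean, {"base_version": 3, "changes": {"notes": "other"}})
assert stale["conflict"] is True
```

Tests then verify both branches: a clean merge increments the version, and a stale edit surfaces to the user rather than losing data.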

Quality metrics

Testing effectiveness requires measurement beyond pass/fail status. Metrics indicate whether the testing strategy catches defects before users encounter them.

Metric                   Calculation                                     Target                 Concern threshold
Defect escape rate       Production defects / total defects found        < 10%                  > 25%
Test execution time      Total time for full suite                       < 15 min               > 30 min
Flaky test rate          Tests with inconsistent results / total tests   < 2%                   > 5%
Coverage delta           Coverage change per release                     Stable or increasing   Decreasing
Test maintenance ratio   Test changes / code changes                     < 0.5                  > 1.0
Mean time to feedback    Commit to test result                           < 10 min               > 20 min

Table 2: Testing quality metrics with targets

Defect escape rate measures testing effectiveness. Defects found by users represent testing failures. A high escape rate indicates gaps in test coverage or test types.

Test execution time constrains testing frequency. Suites that take hours run less frequently than suites that take minutes. Long execution times indicate need for parallelisation, test optimisation, or test selection.

Flaky test rate indicates test reliability. Flaky tests erode confidence and waste developer time. Rising flaky rates demand immediate remediation.

Coverage delta tracks coverage trends. Decreasing coverage indicates new code without corresponding tests. Stable or increasing coverage indicates disciplined test writing.

Test maintenance ratio measures test sustainability. When tests require more changes than the code they test, the test approach may be too brittle or too coupled to implementation details.

Mean time to feedback measures developer experience. Fast feedback enables rapid iteration. Slow feedback delays defect discovery and increases context-switching costs.

See also