Privacy by Design

Privacy by Design is a framework that embeds data protection into the design and architecture of systems, business practices, and physical infrastructure from the earliest stages of development. Rather than treating privacy as a compliance requirement addressed through retrospective controls, the framework positions privacy as a foundational design constraint that shapes how organisations collect, process, store, and dispose of personal data. The approach originated in the 1990s through the work of Ann Cavoukian, then Information and Privacy Commissioner of Ontario, and has since been codified into regulatory requirements including GDPR Article 25, which mandates data protection by design and by default.

Privacy by Design
A framework requiring privacy protections to be embedded into the design of systems and processes from inception, rather than added as an afterthought. Abbreviated as PbD.
Privacy by Default
The principle that systems must automatically apply the highest privacy settings without requiring user intervention. Users must take explicit action to reduce privacy protections, not to enable them.
Data Protection Impact Assessment
A systematic analysis of how a proposed processing operation affects the protection of personal data. Required under GDPR Article 35 for high-risk processing. Abbreviated as DPIA.
Privacy-Enhancing Technology
Technical measures that protect personal data while enabling its beneficial use. Includes anonymisation, pseudonymisation, differential privacy, and secure computation methods. Abbreviated as PET.
Data Minimisation
The principle that personal data collected must be adequate, relevant, and limited to what is necessary for the specified purpose.
Purpose Limitation
The requirement that personal data be collected for specified, explicit, and legitimate purposes and not further processed in ways incompatible with those purposes.

The seven foundational principles

Privacy by Design rests on seven principles that together create a comprehensive approach to embedding privacy into organisational operations. These principles function as design constraints that must be satisfied throughout the lifecycle of any system or process handling personal data.

Proactive not reactive

The first principle requires organisations to anticipate and prevent privacy-invasive events before they occur. This stands in contrast to reactive approaches that address privacy violations after harm has materialised. Proactive privacy protection requires threat modelling during system design, identification of privacy risks before deployment, and implementation of preventive controls rather than detective or corrective ones.

In practice, proactive privacy means conducting privacy risk assessments at the requirements gathering stage, not during testing. When an organisation plans to deploy a new beneficiary registration system, the proactive approach evaluates what personal data the system will collect, whether each data element is necessary, what risks arise from that collection, and what architectural decisions will minimise those risks. These questions are answered before any code is written or vendor is selected.

The proactive principle also requires continuous monitoring of the privacy landscape for emerging risks. New technologies, changes in the threat environment, and evolving regulatory interpretations transform previously acceptable practices into privacy violations. Organisations implementing Privacy by Design establish mechanisms to identify these shifts and adapt their systems accordingly.

Privacy as the default setting

Systems must ship with privacy protections enabled at their maximum level, requiring no action from the individual to protect their personal data. This principle recognises the power imbalance between organisations and individuals: the vast majority of people lack the technical knowledge or time to navigate complex privacy settings, and even those who possess such knowledge cannot reasonably be expected to configure every system they interact with.

Privacy as the default operates across several dimensions. Data collection defaults must request only essential information, with optional fields clearly marked and empty by default. Data sharing defaults must prevent disclosure to third parties unless the individual takes explicit action to permit it. Data retention defaults must apply the shortest defensible retention period, with longer retention requiring justification and approval.

Consider a programme registration form that collects demographic information for monitoring and evaluation purposes. Under privacy by default, the form collects only what is legally required and operationally essential. Optional demographic questions appear on a separate screen with clear indication that the individual can skip them without consequence. Any data sharing with implementing partners requires affirmative consent on a per-partner basis, not a blanket authorisation buried in terms of service.
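
The defaults described above can be sketched in code. This is an illustrative model, not a real system: every field name is hypothetical, and the 365-day retention default is an assumed "shortest defensible period". The point is that the object ships in its most protective state, and weakening protection requires an explicit call.

```python
# Hypothetical sketch of privacy-by-default registration settings.
# All field names and the retention default are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class RegistrationPrivacySettings:
    # Maximal-privacy defaults: no sharing, no optional collection,
    # shortest defensible retention.
    share_with_partners: dict = field(default_factory=dict)  # empty: no disclosure
    collect_optional_demographics: bool = False
    retention_days: int = 365  # assumed shortest defensible period

    def permit_partner(self, partner: str) -> None:
        """Sharing requires an affirmative, per-partner action by the individual."""
        self.share_with_partners[partner] = True


settings = RegistrationPrivacySettings()
assert settings.collect_optional_demographics is False  # protected without user action
assert settings.share_with_partners == {}               # no disclosure by default
settings.permit_partner("health_clinic")                # explicit opt-in, one partner only
```

Note the asymmetry: there is a method to grant sharing, but no blanket "share with all partners" option, mirroring the per-partner consent requirement in the text.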

Privacy embedded into design

Privacy protections must be integral to the system architecture, not bolted on as add-ons or plugins. Embedded privacy becomes part of the core functionality, inseparable from the system’s operation. This integration ensures that privacy protections cannot be easily removed or bypassed, and that they scale with the system rather than becoming bottlenecks.

Embedded privacy manifests in database schema decisions, API designs, user interface flows, and infrastructure architecture. A database designed with embedded privacy stores personal data in encrypted form at rest, implements row-level access controls tied to purpose-specific roles, and maintains audit logs that themselves minimise personal data exposure. An API designed with embedded privacy returns only the data fields necessary for the requesting function, enforces rate limiting to prevent bulk extraction, and requires purpose-specific authentication tokens rather than general access credentials.
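
A minimal sketch of the purpose-scoped API idea, under assumed names: each purpose token maps to the smallest field set its function needs, and the API layer refuses to return anything outside that allowlist.

```python
# Illustrative sketch: an API layer that returns only the fields the
# requesting function needs, keyed by a purpose-specific token.
# Record shape, token names, and field names are all hypothetical.
RECORD = {
    "id": "B-1042", "name": "A. Example", "phone": "+000-0000",
    "household_size": 4, "payment_status": "pending",
}

# Each purpose token maps to the minimal field set for that function.
PURPOSE_FIELDS = {
    "payment-delivery": {"id", "phone", "payment_status"},
    "programme-stats":  {"household_size", "payment_status"},
}


def fetch(record: dict, purpose_token: str) -> dict:
    """Return only the fields permitted for the caller's purpose."""
    allowed = PURPOSE_FIELDS.get(purpose_token)
    if allowed is None:
        raise PermissionError("unknown purpose token")
    return {k: v for k, v in record.items() if k in allowed}


assert "name" not in fetch(RECORD, "programme-stats")       # analyst never sees names
assert fetch(RECORD, "payment-delivery")["phone"] == "+000-0000"
```

Because the filter lives inside the data access layer rather than at the perimeter, it applies equally to internal callers, which is the embedded-versus-perimeter distinction the next paragraph draws.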

The embedded approach contrasts with the perimeter security model, where privacy controls exist only at system boundaries. Perimeter-only privacy fails when boundaries are breached, when internal processes are compromised, or when legitimate internal access is abused. Embedded privacy maintains protection even when outer defences fail.

Full functionality through positive-sum design

Privacy by Design rejects the premise that privacy must come at the expense of functionality, security, or business objectives. The framework demands positive-sum solutions that deliver both privacy and full functionality, avoiding false dichotomies that force organisations to choose between protecting personal data and achieving their mission.

This principle requires creative engineering and architectural thinking. When apparent conflicts arise between privacy and functionality, the positive-sum approach seeks alternative designs that satisfy both requirements. If a programme requires household composition data for targeting but privacy concerns counsel against collecting names of household members, a positive-sum solution collects aggregate household characteristics without individual identification, or implements selective disclosure protocols that reveal only the minimum information needed for each decision.

The positive-sum principle also applies to security and privacy interactions. Traditional security approaches that maximise logging and monitoring can conflict with privacy by creating detailed records of individual behaviour. Positive-sum solutions implement security monitoring that detects threats without creating comprehensive personal surveillance, perhaps by analysing patterns rather than individual records, or by implementing privacy-preserving anomaly detection.

End-to-end security through full lifecycle protection

Privacy protections must extend throughout the entire lifecycle of personal data, from initial collection through processing, storage, and eventual disposal. Lifecycle protection ensures that privacy is maintained regardless of which system, process, or organisational unit handles the data at any given moment.

The lifecycle begins before data collection, with decisions about what data to collect and how to collect it securely. Collection mechanisms must prevent interception, ensure authenticity, and create accurate records of consent and purpose. Processing phases must maintain confidentiality while enabling legitimate use, with technical controls preventing unauthorised access or modification. Storage phases must protect data at rest through encryption, access controls, and physical security measures appropriate to the data sensitivity.

+--------------------------------------------------------------------+
|                     DATA LIFECYCLE PROTECTION                      |
+--------------------------------------------------------------------+
|                                                                    |
|  +------------+       +------------+       +------------+          |
|  | COLLECTION | ----> | PROCESSING | ----> |  STORAGE   |          |
|  +------------+       +------------+       +------------+          |
|  Consent verified     Purpose enforced     Encrypted at rest       |
|  Minimised fields     Audit logged         Access controls         |
|  Secure channel       Retention applied    Integrity verified      |
|                                                                    |
|  +------------+       +------------+       +------------+          |
|  |  SHARING   | ----> | ARCHIVING  | ----> |  DISPOSAL  |          |
|  +------------+       +------------+       +------------+          |
|  Authorisation        Reduced access       Verified deletion       |
|  Purpose bound        Extended encryption  Media sanitisation      |
|  Transfer secure      Legal hold           Audit complete          |
|                                                                    |
+--------------------------------------------------------------------+

Figure 1: Privacy controls applied at each lifecycle phase

Disposal represents the final and frequently neglected lifecycle phase. Full lifecycle protection requires verified deletion procedures that render personal data unrecoverable, including data in backups, archives, and derived datasets. Disposal must account for data that has been shared with third parties, requiring contractual provisions for deletion notification and verification.

Visibility and transparency

Privacy by Design requires that all stakeholders, particularly the individuals whose data is processed, can verify that the organisation operates according to its stated privacy practices. Transparency operates at multiple levels: policy transparency describes what the organisation does with personal data, process transparency enables verification that the organisation follows its policies, and outcome transparency demonstrates the effects of data processing on individuals.

Transparency requires clear, accessible privacy notices that describe data practices in language understandable to the intended audience. For humanitarian organisations serving diverse populations, this requires privacy notices in multiple languages, at varying literacy levels, and in formats accessible to persons with disabilities. Transparency also requires making audit logs, processing records, and compliance certifications available to appropriate stakeholders.

The visibility principle does not require exposing trade secrets or security-sensitive information. Organisations can maintain confidentiality of security controls while still providing sufficient transparency for individuals to understand how their data is protected. The test is whether a reasonable person could verify that the organisation respects their privacy, not whether every technical detail is public.

Respect for user privacy through user-centricity

The final principle places individual interests at the centre of privacy architecture. User-centric design prioritises the privacy preferences and expectations of individuals over organisational convenience or technical expedience. Systems must offer strong privacy defaults, appropriate notice, and user-friendly options that empower individuals to manage their own data.

User-centricity requires understanding the individuals whose data is processed. In humanitarian contexts, this understanding must account for vulnerable populations, power imbalances, and the potential consequences of data misuse for individuals in precarious situations. A user-centric approach to beneficiary data recognises that privacy preferences differ from those in commercial contexts, that consent models must account for duress and dependency, and that privacy violations can result in physical harm rather than merely commercial inconvenience.

Respect for user privacy also requires meaningful choice. Presenting individuals with take-it-or-leave-it terms does not constitute user-centricity, nor does burying privacy controls in complex settings menus. User-centric design makes privacy choices prominent, comprehensible, and consequential.

Privacy risk assessment methodology

Privacy risk assessment provides the analytical foundation for Privacy by Design implementation. The assessment identifies processing operations that pose privacy risks, evaluates the severity and likelihood of those risks, and determines appropriate mitigation measures. While Data Protection Impact Assessments represent the formal regulatory requirement, privacy risk assessment is a broader discipline that informs design decisions throughout system development.

Risk identification

Privacy risks arise from the interaction between personal data, processing operations, and the context in which processing occurs. Risk identification examines each element to determine potential harms to individuals.

Personal data characteristics that elevate risk include sensitivity (special category data under GDPR, protection data in humanitarian contexts), volume (processing affecting more than 10,000 individuals), identifiability (directly identifying versus pseudonymous), and accuracy requirements (decisions based on potentially incorrect data). Processing operation characteristics that elevate risk include automated decision-making, profiling, systematic monitoring, and novel uses of existing data. Contextual factors that elevate risk include vulnerable data subjects, power imbalances, cross-border transfers, and high-stakes consequences.

+--------------------------------------------------------------------+
|                    PRIVACY RISK IDENTIFICATION                     |
+--------------------------------------------------------------------+
|                                                                    |
|  DATA CHARACTERISTICS              PROCESSING CHARACTERISTICS      |
|  - Special category data           - Automated decisions           |
|    (health, biometric, etc.)         affecting individuals         |
|  - Large-scale processing          - Profiling or scoring          |
|    (>10,000 records)               - Systematic monitoring         |
|  - Direct identifiers present        of individuals                |
|  - Accuracy-dependent decisions    - Novel technology or methods   |
|                                                                    |
|          All four factor groups feed into RISK EVALUATION          |
|                                                                    |
|  CONTEXTUAL FACTORS                CONSEQUENCE FACTORS             |
|  - Vulnerable data subjects        - High-stakes outcomes          |
|                                                                    |
+--------------------------------------------------------------------+

Figure 2: Privacy risk factor categories feeding into risk evaluation

Risk identification produces a catalogue of potential privacy harms. These harms include tangible consequences such as financial loss, discrimination, or physical danger, as well as intangible harms such as loss of autonomy, damaged reputation, or chilling effects on behaviour. The catalogue serves as input to risk evaluation and informs the scope of mitigation measures.

Risk evaluation

Risk evaluation assigns severity and likelihood ratings to identified risks, enabling prioritisation of mitigation efforts. The severity score derives from three factors: magnitude of harm to an individual (scored 1-4), breadth of affected population (scored 1-4), and reversibility of the harm (scored 1-4). These three scores are multiplied to produce a severity rating between 1 and 64, then mapped to four severity levels.

Severity level assignment works as follows. A combined score of 1-8 indicates negligible severity: inconvenience that individuals can overcome with minimal effort, such as receiving unwanted marketing communications. A score of 9-24 indicates limited severity: significant inconvenience requiring effort to overcome, such as correcting inaccurate records across multiple systems. A score of 25-48 indicates significant severity: serious effects that individuals struggle to overcome, such as discrimination in service access or financial harm exceeding one month’s income. A score of 49-64 indicates maximum severity: irreversible or long-lasting effects, such as physical harm, identity theft, or permanent exclusion from services.
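
The banding rule above is mechanical enough to express directly. The sketch below implements exactly the scoring described in the text (three 1-4 factors multiplied, then mapped to the 1-8 / 9-24 / 25-48 / 49-64 bands); the function name is my own.

```python
def severity_level(magnitude: int, breadth: int, reversibility: int) -> str:
    """Map the product of three 1-4 factor scores to a severity level,
    using the bands from the text: 1-8 negligible, 9-24 limited,
    25-48 significant, 49-64 maximum."""
    for factor in (magnitude, breadth, reversibility):
        if not 1 <= factor <= 4:
            raise ValueError("each factor is scored 1-4")
    score = magnitude * breadth * reversibility
    if score <= 8:
        return "negligible"
    if score <= 24:
        return "limited"
    if score <= 48:
        return "significant"
    return "maximum"


assert severity_level(1, 2, 2) == "negligible"   # score 4
assert severity_level(3, 2, 2) == "limited"      # score 12
assert severity_level(4, 3, 3) == "significant"  # score 36
assert severity_level(4, 4, 4) == "maximum"      # score 64
```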

Likelihood assessment examines the probability that each risk materialises given existing controls. The assessment considers four factors: value of data to threat actors (high-value data such as financial or health information increases likelihood), strength of access controls (weak controls increase likelihood), exposure surface (internet-facing systems with many users increase likelihood), and historical incident patterns (similar organisations experiencing similar breaches increases likelihood). Each factor scores 1-4, with the average determining likelihood level: 1.0-1.5 is rare, 1.6-2.5 is possible, 2.6-3.5 is likely, and 3.6-4.0 is almost certain.

Severity     | Rare   | Possible  | Likely    | Almost Certain
-------------+--------+-----------+-----------+---------------
Negligible   | Low    | Low       | Low       | Medium
Limited      | Low    | Medium    | Medium    | High
Significant  | Medium | High      | High      | Very High
Maximum      | High   | Very High | Very High | Very High

Table 1: Risk level matrix combining severity and likelihood
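
Combining the likelihood averaging rule with the matrix in Table 1 gives a complete risk evaluation step. The sketch below encodes both; the matrix values are transcribed from the table, while the function and parameter names are illustrative.

```python
# Risk matrix transcribed from Table 1; function names are illustrative.
RISK_MATRIX = {
    "negligible":  {"rare": "low",    "possible": "low",       "likely": "low",       "almost certain": "medium"},
    "limited":     {"rare": "low",    "possible": "medium",    "likely": "medium",    "almost certain": "high"},
    "significant": {"rare": "medium", "possible": "high",      "likely": "high",      "almost certain": "very high"},
    "maximum":     {"rare": "high",   "possible": "very high", "likely": "very high", "almost certain": "very high"},
}


def likelihood_level(data_value: int, control_strength: int,
                     exposure: int, history: int) -> str:
    """Average four 1-4 factor scores into the likelihood bands from the text:
    1.0-1.5 rare, 1.6-2.5 possible, 2.6-3.5 likely, 3.6-4.0 almost certain."""
    avg = (data_value + control_strength + exposure + history) / 4
    if avg <= 1.5:
        return "rare"
    if avg <= 2.5:
        return "possible"
    if avg <= 3.5:
        return "likely"
    return "almost certain"


def risk_level(severity: str, likelihood: str) -> str:
    """Look up the combined risk level from Table 1."""
    return RISK_MATRIX[severity][likelihood]


assert likelihood_level(2, 2, 3, 3) == "possible"   # average 2.5
assert risk_level("significant", "likely") == "high"
assert risk_level("limited", "almost certain") == "high"
```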

Risk acceptance authority

Low and medium risks can be accepted by the project manager. High risks require approval from the Data Protection Officer or equivalent. Very high risks require executive leadership approval and consideration of supervisory authority consultation.

Mitigation selection

Risk mitigation follows a hierarchy that prioritises elimination over control. The first preference is to avoid the risk entirely by not processing the personal data or by using non-personal data alternatives. When avoidance is impossible, the second preference is to reduce risk through technical and organisational measures. When residual risk remains acceptable given the processing benefits, the third preference is to accept the risk with appropriate monitoring. Transfer of privacy risk to individuals through consent is appropriate only when individuals can genuinely understand and control the risk.

Technical mitigations include pseudonymisation (replacing direct identifiers with tokens while maintaining linkability for authorised purposes), anonymisation (irreversibly removing the ability to identify individuals), encryption (protecting confidentiality during storage and transmission), access controls (limiting who can view or process data), and privacy-enhancing technologies (enabling computation on data without exposing it).
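
Pseudonymisation with maintained linkability can be sketched with a keyed HMAC: the same identifier always maps to the same token, so deduplication still works, but without the key the mapping cannot be reversed. The key value and token length below are placeholders; in practice the key would live in a managed secret store.

```python
# Minimal pseudonymisation sketch using a keyed HMAC (SHA-256).
# The key is a placeholder: a real deployment would fetch it from a key vault.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymise(national_id: str) -> str:
    """Replace a direct identifier with a deterministic, non-reversible token."""
    return hmac.new(SECRET_KEY, national_id.encode(), hashlib.sha256).hexdigest()[:16]


token_a = pseudonymise("ID-19850315-XYZ")
token_b = pseudonymise("ID-19850315-XYZ")
assert token_a == token_b                  # linkable for authorised deduplication
assert token_a != "ID-19850315-XYZ"        # direct identifier not exposed
assert len(token_a) == 16
```

Because the mapping depends on the key, destroying or rotating the key converts the tokens from pseudonymous to effectively anonymised, which is one reason key management belongs in the mitigation plan.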

Organisational mitigations include purpose limitation (restricting use to specified purposes), data minimisation (collecting only necessary data), retention limitation (disposing of data when no longer needed), staff training (building privacy awareness), and audit controls (verifying compliance with privacy policies).

Data minimisation mechanisms

Data minimisation translates the principle of collecting only necessary data into concrete architectural and process decisions. Minimisation operates at collection, processing, storage, and sharing phases, with distinct mechanisms appropriate to each phase.

Collection minimisation

Collection minimisation examines each data element against the purposes for which it will be processed. For each proposed field, the assessment asks whether processing can achieve its purpose without this data element. If the answer is yes, the element must not be collected. If the answer is no, the assessment continues to whether the element can be collected in less identifying form while still achieving the purpose.

A cash transfer programme registration system illustrates collection minimisation in practice. The system requires sufficient information to identify beneficiaries for payment delivery, verify eligibility, and prevent duplicate registration. The following table shows a worked example comparing maximalist collection against minimised collection for a programme serving 5,000 households.

Data element      | Maximalist approach           | Minimised approach             | Rationale
------------------+-------------------------------+--------------------------------+-------------------------------------------
Full name         | First, middle, last names     | First name + surname initial   | Sufficient for payment verification;
                  |                               |                                | reduces re-identification risk
Date of birth     | Full date (1985-03-15)        | Year of birth only (1985)      | Age-based eligibility requires only year;
                  |                               |                                | precise date enables re-identification
National ID       | Full ID number                | Last 4 digits only             | Duplicate detection works with partial
                  |                               |                                | match against registration database
Address           | Full address with GPS         | Community name + distribution  | Sufficient for delivery logistics;
                  |                               | point                          | precise location creates safety risk
Phone numbers     | Primary, secondary, emergency | Single contact number          | One number sufficient for programme
                  |                               |                                | communications
Household members | Names and ages of all members | Count by age bracket           | Household size determines transfer
                  |                               |                                | amount; individual names unnecessary
Photograph        | Facial photograph             | None                           | Alternative verification method (partial
                  |                               |                                | ID + community confirmation)
Biometrics        | Fingerprints                  | None                           | Fraud rate below 2% does not justify
                  |                               |                                | biometric collection risks

The minimised approach reduces the data elements from 14 to 7, eliminates biometric collection entirely, and replaces precise identifiers with partial or aggregated alternatives. The fraud prevention capability decreases marginally (estimated 0.3% increase in duplicate registration), but this trade-off is acceptable given the significant reduction in privacy risk for a vulnerable population.

Collection minimisation also applies to metadata. Web forms collect IP addresses and browser information by default; minimised collection discards or aggregates this metadata unless it serves a documented purpose. Mobile applications collect device identifiers and location data unless configured otherwise; minimised collection requests only permissions necessary for core functionality and discards data after immediate use.

Processing minimisation

Processing minimisation limits what operations are performed on personal data. Even when data has been legitimately collected, not every processing operation is justified. Each analytical query, report generation, or data transformation must demonstrate necessity for a specified purpose.

Processing minimisation implements the principle through role-based access that grants different data views to different functions. A caseworker needs access to individual case records to provide services; their access includes full personal details for their assigned caseload only (approximately 50-150 cases). A programme manager needs access to aggregate performance data to oversee operations; their access includes summary statistics (registration counts, payment completion rates, complaint volumes) without individual records. A monitoring and evaluation analyst needs access to outcome data to assess programme effectiveness; their access includes anonymised records with k-anonymity of at least 5 (no combination of quasi-identifiers appears fewer than 5 times in the dataset).
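
The k-anonymity condition mentioned above (no quasi-identifier combination appearing fewer than k times) is straightforward to verify before releasing a dataset. The record shape below is a hypothetical example.

```python
# Sketch of a k-anonymity check: verify that no combination of
# quasi-identifiers appears fewer than k times in the dataset.
from collections import Counter


def satisfies_k_anonymity(records: list, quasi_identifiers: list, k: int = 5) -> bool:
    """True when every quasi-identifier combination occurs at least k times."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())


# Hypothetical release candidate: six records share one combination,
# but two records share another, so a k=5 release must be blocked.
records = (
    [{"birth_year": 1985, "gender": "F", "village": "A"} for _ in range(6)]
    + [{"birth_year": 1990, "gender": "M", "village": "B"} for _ in range(2)]
)

assert not satisfies_k_anonymity(records, ["birth_year", "gender", "village"], k=5)
```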

Processing minimisation also governs analytical operations. Machine learning models trained on personal data must demonstrate that the training achieves a specified purpose that outweighs privacy impact. Ad-hoc queries against personal data must be logged with purpose justification. Data mining operations that search for patterns without specific hypotheses require additional scrutiny given their exploratory nature and must be approved by the Data Protection Officer before execution.

Storage minimisation

Storage minimisation reduces the volume of personal data retained and the duration of retention. Volume reduction applies the same necessity test as collection: if a data element is no longer required for processing purposes, it must be deleted or anonymised. Duration reduction implements retention schedules that dispose of data at the earliest defensible point.

Retention periods derive from legal requirements, operational needs, and legitimate interests. The following table shows standard retention periods for common data categories in programme operations.

Data category                 | Retention period                | Basis                                    | Disposal method
------------------------------+---------------------------------+------------------------------------------+----------------------------------
Active beneficiary records    | Duration of programme + 2 years | Operational need for service continuity  | Anonymise after retention period
Closed case records           | 7 years from closure            | Legal requirement (audit, potential      | Secure deletion with certificate
                              |                                 | litigation)                              |
Financial transaction records | 7 years from transaction        | Tax and audit requirements               | Archive then delete
Consent records               | Duration of processing + 3 years| Evidence of lawful basis                 | Secure deletion
Contact preferences           | 2 years from last contact       | Operational need decays                  | Automatic purge
Complaint records             | 5 years from resolution         | Organisational learning, potential       | Anonymise after retention period
                              |                                 | litigation                               |
Application metadata (logs)   | 90 days                         | Security monitoring                      | Automatic rotation and deletion

+--------------------------------------------------------------------+
|                    RETENTION DECISION FRAMEWORK                    |
+--------------------------------------------------------------------+
|                                                                    |
|         Is there a legal requirement to retain?                    |
|               |                         |                          |
|              YES                        NO                         |
|               v                         v                          |
|      Retain for statutory    Is there an operational               |
|      period                  need to retain?                       |
|                                |                    |              |
|                               YES                   NO             |
|                                v                    v              |
|                        Retain for minimum   Delete or anonymise    |
|                        period necessary     within 30 days         |
|                                |                                   |
|                                v                                   |
|               Review at end of retention period                    |
|                     |                      |                       |
|               STILL NEEDED                 NO                      |
|                     v                      v                       |
|         Document justification      Delete or anonymise            |
|         and extend                  within 30 days                 |
|                                                                    |
+--------------------------------------------------------------------+

Figure 3: Retention decision tree for personal data

Storage minimisation extends to backup and disaster recovery systems. Backup copies of personal data are subject to the same retention requirements as primary copies. When a record is deleted from production systems, corresponding records must be removed from backups at the next backup cycle or marked for exclusion from restoration. For organisations using 30-day backup rotation, this means personal data persists in backups for up to 30 days after production deletion.
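
An automated retention check against the schedule in the table above might look like the sketch below. The category names, anchor events, and record shape are illustrative; only the retention periods come from the table.

```python
# Sketch of an automated retention check. Retention periods follow the
# schedule in the table above; category keys and anchor fields are assumed.
from datetime import date, timedelta

RETENTION_DAYS = {
    "contact_preferences": 2 * 365,   # 2 years from last contact
    "complaint_records": 5 * 365,     # 5 years from resolution
    "application_logs": 90,           # 90 days, security monitoring
}


def due_for_disposal(category: str, anchor_date: date, today: date) -> bool:
    """True when the category's retention period has elapsed since its anchor
    event (last contact, complaint resolution, log creation, ...)."""
    return today > anchor_date + timedelta(days=RETENTION_DAYS[category])


today = date(2024, 6, 1)
assert due_for_disposal("application_logs", date(2024, 1, 1), today)       # > 90 days old
assert not due_for_disposal("complaint_records", date(2023, 6, 1), today)  # within 5 years
```

A scheduled job running this check daily, with deletion or anonymisation as the disposal action per category, implements the "earliest defensible point" requirement without relying on manual review.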

Sharing minimisation

Sharing minimisation limits the circumstances under which personal data is disclosed to third parties and reduces the data elements included in any disclosure. Each sharing arrangement requires a lawful basis, documented purpose, and assessment of recipient safeguards.

Sharing minimisation implements data sharing agreements that specify the exact data elements to be shared, the purposes for which the recipient uses the data, the retention period at the recipient, and the deletion or return requirements when the purpose is achieved. Agreements prohibit onward sharing without explicit authorisation and require the recipient to implement security measures at least equivalent to those of the disclosing organisation.

Technical mechanisms support sharing minimisation. API designs return only requested fields rather than complete records. Bulk exports require approval workflows with purpose justification. Automated data feeds implement field-level filtering to exclude sensitive elements not required by the recipient. For a referral to a health partner, the minimised data set includes beneficiary identifier, contact information, and referral reason, but excludes household composition, financial data, and protection concerns unrelated to the health service.
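
The field-level filtering for the health referral can be sketched as follows. The agreement's field list and the record fields are assumed names, standing in for whatever a real data sharing agreement would specify.

```python
# Illustrative field-level filter for the health referral described above.
# The allowed field set would come from the data sharing agreement.
HEALTH_REFERRAL_FIELDS = {"beneficiary_id", "contact_number", "referral_reason"}


def minimise_for_sharing(record: dict, allowed_fields: set) -> dict:
    """Return only the data elements the sharing agreement permits."""
    return {k: v for k, v in record.items() if k in allowed_fields}


record = {
    "beneficiary_id": "B-2201",
    "contact_number": "+000-1111",
    "referral_reason": "maternal health check",
    "household_composition": {"adults": 2, "children": 3},  # not shared
    "protection_concerns": "case notes",                    # not shared
}

shared = minimise_for_sharing(record, HEALTH_REFERRAL_FIELDS)
assert set(shared) == HEALTH_REFERRAL_FIELDS
assert "protection_concerns" not in shared
```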

Purpose limitation implementation

Purpose limitation ensures that personal data collected for specified purposes is not repurposed without appropriate justification. Implementation requires documenting purposes at collection, enforcing purpose boundaries during processing, and assessing compatibility when new uses are proposed.

Purpose documentation

Each collection of personal data must be accompanied by documentation of the purposes for which it will be processed. Purpose documentation is sufficiently specific that individuals can understand what will happen to their data and organisations can assess whether proposed processing falls within scope.

Vague purposes fail the specificity requirement. “Providing services” does not constitute adequate purpose documentation; “delivering cash assistance to eligible households based on vulnerability assessment” does. “Research purposes” does not suffice; “evaluating programme effectiveness by analysing anonymised outcome data to inform future programme design” does. “Improving our operations” provides no meaningful limitation; “optimising delivery routes to reduce response time to under 48 hours” provides actionable boundaries.

Purpose documentation must exist before collection begins. Organisations cannot collect data speculatively and determine purposes later. When new programmes or systems are designed, purpose documentation forms part of the requirements specification and informs data minimisation decisions.

Purpose enforcement

Technical and organisational controls enforce purpose boundaries during processing. Role-based access control assigns permissions aligned with purpose-specific job functions. A caseworker processing protection data for case management receives access to case records but not to aggregate research datasets. A researcher analysing anonymised programme data receives access to the research database but not to individual case records.

Database architecture embeds purpose limitation through logical separation of data by purpose. A constituent relationship management system maintains separate data stores for fundraising communications, advocacy engagement, and service delivery, with controlled interfaces that prevent data flow between purposes without explicit authorisation. The technical implementation uses separate database schemas with cross-schema queries requiring elevated permissions and audit logging.

Audit logging supports purpose enforcement by creating records of data access that can be reviewed for compliance. Logs capture who accessed what data, when, and through which system function. Anomaly detection identifies access patterns inconsistent with role-based purpose assignments, such as a caseworker querying records outside their assigned caseload or a finance officer accessing case notes.
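
The caseload anomaly described above reduces to a simple comparison between audit-log entries and role assignments. The log layout and assignment data below are illustrative.

```python
# Sketch of the anomaly check described above: flag audit-log entries where
# a caseworker accessed records outside their assigned caseload.
# Log format and assignment data are hypothetical.
ASSIGNED_CASELOAD = {"worker_7": {"C-100", "C-101", "C-102"}}

audit_log = [
    {"user": "worker_7", "case": "C-101", "action": "read"},
    {"user": "worker_7", "case": "C-999", "action": "read"},  # outside caseload
]


def flag_anomalies(log: list, assignments: dict) -> list:
    """Return log entries whose case falls outside the user's assignment."""
    return [e for e in log if e["case"] not in assignments.get(e["user"], set())]


anomalies = flag_anomalies(audit_log, ASSIGNED_CASELOAD)
assert [e["case"] for e in anomalies] == ["C-999"]
```

In production this comparison would run against the live audit stream and route hits to a compliance review queue rather than raising assertions.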

Compatibility assessment

When organisations wish to use existing data for new purposes, compatibility assessment determines whether the new use is consistent with the original collection purpose. GDPR Article 6(4) provides factors for compatibility assessment: the relationship between original and proposed purposes, the context of collection and the relationship with data subjects, the nature of the data, the consequences of further processing for data subjects, and the existence of appropriate safeguards.

Compatible further processing does not require new consent if the new purpose falls within reasonable expectations given the original context. A programme that collected contact information for service delivery can use that information to notify beneficiaries of additional relevant services, as this falls within the service relationship context. The same programme cannot use the contact information for fundraising appeals without separate consent, as marketing represents a materially different purpose.

Incompatible further processing requires a new lawful basis, typically fresh consent or legitimate interest supported by a documented balancing test. If neither is available, the processing cannot proceed regardless of organisational benefit.

Privacy-enhancing technologies

Privacy-enhancing technologies provide technical mechanisms that protect personal data while enabling beneficial processing. These technologies range from established techniques like encryption and pseudonymisation to emerging methods like differential privacy and secure multi-party computation.

Anonymisation and pseudonymisation

Anonymisation removes the ability to identify individuals from a dataset, rendering data protection law inapplicable to the resulting data. True anonymisation is irreversible: no combination of available information can re-identify individuals in the dataset. Anonymised data can be used without restriction for research, publication, and sharing.

Achieving true anonymisation is difficult. Simple removal of direct identifiers (names, ID numbers) leaves quasi-identifiers that enable re-identification through linkage with external datasets. Research demonstrates that birth date, gender, and five-digit postal code can uniquely identify 87% of the US population when combined. For a dataset of 10,000 programme beneficiaries, removing names but retaining birth date, gender, and village creates high re-identification risk: in a village of 500 people, the combination of birth year and gender identifies most individuals uniquely.

Effective anonymisation requires multiple techniques applied together. Generalisation replaces specific values with ranges: exact age becomes an age bracket (25-34), specific location becomes a region. Suppression removes records that remain identifiable after generalisation: if only one person aged 80+ lives in a given region, that record must be suppressed. K-anonymity ensures that every combination of quasi-identifiers appears at least k times in the dataset; k=5 is a commonly applied minimum, with k=10 or higher for sensitive data.
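The generalisation and k-anonymity checks described above can be sketched in a few lines. The village names, region lookup, and toy dataset are invented for illustration; real assessments use purpose-built tools over full datasets:

```python
from collections import Counter

# Hypothetical village-to-region lookup used for generalisation.
REGIONS = {"Oakfield": "North", "Milbrook": "North", "Easton": "South"}

def generalise(record):
    """Generalisation: exact age -> 10-year bracket, village -> region."""
    age, village, programme = record
    lo = age // 10 * 10
    return (f"{lo}-{lo + 9}", REGIONS[village], programme)

def k_anonymity(records):
    """k is the size of the smallest group sharing a quasi-identifier combination."""
    return min(Counter(records).values())

raw = [(31, "Oakfield", "Cash"), (34, "Milbrook", "Cash"),
       (38, "Oakfield", "Cash"), (36, "Milbrook", "Cash"),
       (33, "Easton", "Cash")]
generalised = [generalise(r) for r in raw]

# The lone Easton record forms a group of size 1: it must be suppressed or
# further generalised before the dataset can reach k=5.
print(k_anonymity(generalised))  # 1
```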

+------------------------------------------------------------------+
| ANONYMISATION VS PSEUDONYMISATION |
+------------------------------------------------------------------+
| |
| ORIGINAL DATA |
| +--------------------+ |
| | Name: John Smith | |
| | DOB: 1985-03-15 | |
| | Location: Bristol | |
| | Programme: Cash | |
| | Amount: 500 GBP | |
| +--------------------+ |
| | |
| +------------------+------------------+ |
| | | |
| v v |
| PSEUDONYMISED ANONYMISED |
| +--------------------+ +--------------------+ |
| | ID: A7B3C9 | | Age range: 35-44 | |
| | DOB: 1985-03-15 | | Region: South West | |
| | Location: Bristol | | Programme: Cash | |
| | Programme: Cash | | Amount range: | |
| | Amount: 500 GBP | | 400-600 GBP | |
| +--------------------+ +--------------------+ |
| | | |
| +---------v---------+ +---------v---------+ |
| | Key held | | No re- | |
| | separately | | identification | |
| | Re-identification | | possible | |
| | possible | | | |
| +-------------------+ +-------------------+ |
| | | |
| v v |
| Still personal data Not personal data |
| GDPR applies GDPR does not apply |
| |
+------------------------------------------------------------------+

Figure 4: Comparison of pseudonymisation and anonymisation outcomes

Pseudonymisation replaces direct identifiers with tokens while maintaining the ability to re-identify individuals through a separately-held key. Pseudonymised data remains personal data under GDPR, but pseudonymisation reduces risk and can satisfy requirements for certain processing operations. Pseudonymisation protects against casual observation while permitting authorised re-identification for legitimate purposes.
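A minimal sketch of tokenisation with a separately-held key, matching the figure above. The function and field names are illustrative; production systems would use a managed vault or HSM for the key store rather than an in-memory dictionary:

```python
import secrets

def pseudonymise(records, key_store):
    """Replace the direct identifier with a random token; the token-to-name
    mapping (the 'key') goes into key_store, which must be held separately
    from the pseudonymised data."""
    out = []
    for rec in records:
        token = secrets.token_hex(4).upper()
        key_store[token] = rec["name"]        # separately-held re-identification key
        out.append({**rec, "name": token})    # token replaces the direct identifier
    return out

key_store = {}
data = [{"name": "John Smith", "programme": "Cash", "amount": 500}]
pseud = pseudonymise(data, key_store)

# Authorised re-identification uses the separately-held key.
token = pseud[0]["name"]
assert key_store[token] == "John Smith"
```

Because the key exists, the output remains personal data under GDPR; destroying the key store (and any other linkage route) is what would move the data towards anonymisation.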

Differential privacy

Differential privacy provides mathematical guarantees about the privacy of individuals in aggregate statistics. A differentially private algorithm produces outputs that are statistically indistinguishable whether or not any individual’s data is included in the input. This guarantee protects against inference attacks that attempt to determine individual characteristics from aggregate results.

Differential privacy operates by adding calibrated noise to query results. The noise magnitude depends on the privacy budget (epsilon, ε) and the sensitivity of the query. Queries that could vary significantly based on a single individual’s data require more noise than queries robust to individual variation.

A worked example illustrates the mechanism. An organisation wants to publish the average age of programme beneficiaries. The true average is 34.7 years. With ε=1.0 (moderate privacy) and an assumed sensitivity of 1 (a conservative upper bound on how much any single individual's record can change the published average), the algorithm adds Laplace noise with scale sensitivity/ε = 1. The published result is 34.7 ± random noise, producing outputs like 34.2, 35.1, or 33.9 across different queries. An attacker cannot determine whether any specific individual is in the dataset by observing these outputs.
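The Laplace mechanism in this example fits in a few lines of standard-library Python. The seed is fixed only so the sketch is reproducible; a real deployment would use fresh randomness and a vetted library rather than this hand-rolled sampler:

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_average(true_average, epsilon, sensitivity, rng):
    """Publish a differentially private average: true value plus Laplace
    noise with scale = sensitivity / epsilon."""
    return true_average + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed for a reproducible sketch only
result = dp_average(34.7, epsilon=1.0, sensitivity=1.0, rng=rng)
print(round(result, 1))  # a value near 34.7; the noise masks any individual
```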

Over multiple queries, the privacy budget depletes; organisations must manage this budget to maintain protection. A total budget of ε=10 permits 10 queries at ε=1 each, or 100 queries at ε=0.1 each. Once the budget is exhausted, no further queries can be answered without compromising the privacy guarantee.
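Budget accounting under basic composition, as described above, is simple bookkeeping: each answered query consumes part of the total ε, and queries are refused once it is spent. The class below is an illustrative sketch, not a library API:

```python
class PrivacyBudget:
    """Track a total privacy budget and refuse queries once it is exhausted
    (basic composition: spent epsilons add up)."""
    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted; query refused")
        self.remaining -= epsilon

budget = PrivacyBudget(total_epsilon=10.0)
for _ in range(10):
    budget.spend(1.0)      # ten queries at epsilon = 1 each
print(budget.remaining)    # 0.0 - an eleventh query would be refused
```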

Differential privacy implementation

Commercial platforms including Google's BigQuery, Apple's device analytics, and Microsoft's Azure services embed differential privacy into their analytics tooling. These implementations reduce the expertise required for basic use cases, but production deployment still demands an understanding of privacy budget management.

Secure computation

Secure computation techniques enable processing of personal data without exposing the underlying data to the processor. Homomorphic encryption permits computation on encrypted data, producing encrypted results that the data owner can decrypt. Secure multi-party computation allows multiple parties to jointly compute a function over their inputs while keeping those inputs private from each other.

These techniques remain computationally expensive for complex operations but have reached practical feasibility for specific use cases. Private set intersection allows two parties to determine which records they have in common without revealing non-matching records. For two organisations with beneficiary databases of 50,000 records each, private set intersection can identify the approximately 2,000 individuals registered with both in under 60 seconds without either party revealing their complete database.
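One classic construction behind private set intersection uses commutative exponentiation: each party blinds its hashed identifiers with a secret exponent, and because blinding in either order yields the same value, matches can be found without revealing non-matching records. The sketch below is educational only, assumes semi-honest parties, and uses a small Mersenne prime group; production PSI uses vetted elliptic-curve implementations:

```python
import hashlib
import secrets

P = 2**127 - 1  # Mersenne prime; real deployments use vetted elliptic-curve groups

def h(item: str) -> int:
    """Hash an item into the multiplicative group mod P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P or 1

def blind(items, secret):
    return {pow(h(x), secret, P) for x in items}

def reblind(values, secret):
    return {pow(v, secret, P) for v in values}

# Each party holds a private set and a random secret exponent.
set_a = {"ID-001", "ID-002", "ID-003"}
set_b = {"ID-002", "ID-003", "ID-004"}
a_secret = secrets.randbelow(P - 2) + 1
b_secret = secrets.randbelow(P - 2) + 1

# Parties exchange singly-blinded sets, then each re-blinds the other's values.
a_then_b = reblind(blind(set_a, a_secret), b_secret)  # H(x)^(a*b) for x in A
b_then_a = reblind(blind(set_b, b_secret), a_secret)  # H(y)^(b*a) for y in B

# Exponentiation commutes, so only common items produce matching values.
print(len(a_then_b & b_then_a))  # 2
```

Neither party learns anything about the other's non-matching identifiers; only the doubly-blinded matches are visible.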

For mission-driven organisations, secure computation enables collaboration while protecting sensitive data. Partner organisations can jointly analyse their beneficiary databases to identify service gaps or duplications without either party revealing their complete database to the other. This supports coordination while respecting data protection obligations to beneficiaries.

Data Protection Impact Assessment

The Data Protection Impact Assessment is a structured process for identifying and mitigating privacy risks in proposed processing operations. GDPR Article 35 mandates DPIAs for processing resulting in high risk to individuals’ rights and freedoms. Even when not legally required, DPIAs provide a systematic approach to Privacy by Design implementation.

DPIA timing requirement

The assessment must occur before processing begins, at a stage when design changes remain feasible. A DPIA conducted after system deployment serves only documentary purposes; it cannot fulfil its risk-mitigation function if the system architecture is fixed.

Triggering criteria

DPIA is required when processing involves systematic and extensive profiling with significant effects, large-scale processing of special category data, or systematic monitoring of publicly accessible areas. National supervisory authorities publish lists of processing operations requiring DPIA; organisations must consult these lists for jurisdiction-specific requirements.

Beyond mandatory triggers, organisations benefit from conducting DPIAs whenever processing involves new technologies or vulnerable populations, could prevent individuals from exercising their rights, or creates risks that existing controls do not adequately address. The cost of conducting a DPIA (typically 20-40 hours for a moderately complex system) is modest compared to the cost of deploying systems that violate privacy requirements.

Assessment structure

The DPIA comprises four analytical phases, each producing specific outputs that feed into subsequent phases.

The description phase documents the proposed processing in sufficient detail to support risk analysis. The description covers the nature of processing (what operations are performed), the scope (what data, how many individuals, what geographic coverage, what time period), the context (the relationship between the organisation and data subjects, any vulnerabilities or power imbalances), and the purposes (the objectives that justify processing). A complete description enables reviewers unfamiliar with the project to understand what will happen to personal data without consulting external sources.

The necessity assessment phase applies data minimisation and purpose limitation principles to the proposed processing. For each data element and processing operation, the assessment asks whether it is genuinely necessary for the stated purpose and whether less privacy-invasive alternatives could achieve the same outcome. The assessment produces a documented justification for each data element collected and each processing operation performed.

The risk analysis phase identifies risks to individuals and evaluates their severity and likelihood using the methodology described earlier in this page. The analysis produces a risk register documenting each identified risk, its severity and likelihood ratings, the resulting risk level, and the rights potentially affected (access, rectification, erasure, restriction, portability, objection).

The mitigation phase identifies controls to address each risk rated medium or higher. For each risk, the phase documents the selected mitigation measure, the implementation owner, the implementation timeline, and the residual risk level after mitigation. Residual risks rated high or very high require additional justification for proceeding and consideration of supervisory authority consultation.

DPIA PROCESS FLOW
+----------------+
| 1. TRIGGER | New processing operation proposed
| IDENTIFIED | or existing processing changed
+-------+--------+
|
v
+-------+--------+
| 2. SCREENING | Does processing require DPIA?
| | Apply Art. 35 criteria and local guidance
+-------+--------+
|
+-------+-------+
| |
v NO v YES
+--------+------+ +-----+----------+
| Record why | | 3. DESCRIPTION | Document processing details
| DPIA not | | | Nature, scope, context, purpose
| needed | +-------+--------+
+---------------+ |
v
+-------+--------+
| 4. NECESSITY | Is all data/processing necessary?
| ASSESSMENT | Are there less invasive options?
+-------+--------+
|
v
+-------+--------+
| 5. RISK | Identify risks to individuals
| ANALYSIS | Apply severity/likelihood method
+-------+--------+
|
v
+-------+--------+
| 6. MITIGATION | Select controls for each risk
| PLANNING | Document residual risk
+-------+--------+
|
v
+-------+--------+
| 7. CONSULTATION| DPO review required
| | Authority consultation if high
| | residual risk (8-week period)
+-------+--------+
|
v
+-------+--------+
| 8. APPROVAL | Decision to proceed, modify,
| + REVIEW | or reject; set review date
+----------------+ (max 3 years from approval)

Figure 5: Complete DPIA process from trigger identification to approval

Consultation requirements

Internal consultation with the Data Protection Officer (or equivalent privacy function) is mandatory for all DPIAs. The DPO provides independent assessment of the DPIA’s thoroughness and the adequacy of proposed mitigations. DPO sign-off does not transfer responsibility for processing; accountability remains with the data controller.

External consultation with supervisory authorities is required under GDPR Article 36 when residual risk remains high despite mitigation measures. Prior consultation gives the authority opportunity to advise on additional safeguards or to prohibit the processing. The eight-week consultation period (extendable by six weeks for complex cases) must be planned into project timelines when high residual risk is anticipated. Failure to consult when required constitutes a compliance violation independent of any harm arising from the processing.

Worked example: DPIA for mobile data collection

This example demonstrates DPIA application to a common scenario: deploying a mobile data collection application for household vulnerability assessment.

Scenario description

An organisation plans to deploy a mobile data collection application (KoboToolbox) to conduct vulnerability assessments for 15,000 households across three districts. Enumerators will visit households, explain the programme, obtain consent, and collect data on household composition, income sources, food security, health status, and housing conditions. Data will be transmitted to a central server for analysis and used to determine eligibility for cash assistance.

Necessity assessment

The assessment examines each proposed data element against the purpose of determining eligibility for cash assistance.

| Data element | Proposed collection | Necessity assessment | Decision |
| --- | --- | --- | --- |
| Household head name | Full name | Required for payment delivery | Collect first name + surname initial only |
| National ID | Full number | Required for duplicate prevention | Collect last 4 digits only |
| GPS coordinates | Precise location | Proposed for logistics | Reject: community-level location sufficient; precise GPS creates safety risk |
| Household members | Names, ages, relationships | Proposed for household size verification | Collect count by age bracket only; names unnecessary |
| Income sources | Types and amounts | Required for vulnerability scoring | Collect income ranges, not precise amounts |
| Health conditions | Specific conditions | Proposed for vulnerability scoring | Collect functional limitation categories; specific diagnoses unnecessary |
| Food consumption score | Standard indicator | Required for eligibility determination | Collect as proposed |
| Housing materials | Roof, walls, floor | Required for vulnerability scoring | Collect as proposed |
| Photographs | Household, dwelling | Proposed for verification | Reject: alternative verification methods available |

The assessment reduces data collection from 47 fields to 28 fields and eliminates GPS and photographic data entirely.

Risk analysis

| Risk ID | Risk description | Severity | Likelihood | Level |
| --- | --- | --- | --- | --- |
| R1 | Device theft exposes beneficiary data | Significant (32) | Possible | High |
| R2 | Data interception during transmission | Limited (18) | Rare | Low |
| R3 | Unauthorised access by staff | Limited (16) | Possible | Medium |
| R4 | Data breach affecting all beneficiaries | Maximum (56) | Rare | High |
| R5 | Re-identification through quasi-identifiers | Significant (28) | Possible | High |
| R6 | Use of data beyond stated purpose | Limited (14) | Possible | Medium |

Mitigation measures

| Risk | Mitigation | Owner | Timeline | Residual |
| --- | --- | --- | --- | --- |
| R1 | Full-disk encryption on devices; remote wipe capability; 4-hour transmission requirement | IT Manager | Before deployment | Medium |
| R3 | Role-based access; quarterly access reviews; audit logging | Data Officer | Before deployment | Low |
| R4 | Encryption at rest (AES-256); network segmentation; backup to separate location | IT Manager | Before deployment | Medium |
| R5 | K-anonymity of 5 for any published analysis; no individual-level data sharing | M&E Lead | Ongoing | Medium |
| R6 | Purpose documented in consent; technical controls on data export; annual purpose review | Data Officer | Before deployment | Low |

Outcome

Residual risk levels are medium or below for all identified risks. The DPIA is approved with conditions: quarterly review of access logs, annual review of purpose compliance, and mandatory review if collection expands beyond three districts or 20,000 households.

Implementation considerations

For organisations with limited IT capacity

Privacy by Design implementation scales to organisational capacity. Single-person IT departments or organisations without dedicated IT staff can achieve meaningful privacy protection through prioritised, incremental measures.

The initial priority is establishing basic data inventory: understanding what personal data exists, where it resides, and who has access. This inventory need not be comprehensive or perfectly maintained; even a rough understanding of major data holdings enables targeted protection efforts. A spreadsheet listing systems, data categories, and approximate volumes provides sufficient foundation for prioritisation.

The second priority is implementing privacy by default in new deployments. When selecting or configuring new systems, apply minimisation principles to the initial setup. Request only necessary data fields in forms. Disable optional tracking features. Configure shortest practical retention periods. These choices cost no more than privacy-invasive alternatives and create less technical debt.

The third priority is conducting lightweight privacy assessments for significant processing. A full DPIA process exceeds available capacity in many organisations, but any new programme or system should prompt basic questions: what personal data will this involve, is all of it necessary, who will access it, how long will it be kept, and what could go wrong. Document answers briefly and revisit if concerns arise.

Technology choices should favour platforms with built-in privacy features. Cloud productivity suites offer data loss prevention, retention policies, and access controls that would require significant effort to implement independently. The configuration overhead is modest compared to building equivalent controls from scratch.

For organisations with established IT functions

Organisations with dedicated IT staff can implement more comprehensive Privacy by Design practices. The foundation is a formal DPIA process integrated into project governance, ensuring that privacy assessment occurs before procurement, development, or deployment decisions are finalised.

Privacy engineering should be embedded in development practices. Security and privacy requirements appear in user stories and acceptance criteria. Code review includes privacy considerations. Testing validates that minimisation, access control, and retention implementations function correctly.

Privacy tooling extends organisational capability. Data discovery tools identify personal data across the estate. Data loss prevention systems enforce handling policies. Privacy information management systems coordinate DPIAs, consent records, and data subject requests. These tools require investment but reduce manual effort for privacy operations.

Architecture standards should encode Privacy by Design principles. Reference architectures for common patterns (web applications, mobile data collection, analytics platforms) specify privacy controls, data flows, and integration points with privacy tooling. New projects implement these standards rather than designing privacy controls from scratch.

For federated organisations

Federated organisations face additional complexity: multiple legal entities, varying local regulations, different systems and practices across units, and coordination challenges for cross-unit data flows.

Privacy by Design in federated contexts requires both common standards and local flexibility. Core principles and minimum requirements must be consistent across the federation: all units conduct DPIAs for high-risk processing, all units implement access controls aligned with purpose limitation, all units maintain data inventories. Implementation specifics vary to accommodate local systems, regulations, and capacity.

Cross-unit data flows require particular attention. Data shared from one unit to another carries the privacy obligations attached at collection. Receiving units must implement controls adequate to those obligations regardless of their local requirements. Data sharing agreements between units document these requirements and establish accountability.

Central privacy functions support federated implementation by providing tools, templates, and expertise that units adapt to local context. A centralised DPIA template reduces local effort while ensuring consistent assessment quality. Shared training resources build privacy capability across units. Central review of high-risk processing provides independent assurance without requiring local DPO expertise.


Appendix: Data Protection Impact Assessment template

This template provides a structure for conducting DPIAs. Adapt the content and level of detail to the processing operation’s complexity and risk level.

Section 1: Processing description

1.1 Processing name and reference

[Name of system, project, or processing operation]

[Internal reference number if applicable]

1.2 Data controller

[Legal entity responsible for the processing]

[Contact details for data protection queries]

1.3 Purpose of processing

[Detailed description of why this processing is being undertaken. Specify the objectives that the processing is intended to achieve. Be specific: “delivering cash assistance to eligible households” rather than “providing services.”]

1.4 Nature of processing

[What operations will be performed on personal data? Include collection, storage, use, sharing, retention, and deletion. Describe any automated decision-making or profiling.]

1.5 Scope of processing

[What categories of personal data will be processed? How many individuals will be affected? What geographic area? What time period?]

1.6 Context of processing

[Describe the relationship between the organisation and data subjects. Note any vulnerabilities, power imbalances, or other contextual factors affecting privacy expectations.]

Section 2: Necessity and proportionality

2.1 Lawful basis

[Identify the lawful basis for processing under applicable data protection law. For GDPR, specify which Article 6 condition applies. For special category data, specify the Article 9 condition.]

2.2 Data minimisation assessment

For each category of personal data, explain why it is necessary for the stated purpose. If less privacy-invasive alternatives could achieve the purpose, explain why they were not selected.

| Data category | Necessity justification | Alternatives considered |
| --- | --- | --- |
| [Category 1] | [Why necessary] | [Alternatives and why rejected] |
| [Category 2] | [Why necessary] | [Alternatives and why rejected] |

2.3 Purpose limitation

[How will processing be restricted to the stated purposes? What controls prevent use for incompatible purposes?]

2.4 Retention assessment

[What retention period applies? What is the basis for this period? How will data be disposed of at the end of retention?]

Section 3: Risk identification and evaluation

3.1 Identified risks

List each risk to individuals’ rights and freedoms arising from this processing.

| Risk ID | Risk description | Affected rights | Severity | Likelihood | Risk level |
| --- | --- | --- | --- | --- | --- |
| R1 | [Description] | [Which rights] | [Rating] | [Rating] | [Level] |
| R2 | [Description] | [Which rights] | [Rating] | [Rating] | [Level] |

3.2 Risk evaluation methodology

[Describe the methodology used to assess severity and likelihood. Reference organisational risk assessment standards if applicable.]

Section 4: Mitigation measures

4.1 Planned mitigations

For each identified risk, describe the controls that will reduce severity or likelihood.

| Risk ID | Mitigation measure | Implementation owner | Timeline | Residual risk |
| --- | --- | --- | --- | --- |
| R1 | [Measure description] | [Responsible party] | [Date] | [Level after mitigation] |
| R2 | [Measure description] | [Responsible party] | [Date] | [Level after mitigation] |

4.2 Residual risk acceptance

[For any remaining high or very high residual risks, document the justification for proceeding and any additional monitoring or review requirements.]

Section 5: Consultation

5.1 Internal consultation

Record of consultation with Data Protection Officer or equivalent function.

| Date | Consulted party | Outcome |
| --- | --- | --- |
| [Date] | [Name/role] | [Approved/concerns raised/modifications required] |

5.2 External consultation

[If prior consultation with supervisory authority is required, document the consultation process and outcome.]

5.3 Data subject views

[Where appropriate, document consultation with data subjects or their representatives about the proposed processing.]

Section 6: Approval and review

6.1 DPIA approval

Sign-off from appropriate authority to proceed with processing.

| Approver | Role | Date | Decision |
| --- | --- | --- | --- |
| [Name] | [Title] | [Date] | [Approved/Approved with conditions/Rejected] |

6.2 Review triggers

Specify circumstances that will trigger DPIA review: material changes to processing, new risks identified, regulatory changes, or periodic review schedule.

6.3 Review schedule

Next scheduled review date: [specify date not more than three years from approval]


See also