The de-identification of personal data, and the myriad methods and algorithms that can be applied to it, has received significant attention in the past year. Identifiability is a spectrum, and privacy-enhancing technologies (PETs) offer a range of statistical techniques to de-identify data, each requiring a balance between identity protection and utility. At one end of the PET spectrum is identifiable data in cleartext (e.g. names and addresses); at the other is data that cannot be linked back to an actual person (e.g. synthetic data). Zero identifiability does not truly exist: even if data were matched to people at random, the probability of a successful match is never zero. However, when the risk is extremely low, we can consider the data non-identifiable, and therefore not personal information, or ‘anonymized’. With regulations now addressing de-identification and anonymity, and with big data and AI increasing re-identification risk, the threshold for what is considered ‘identifiable’ has been getting significantly lower.
What works well in practice is risk-based anonymization: transforming the data with a technique such as generalization, suppression, noise addition or aggregation, and also introducing security controls to manage residual risk. The data still needs to be de-identified with a PET such that the residual risk is small enough, but security controls, such as encryption and strict access controls based on least privilege, can help as well.
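To make these transformations concrete, here is a minimal sketch of the three techniques named above (suppression, generalization, noise addition) applied to a single record. The field names, band width, and noise scale are illustrative assumptions, not values drawn from any regulation or standard.

```python
import random

def generalize_age(age: int, band: int = 10) -> str:
    """Generalization: replace an exact age with a coarser band, e.g. 34 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"

def deidentify(record: dict, rng: random.Random) -> dict:
    """Apply suppression, generalization, and noise addition to one record."""
    out = dict(record)
    # Suppression: drop direct identifiers entirely.
    for field in ("name", "address"):
        out.pop(field, None)
    # Generalization: coarsen the exact age into a 10-year band.
    out["age"] = generalize_age(out["age"])
    # Noise addition: perturb a numeric attribute with small Gaussian noise.
    out["income"] = round(out["income"] + rng.gauss(0, 1000), 2)
    return out

rng = random.Random(42)
rec = {"name": "A. Person", "address": "1 Main St", "age": 34, "income": 55000}
print(deidentify(rec, rng))
```

Note that these transforms only reduce risk; as the article stresses, residual risk still has to be managed with security controls such as encryption and least-privilege access.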
In December 2023, Quebec’s Draft Anonymization Regulation under its recently amended private sector privacy law, Law 25, was published. Here are five important highlights:
- Organizations must establish the purposes for which the anonymized personal information is intended to be used before beginning the process of anonymization;
- The anonymization process must be carried out under the supervision of someone ‘qualified in the field’;
- It is only once personal identifiers are removed and the re-identification risk has been assessed (based on individualization, correlation and inferences that can be drawn from other available information), that an anonymization technique can be chosen;
- Re-identification risks must be reduced with security measures and the assessment of risk must be kept up-to-date, considering technological advancements; and
- The purpose for which anonymized data will be used; the anonymization techniques being implemented; and the re-identification risk analysis must be documented in a ‘register’.
Meanwhile federally, Bill C-27, containing the text of the proposed Consumer Privacy Protection Act (CPPA), has passed second reading and is receiving a line-by-line review by the Standing Committee on Industry and Technology. In my view, the earliest we will see it come into force is the spring of 2025. The CPPA would leave unamended the definition of personal information in PIPEDA as meaning “information about an identifiable individual”. However, it would add the following new definition of ‘de-identify’: “to modify personal information, or create information from personal information, by using technical processes to ensure that the information does not identify an individual or could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual”. Meanwhile, ‘anonymize’ will mean to “irreversibly and permanently modify personal information, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means”. This sets a very high bar for data to be carved out of the application of Canada’s private sector privacy law, without any further guidance given in the law.
Following Quebec’s lead, I believe the definition of de-identified information should incorporate the risk of re-identification, and clear processes for achieving anonymized information should be set out in regulation, including a requirement for a risk assessment. The risk of re-identification could be low externally (a public data release) yet high internally (a non-public data release), for example through a deliberate insider attack. In fact, the Ontario Information and Privacy Commissioner published extremely useful guidance on assessing re-identification risk in its 2016 De-identification Guidelines for Structured Data, written for government institutions but relevant to all industries. Ten years ago, Canada was at the forefront of privacy topics, and it’s time to once again pave the path.
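A common way risk-based guidance of this kind is operationalized (a sketch, not the Ontario guidelines’ exact method) is to estimate record-level risk as 1/k, where k is the size of the smallest group of records sharing the same quasi-identifiers, and to compare it against a threshold that depends on the release context. The thresholds below are illustrative assumptions only.

```python
from collections import Counter

# Illustrative thresholds: a public release demands a much lower maximum
# risk than a controlled, non-public release. These numbers are assumptions.
THRESHOLDS = {"public": 0.05, "non_public": 0.2}

def max_reid_risk(records, quasi_identifiers):
    """Maximum record-level risk: 1 / (size of the smallest equivalence class)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())

def acceptable(records, quasi_identifiers, context):
    """Is the worst-case risk within the threshold for this release context?"""
    return max_reid_risk(records, quasi_identifiers) <= THRESHOLDS[context]

data = [
    {"age_band": "30-39", "region": "QC"},
    {"age_band": "30-39", "region": "QC"},
    {"age_band": "30-39", "region": "QC"},
    {"age_band": "40-49", "region": "ON"},
    {"age_band": "40-49", "region": "ON"},
]
# The smallest group has 2 records, so the maximum risk is 0.5:
# too high for either release context, signalling more transformation is needed.
print(max_reid_risk(data, ["age_band", "region"]))  # 0.5
```

The point of the context parameter is exactly the internal/external distinction above: the same dataset can be acceptable for a tightly controlled release yet unacceptable for a public one.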
Consistency and alignment between provincial and national requirements are critical for organizations as they consider the role that data scientists should play in data anonymization and work towards compliance.
I look forward to delving into the topic of de-identification practices and regulatory approaches in PRIVATECH’s upcoming CIPP/C and CIPM training courses being offered in May! CLICK HERE to learn more.