Identifying Personally Identifying Information (PII), Quantifying the Risk and Anonymising Clinical Data: The Pharmaceutical Users Software Exchange (PHUSE) Approach

2:25 PM - 3:15 PM


In the era of data transparency and data sharing with researchers, companies have been defining their own processes and de-identification guidance to comply with data privacy regulations. Researchers may request access to data from sponsors, but differences in both data models and de-identification techniques can make reanalysis cumbersome and error-prone.

Clinical Data Interchange Standards Consortium (CDISC) data models are well established in the industry. PHUSE launched a dedicated working group in July 2014 to define de-identification standards for CDISC data models starting with the Study Data Tabulation Model (SDTM). Stakeholders from pharmaceutical companies, CROs, software vendors, CDISC specialists, data privacy specialists and academia joined forces to define a set of rules against the SDTM data model to provide the industry with a consistent approach to data de-identification and increase consistency across anonymised datasets.

The domains and variables that may hold personally identifying information (PII) have been rated by their impact on data privacy. Based on that rating, each variable is allocated a standard de-identification rule, and the rationale and the impact on data utility are documented.
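As a rough illustration of how a rating can drive rule allocation, the following Python sketch looks up a rating for a domain and variable and returns a rule together with its rationale. The rating scale, rule names, variable entries and the `allocate_rule` function are all hypothetical and are not taken from the PHUSE standard, which defines its own ratings and rules per SDTM variable.

```python
# Hypothetical rating scale and rule names, for illustration only.
DEFAULT_RULE = {"direct": "recode", "quasi": "generalise", "none": "keep"}

# (domain, variable) -> rating. Example entries, not the PHUSE ratings.
RATINGS = {
    ("DM", "USUBJID"): "direct",
    ("DM", "AGE"): "quasi",
    ("AE", "AESTDTC"): "quasi",
    ("LB", "LBTESTCD"): "none",
}

# Variable-level overrides of the default rule, each with a rationale.
OVERRIDES = {
    ("AE", "AESTDTC"): ("offset_date", "Keeps relative timing, hides calendar dates"),
}


def allocate_rule(domain, variable):
    """Return (rating, rule, rationale) for one variable."""
    key = (domain, variable)
    rating = RATINGS.get(key)
    if rating is None:
        # Unknown variables are never silently kept; they go to manual review.
        return ("unrated", "review", "No rating on file; needs manual review")
    if key in OVERRIDES:
        rule, rationale = OVERRIDES[key]
        return (rating, rule, rationale)
    return (rating, DEFAULT_RULE[rating], f"Default rule for rating '{rating}'")


if __name__ == "__main__":
    for dom, var in [("DM", "USUBJID"), ("AE", "AESTDTC"), ("XX", "FOO")]:
        print(dom, var, allocate_rule(dom, var))
```

Routing unrated variables to review rather than keeping them by default is a deliberate choice in this sketch: an unlisted variable is treated as a potential privacy risk until someone has rated it.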

The working group published the PHUSE de-identification standard in May 2015. In 2016 the deliverable was referenced in the EMA Policy 0070 External Guidance as a tool that can also support the anonymisation of clinical documents.

In 2019, the working group developed a method to automate data anonymisation and risk assessment. The deliverable gathers and organises the tasks involved and proposes a process map identifying the areas that can be automated. The method describes the steps needed to transform source data into a fully anonymised and documented package, discussing individual tasks, assumptions and potential pitfalls.
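This abstract does not state which risk measure the working group's method uses. One widely used option, shown here purely as an assumption, is to group records by their quasi-identifier values and take each record's re-identification risk as one over the size of its group. The function name, the field names and the sample records below are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, average_risk) from equivalence class sizes.

    Records sharing the same quasi-identifier values form one class;
    each record's risk is 1 / (size of its class).
    """
    if not records:
        raise ValueError("records must not be empty")
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    class_sizes = Counter(keys)
    risks = [1 / class_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


if __name__ == "__main__":
    # Made-up records with hypothetical, already generalised fields.
    sample = [
        {"AGEGRP": "30-39", "SEX": "F", "COUNTRY": "DE"},
        {"AGEGRP": "30-39", "SEX": "F", "COUNTRY": "DE"},
        {"AGEGRP": "40-49", "SEX": "M", "COUNTRY": "FR"},
        {"AGEGRP": "40-49", "SEX": "M", "COUNTRY": "FR"},
    ]
    print(reidentification_risk(sample, ["AGEGRP", "SEX", "COUNTRY"]))
```

In an automated workflow, a check like this could run after the de-identification rules are applied, with the result compared against a threshold agreed for the data release and recorded in the documentation package.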

This presentation will elaborate on the working group's main findings and recommended approach when using the PHUSE de-identification standard and will explore opportunities for automation, risk quantification and documentation of anonymised data.