“Statistical or scientific de-identification” is an important tool that helps public health negotiate its dual and sometimes conflicting missions – maintaining the privacy of the information it collects and sharing that information broadly with the community in a legal and privacy-protective manner. Unlike prescriptive methods, which specify the removal of particular direct and indirect identifiers from the data set, this approach removes direct identifiers, such as name and Social Security number, and then weighs the utility of retaining indirect identifiers, such as dates and geographies, against the risk of re-identification. As a result, it yields multiple solutions and provides flexibility. Statistical or scientific de-identification allows the expert, in consultation with the data steward, to determine which method(s) to apply to the data set to de-identify the indirect identifiers.
De-identification provides public health with many benefits:
This fact sheet provides an overview of statistical and scientific de-identification methods for structured data, such as lab values and patient demographics, where the data are entered using pre-defined fields within the record. It is not intended for de-identification of unstructured data, such as narrative reports or multimedia. Additionally, details regarding methods for creating synthetic data or data enclaves are beyond the scope of this fact sheet.
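To make the utility-versus-risk balance described above more concrete, the following is a minimal sketch in Python, not a method prescribed by this fact sheet. It assumes a tiny set of invented structured records, hypothetical field names (`name`, `ssn`, `zip`, `visit_date`, `lab_value`), and two illustrative generalizations: truncating a ZIP code to three digits and reducing a date to its year. As a rough risk indicator it reports the size of the smallest group of records sharing the same indirect identifiers (the idea behind k-anonymity); an expert might choose different methods or thresholds entirely.

```python
from collections import Counter

# Hypothetical structured records; every name and value here is invented.
records = [
    {"name": "Patient A", "ssn": "000-00-0001", "zip": "30301",
     "visit_date": "2023-01-14", "lab_value": 5.6},
    {"name": "Patient B", "ssn": "000-00-0002", "zip": "30305",
     "visit_date": "2023-06-02", "lab_value": 6.1},
    {"name": "Patient C", "ssn": "000-00-0003", "zip": "30410",
     "visit_date": "2023-03-09", "lab_value": 7.4},
    {"name": "Patient D", "ssn": "000-00-0004", "zip": "30422",
     "visit_date": "2023-11-30", "lab_value": 5.9},
]

# Assumed list of direct identifiers to remove outright.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def deidentify(record):
    """Drop direct identifiers and coarsen the indirect ones."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip3"] = out.pop("zip")[:3]               # 5-digit ZIP -> 3-digit prefix
    out["visit_year"] = out.pop("visit_date")[:4]  # full date -> year only
    return out


def smallest_group(rows, indirect_identifiers):
    """Size of the smallest set of rows sharing the same indirect identifiers."""
    counts = Counter(tuple(r[q] for q in indirect_identifiers) for r in rows)
    return min(counts.values())


released = [deidentify(r) for r in records]

print("smallest group before:", smallest_group(records, ["zip", "visit_date"]))
print("smallest group after: ", smallest_group(released, ["zip3", "visit_year"]))
```

In this toy data every raw record is unique on ZIP code and visit date, while after generalization each record shares its indirect identifiers with at least one other. The cost is lost detail: analyses that need exact dates or fine-grained geography are no longer possible, which is the utility side of the trade-off.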