Statistical or Scientific De-Identification Fact Sheet
February 11, 2019
“Statistical or scientific de-identification” is an important tool that helps public health negotiate its dual and sometimes conflicting missions: maintaining the privacy of the information it collects, and sharing that information broadly with the community in a legal and privacy-protective manner. This fact sheet provides an overview of statistical and scientific de-identification methods for structured data, such as lab values and patient demographics, where the data are entered using pre-defined fields within the record.
Prescriptive methods delineate specific direct and indirect identifiers that must be removed from the data set. Statistical or scientific de-identification instead removes direct identifiers, such as name and Social Security number, and then weighs the utility of retaining indirect identifiers, such as dates and geographies, against the risk of re-identification. This approach yields multiple solutions and provides flexibility: the expert, in consultation with the data steward, determines which method(s) to apply to the data set to de-identify the indirect identifiers.
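As a rough illustration of that balancing step, the Python sketch below drops direct identifiers, coarsens two indirect identifiers, and then reports the size of the smallest group of records that share the same indirect identifier values. That group-size measure (often called k-anonymity) is only one possible risk indicator and is an assumption of this sketch, not a method this fact sheet prescribes. The records, field names, and generalization rules (birth date to year, ZIP code to its first three digits) are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records; every value is invented for illustration only.
RECORDS = [
    {"name": "Person A", "ssn": "000-00-0001", "birth_date": "1980-03-14", "zip": "98101", "lab_value": 5.4},
    {"name": "Person B", "ssn": "000-00-0002", "birth_date": "1980-11-02", "zip": "98104", "lab_value": 6.1},
    {"name": "Person C", "ssn": "000-00-0003", "birth_date": "1975-06-30", "zip": "98201", "lab_value": 4.9},
    {"name": "Person D", "ssn": "000-00-0004", "birth_date": "1975-01-19", "zip": "98208", "lab_value": 7.2},
]

# Assumed field lists; a real data set needs its own expert review.
DIRECT_IDENTIFIERS = ("name", "ssn")
RAW_INDIRECT = ("birth_date", "zip")
GENERALIZED_INDIRECT = ("birth_year", "zip3")


def deidentify(record):
    """Drop direct identifiers and coarsen the indirect identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["birth_year"] = out.pop("birth_date")[:4]  # keep the year only
    out["zip3"] = out.pop("zip")[:3]               # keep the first three digits
    return out


def smallest_group_size(records, keys):
    """Size of the smallest set of records sharing the same values for `keys`."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


released = [deidentify(r) for r in RECORDS]
k_before = smallest_group_size(RECORDS, RAW_INDIRECT)
k_after = smallest_group_size(released, GENERALIZED_INDIRECT)
print("smallest group before generalization:", k_before)
print("smallest group after generalization:", k_after)
```

A larger smallest group means each released record is harder to single out, but the data also lose detail (exact dates and full ZIP codes are gone). Deciding where to stop is the utility-versus-risk judgment that the expert and the data steward make together.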
De-identification provides public health with many benefits:
- If data are de-identified at the point of collection, the risk of a privacy breach while the data are retained is significantly decreased.
- When data are de-identified prior to sharing, technical and policy controls may be minimized.
- De-identification affords public health the ability to share data widely with communities and others.
This fact sheet is intended to be used by privacy officers, public health practitioners, data managers and their attorneys to provide awareness of these methods. See the Resources document, which is part of this toolkit, for technical resources.