Quasi-identifier
   HOME

TheInfoList



OR:

Quasi-identifiers are pieces of information that are not of themselves
unique identifier A unique identifier (UID) is an identifier that is guaranteed to be unique among all identifiers used for those objects and for a specific purpose. The concept was formalized early in the development of computer science and information systems. ...
s, but are sufficiently well correlated with an entity that they can be combined with other quasi-identifiers to create a unique identifier. Quasi-identifiers can thus, when combined, become
personally identifying information Personal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person. The abbreviation PII is widely accepted in the United States, but the phrase it abbreviates ha ...
. This process is called
re-identification Data re-identification or de-anonymization is the practice of matching anonymous data (also known as de-identified data) with publicly available information, or auxiliary data, in order to discover the individual to which the data belong. This is ...
. As an example,
Latanya Sweeney Latanya Arvette Sweeney is an American computer scientist. She is the Daniel Paul Professor of the Practice of Government and Technology at the Harvard Kennedy School and in the Harvard Faculty of Arts and Sciences at Harvard University. She is t ...
has shown that even though neither
gender Gender is the range of characteristics pertaining to femininity and masculinity and differentiating between them. Depending on the context, this may include sex-based social structures (i.e. gender roles) and gender identity. Most cultures u ...
,
birth date A birthday is the anniversary of the birth of a person, or figuratively of an institution. Birthdays of people are celebrated in numerous cultures, often with birthday gifts, birthday cards, a birthday party, or a rite of passage. Many rel ...
s nor
postal code A postal code (also known locally in various English-speaking countries throughout the world as a postcode, post code, PIN or ZIP Code) is a series of letters or digits or both, sometimes including spaces or punctuation, included in a postal a ...
s uniquely identify an individual, the combination of all three is sufficient to identify 87% of individuals in the United States. The term was introduced by Tore Dalenius in 1986. Since then, quasi-identifiers have been the basis of several attacks on released data. For instance, Sweeney linked health records to publicly available information to locate the then-governor of Massachusetts' hospital records using uniquely identifying quasi-identifiers, and Sweeney, Abu and Winn used public voter records to re-identify participants in the Personal Genome Project. Additionally, Arvind Narayanan and Vitaly Shmatikov discussed on quasi-identifiers to indicate statistical conditions for de-anonymizing data released by Netflix. Motwani and Ying warn about potential privacy breaches being enabled by publication of large volumes of government and business data containing quasi-identifiers.


See also

*
De-identification De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data ...
*
Differential privacy Differential privacy (DP) is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is t ...
*
Personally identifying information Personal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person. The abbreviation PII is widely accepted in the United States, but the phrase it abbreviates ha ...


References

Personal life Information privacy {{statistics-stub