HOME

TheInfoList



OR:

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more table (database), database tables, where every column (database), column of a table represents a particular Variable (computer sci ...
s, so that the people whom the data describe remain
anonymous Anonymous may refer to: * Anonymity, the state of an individual's identity, or personally identifiable information, being publicly unknown ** Anonymous work, a work of art or literature that has an unnamed or unknown creator or author * Anonym ...
.


Overview

Data anonymization has been defined as a "process by which personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party." Data anonymization may enable the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure, and in certain environments in a manner that enables evaluation and analytics post-anonymization. In the context of medical data, anonymized data refers to data from which the patient cannot be identified by the recipient of the information. The name, address, and full postcode must be removed, together with any other information which, in conjunction with other data held by or disclosed to the recipient, could identify the patient. There will always be a risk that anonymized data may not stay anonymous over time. Pairing the anonymized dataset with other data, clever techniques and raw power are some of the ways previously anonymous data sets have become de-anonymized; The data subjects are no longer anonymous. De-anonymization is the reverse process in which anonymous data is cross-referenced with other data sources to re-identify the anonymous data source. Generalization and perturbation are the two popular anonymization approaches for relational data. The process of obscuring data with the ability to re-identify it later is also called pseudonymization and is one way companies can store data in a way that is HIPAA compliant. However, according to ARTICLE 29 DATA PROTECTION WORKING PARTY, Directive 95/46/EC refers to anonymisation in Recital 26 "signifies that to anonymise any data, the data must be stripped of sufficient elements such that the data subject can no longer be identified. More precisely, that data must be processed in such a way that it can no longer be used to identify a natural person by using “all the means likely reasonably to be used” by either the controller or a third party. An important factor is that the processing must be irreversible. The Directive does not clarify how such a de-identification process should or could be performed. The focus is on the outcome: that data should be such as not to allow the data subject to be identified via “all” “likely” and “reasonable” means. Reference is made to codes of conduct as a tool to set out possible anonymisation mechanisms as well as retention in a form in which identification of the data subject is “no longer possible”. There are five types of data anonymization operations: generalization, suppression, anatomization, permutation, and perturbation. Text was copied from this source, which is available under
Creative Commons Attribution 4.0 International License


GDPR requirements

The
European Union The European Union (EU) is a supranational union, supranational political union, political and economic union of Member state of the European Union, member states that are Geography of the European Union, located primarily in Europe. The u ...
's
General Data Protection Regulation The General Data Protection Regulation (Regulation (EU) 2016/679), abbreviated GDPR, is a European Union regulation on information privacy in the European Union (EU) and the European Economic Area (EEA). The GDPR is an important component of ...
(GDPR) requires that stored data on people in the EU undergo either anonymization or a pseudonymization process. GDPR Recital (26) establishes a very high bar for what constitutes anonymous data, thereby exempting the data from the requirements of the GDPR, namely “…information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” The European Data Protection Supervisor (EDPS) and the Spanish Agencia Española de Protección de Datos (AEPD) have issued joint guidance related to requirements for anonymity and exemption from GDPR requirements. According to the EDPS and AEPD, no one, including the data controller, should be able to re-identify data subjects in a properly anonymized dataset. Research by data scientists at Imperial College in London and UCLouvain in Belgium, as well as a ruling by Judge Michal Agmon-Gonen of the Tel Aviv District Court, highlight the shortcomings of "Anonymisation" in today's
big data Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data processing, data-processing application software, software. Data with many entries (rows) offer greater statistical power, while data with ...
world. Anonymisation reflects an outdated approach to data protection that was developed when the processing of data was limited to isolated (siloed) applications, prior to the popularity of big data processing involving the widespread sharing and combining of data.


Anonymization of different types of data

Structured data: *
Databases In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and ana ...
Unstructured data: *
PDF files Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating syste ...
- Anonymization of text, tables, images, scanned pages. * DICOM - Anonymization metadata, pixel data, overlay data, encapsulated documents. * Images Removing identifying
metadata Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive ...
from
computer files A computer file is a resource for recording data on a computer storage device, primarily identified by its filename. Just as words can be written on paper, so too can data be written to a computer file. Files can be shared with and transferred b ...
is important for anonymizing them.
Metadata removal tool Metadata removal tool or metadata scrubber is a type of privacy software built to protect the privacy of its users by removing potentially privacy-compromising metadata from files before they are shared with others, e.g., by sending them as e-mai ...
s are useful for achieving this.


See also

*
Anonymity Anonymity describes situations where the acting person's identity is unknown. Anonymity may be created unintentionally through the loss of identifying information due to the passage of time or a destructive event, or intentionally if a person cho ...
* De-anonymization *
De-identification De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data ...
*
Differential privacy Differential privacy (DP) is a mathematically rigorous framework for releasing statistical information about datasets while protecting the privacy of individual data subjects. It enables a data holder to share aggregate patterns of the group while ...
*
Fillet (redaction) To fillet in the sense of literary editing is a form of censorship or redaction effected by "cutting out" central letters of a word or name, as if the skeleton of a fish, and replacing them with dashes, to prevent full disclosure (e.g. ' for " Wi ...
* Geo-Blocking *
k-anonymity ''k''-anonymity is a property possessed by certain anonymized data. The term ''k''-anonymity was first introduced by Pierangela Samarati and Latanya Sweeney in a paper published in 1998, although the concept dates to a 1986 paper by Tore Dalen ...
*
l-diversity ''l''-diversity, also written as ''ℓ''-diversity, is a form of group based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade off that results in some loss ...
*
Masking and unmasking by intelligence agencies Unmasking by U.S. intelligence agencies typically occurs after the United States conducts eavesdropping or other intelligence gathering aimed at foreigners or foreign agents, and the name of a U.S. citizen or entity is incidentally collected. In ...
*
Metadata removal tool Metadata removal tool or metadata scrubber is a type of privacy software built to protect the privacy of its users by removing potentially privacy-compromising metadata from files before they are shared with others, e.g., by sending them as e-mai ...
* Pseudonymization * Statistical disclosure control


References


Further reading

* * * * * {{cite web, last=Pete Warden, title=Why you can't really anonymize your data, url=http://strata.oreilly.com/2011/05/anonymize-data-limits.html, publisher=O'Reilly Media, Inc., access-date=17 January 2014, archive-url=https://web.archive.org/web/20140109052803/http://strata.oreilly.com/2011/05/anonymize-data-limits.html, archive-date=9 January 2014, url-status=dead


External links

* On the anonymization of Internet traffic
Data Sharing and Anonymization Reading List
Information privacy Data protection Anonymity