Local Differential Privacy
Local differential privacy (LDP) is a model of differential privacy with the added requirement that even if an adversary has access to an individual's personal responses in the database, that adversary will still be unable to learn much about the individual's personal data. This contrasts with global differential privacy, a model that relies on a central aggregator with access to the raw data. LDP mitigates the concern that data fusion and analysis techniques could expose individuals to attacks and disclosures. It is a well-known privacy model for distributed architectures, aiming to provide privacy guarantees for each user while data are collected and analyzed, protecting both client and server from privacy leaks. LDP has been widely adopted to address contemporary privacy concerns in the era of big data.


History

The randomized response survey technique proposed by Stanley L. Warner in 1965 is frequently cited as an example of local differential privacy. Warner's innovation was the introduction of what would now be called the "untrusted curator" model, in which the entity collecting the data may not be trustworthy. Before users' responses are sent to the curator, the answers are randomized in a controlled manner, guaranteeing differential privacy while still allowing valid population-wide statistical inferences. In 2003, Alexandre V. Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant gave a definition equivalent to local differential privacy. In 2008, Kasiviswanathan et al. first used the term "local private learning" and showed it to be equivalent to randomized response.
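Warner's mechanism can be sketched in a few lines: each respondent answers truthfully with probability p and lies otherwise, and the analyst inverts the known noise distribution to recover the population statistic. The function names below are illustrative, not from any particular library.

```python
import random

def randomized_response(truth: bool, p: float = 0.75) -> bool:
    """Report the true answer with probability p, the opposite otherwise."""
    return truth if random.random() < p else not truth

def estimate_true_fraction(responses, p: float = 0.75) -> float:
    """Unbiased estimate of the true 'yes' fraction.
    E[observed yes rate] = p*pi + (1-p)*(1-pi); solve for pi."""
    y = sum(responses) / len(responses)
    return (y - (1 - p)) / (2 * p - 1)

# Simulate 100,000 respondents, 30% of whom hold the sensitive attribute.
random.seed(0)
truths = [random.random() < 0.30 for _ in range(100_000)]
reports = [randomized_response(t) for t in truths]
print(round(estimate_true_fraction(reports), 3))  # close to 0.30
```

No individual report reveals the respondent's true answer with certainty, yet the aggregate estimate converges to the true fraction as the number of respondents grows.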


Applications

The era of big data exhibits a high demand for machine learning services that provide privacy protection for users. Demand for such services has pushed research into algorithmic paradigms that provably satisfy specific privacy requirements.


Anomaly Detection

Anomaly detection is formally defined as the process of identifying unexpected items or events in data sets. The rise of social networking has led to many potential concerns related to information privacy. As more and more users rely on social networks, they are often threatened by privacy breaches, unauthorized access to personal information, and leakage of sensitive data. In "Anomaly Detection over Differential Preserved Privacy in Online Social Networks", the authors propose a privacy-preserving model that sanitizes the collection of user information from a social network using restricted local differential privacy (LDP) and stores synthetic copies of the collected data. The model uses the reconstructed data to classify user activity and detect abnormal network behavior. The experimental results demonstrate that the method achieves high data utility on the basis of improved privacy preservation, and that LDP-sanitized data remain suitable for subsequent analyses: anomaly detection on the reconstructed data achieves a detection accuracy similar to that on the original data.


Blockchain Technology

Potential combinations of blockchain technology with local differential privacy have received research attention. Blockchains implement distributed, secured, and shared ledgers used to record and track data within a decentralized network, and they have successfully replaced certain prior systems of economic transactions within and between organizations. Increased usage of blockchains has raised questions about the privacy and security of the data they store, and local differential privacy of various kinds has been proposed as a desirable property for blockchains containing sensitive data.


Context-Free Privacy

Local differential privacy provides context-free privacy even in the absence of a trusted data collector, though often at the expense of a significant drop in utility. The classical definition of LDP assumes that all elements in the data domain are equally sensitive. However, in many applications some symbols are more sensitive than others. A context-aware framework of local differential privacy allows a privacy designer to incorporate the application's context into the privacy definition. For binary data domains, algorithmic research has provided a universally optimal privatization scheme and highlighted its connections to Warner's randomized response (RR) and Mangat's improved response. For k-ary data domains, motivated by geolocation and web search applications, researchers have considered at least two special cases of context-aware LDP: block-structured LDP and high-low LDP. The research has provided communication-efficient, sample-optimal schemes and information-theoretic lower bounds for both models.


Facial Recognition

Facial recognition has become widespread in recent years. Recent smartphones, for example, use facial recognition to unlock the user's phone and to authorize payments with a credit card. Though convenient, this poses privacy concerns: facial recognition is a resource-intensive task that often involves third parties, creating a gap where the user's privacy could be compromised. Biometric information delivered to untrusted third-party servers in an uncontrolled manner can constitute a significant privacy leak, as biometrics can be correlated with sensitive data such as healthcare or financial records. In his academic article, Chamikara proposes a privacy-preserving technique for "controlled information release" that disguises an original face image and prevents leakage of the biometric features while still identifying a person. He introduces a privacy-preserving face recognition protocol named PEEP (Privacy using Eigenface Perturbation) that utilizes local differential privacy. PEEP applies differentially private perturbation to eigenfaces and stores only the perturbed data on third-party servers, which run a standard eigenface recognition algorithm. As a result, the trained model is not vulnerable to privacy attacks such as membership inference and model memorization attacks, making Chamikara's model a potential solution to these privacy leaks.


Federated Learning (FL)

Federated learning aims to protect data privacy through distributed learning methods that keep the data in its original storage. Likewise, differential privacy (DP) aims to improve the protection of data privacy by measuring the privacy loss in the communication among the elements of federated learning. The prospective match between federated learning and differential privacy for data privacy protection has led to the release of several software tools that support their functionalities, but these tools lack a unified vision of the techniques and a methodological workflow supporting their usage. In a study sponsored by the Andalusian Research Institute in Data Science and Computational Intelligence, researchers developed Sherpa.ai FL, an open-research unified FL and DP framework that aims to foster the research and development of AI services at the edge while preserving data privacy. The characteristics of FL and DP tested and summarized in the study suggest that they are good candidates for supporting AI services at the edge: the study finds that lower values of ε guarantee higher privacy at the cost of lower accuracy.


Health Data Aggregation

The rapid growth of health data, combined with the limited storage and computation resources of wireless body area sensor networks, is becoming a barrier to the development of the health industry. Aiming to solve this, outsourcing encrypted health data to the cloud has been an appealing strategy, but it has potential downsides: data aggregation becomes more difficult and more vulnerable to breaches of patients' sensitive information. In "Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees," Hao Ren and his team propose a privacy-enhanced and multifunctional health data aggregation scheme (PMHA-DP) under differential privacy. The aggregation function is designed to protect the aggregated data from cloud servers. The performance evaluation in their study shows that the proposal incurs less communication overhead than existing data aggregation models.


Internet Connected Vehicles

A growing number of vehicles have internet connections for the convenience of their users, which poses yet another threat to user privacy. The Internet of vehicles (IoV) is expected to enable intelligent traffic management, intelligent dynamic information services, intelligent vehicle control, and more. However, vehicles' data privacy is argued to be a major barrier to the application and development of IoV, and has therefore attracted wide attention. Local differential privacy (LDP) is the relaxed version of the differential privacy standard that can protect users' data privacy against an untrusted third party even in the worst-case adversarial setting. The computational cost of LDP is one concern among researchers, as it is expensive to implement in a setting characterized by high mobility and short connection times. Furthermore, as the number of vehicles increases, the frequent communication between vehicles and the cloud server incurs unexpected amounts of communication cost. To avoid the privacy threat and reduce the communication cost, researchers have proposed integrating federated learning and local differential privacy to support crowdsourced training of machine learning models.


Phone Blacklisting

Since spam phone calls are a growing nuisance in the digital world, researchers have been looking at potential solutions to minimize the problem. Federal agencies such as the US Federal Trade Commission (FTC) have been working with telephone carriers to design systems for blocking robocalls. A number of commercial smartphone apps that promise to block spam phone calls have also been created, but they come with a subtle cost: the private information a user shares when granting an app the access needed to block spam calls may be leaked without the user's consent or knowledge. In one study, researchers analyze the challenges and trade-offs of using local differential privacy, evaluate an LDP-based system on real-world user-reported call records collected by the FTC, and show that it is possible to learn a phone blacklist with a reasonable overall privacy budget while preserving users' privacy and maintaining the utility of the learned blacklist.


Trajectory Cross-Correlation Constraint

Aiming to solve the problem of low data utilization and weak privacy protection, researcher Hu proposes a personalized differential privacy protection method based on cross-correlation constraints. By protecting sensitive location points on the trajectory, this extended differential privacy model combines the sensitivity of the user's trajectory locations with the user's privacy protection requirements and privacy budget. Using an autocorrelation Laplace transform, specific white noise is transformed into noise that is correlated with the user's real trajectory sequence in both time and space, and this noise is used to build the cross-correlation constraint mechanism of the trajectory sequence. The proposed method thereby addresses the problem that adding independent, uncorrelated noise with a uniform degree of scrambling results in low privacy protection and poor data availability.


ε-local differential privacy


Definition of ε-local differential privacy

Let \varepsilon be a positive real number and \mathcal{A} be a randomized algorithm that takes a user's private data as input. Let \textrm{im}\,\mathcal{A} denote the image of \mathcal{A}. The algorithm \mathcal{A} is said to provide \varepsilon-local differential privacy if, for all pairs of users' possible private data x and x^\prime and all subsets S of \textrm{im}\,\mathcal{A}:

\Pr[\mathcal{A}(x) \in S] \leq e^\varepsilon \cdot \Pr[\mathcal{A}(x^\prime) \in S],

where the probability is taken over the random measure implicit in the algorithm. The main difference between this definition of local differential privacy and the definition of standard (global) differential privacy is that in standard differential privacy the probabilities concern the outputs of an algorithm that takes all users' data, whereas here the algorithm takes a single user's data. Other formal definitions of local differential privacy concern algorithms that take all users' data as input and output a collection of all responses (such as the definition in Raef Bassily, Kobbi Nissim, Uri Stemmer and Abhradeep Guha Thakurta's 2017 paper).
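As a concrete check of the definition, the following sketch enumerates the output distribution of binary randomized response (answer truthfully with probability p) and computes the smallest \varepsilon for which the inequality above holds over all inputs and output sets; for p = 0.75 this is ln 3:

```python
import itertools
import math

def rr_prob(output: bool, x: bool, p: float) -> float:
    """Pr[A(x) = output] for binary randomized response with truth prob p."""
    return p if output == x else 1 - p

def tightest_epsilon(p: float) -> float:
    """Smallest eps such that Pr[A(x) in S] <= e^eps * Pr[A(x') in S]
    for all inputs x, x' and all non-empty output subsets S."""
    inputs, outputs = [False, True], [False, True]
    worst_ratio = 1.0
    for r in (1, 2):  # enumerate all non-empty subsets S of the outputs
        for S in itertools.combinations(outputs, r):
            for x in inputs:
                for x2 in inputs:
                    num = sum(rr_prob(o, x, p) for o in S)
                    den = sum(rr_prob(o, x2, p) for o in S)
                    worst_ratio = max(worst_ratio, num / den)
    return math.log(worst_ratio)

print(tightest_epsilon(0.75))  # ln 3 ≈ 1.0986
```

In general, binary randomized response with truth probability p > 1/2 satisfies \varepsilon-LDP with \varepsilon = \ln\frac{p}{1-p}; at p = 1/2 the output is independent of the input and \varepsilon = 0.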


Deployment

Algorithms guaranteeing local differential privacy have been deployed by several internet companies: Google's RAPPOR system collects usage statistics from the Chrome browser, Apple uses local differential privacy to gather typing and emoji statistics from iOS and macOS devices, and Microsoft has used it to collect application telemetry in Windows.

