In
network theory
In mathematics, computer science, and network science, network theory is a part of graph theory. It defines networks as Graph (discrete mathematics), graphs where the vertices or edges possess attributes. Network theory analyses these networks ...
, link prediction is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a
social network
A social network is a social structure consisting of a set of social actors (such as individuals or organizations), networks of Dyad (sociology), dyadic ties, and other Social relation, social interactions between actors. The social network per ...
, predicting co-authorship links in a
citation network, and predicting interactions between genes and proteins in a
biological network
A biological network is a method of representing systems as complex sets of binary interactions or relations between various biological entities. In general, networks or graphs are used to capture relationships between entities or objects. A typ ...
. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time
, the goal is to predict the links at time
.
Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict
protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security related applications.
[
]
Problem definition
Consider a network , where represents the entity nodes in the network and x represents the set of "true" links across entities in the network.
We are given the set of entities and a subset of true links which are referred to as ''observed links''.
The goal of link prediction is to identify the unobserved true links.
In the temporal formulation of link prediction the observed links correspond to true links at a time , and the goal is to infer the set of true links at time
Usually, we are also given a subset of unobserved links called ''potential links'' , and we need to identify true links among these potential links.
In the binary classification formulation of the link prediction task the potential links are classified as either true links or false links.
Link prediction approaches for this setting learn a classifier that maps links in to positive and negative labels i.e. .
In the probability estimation formulation, potential links are associated with existence probabilities.
Link prediction approaches for this setting learn a model that maps links in to a probability i.e.