In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes (e.g., gender, age, political affiliation) of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

Motivation and background

Traditionally, a major focus of machine learning is to solve classification problems. (For example, given a collection of e-mails, we wish to determine which are spam and which are not.) Many machine learning models for this task categorize each item independently and predict each class label separately. However, the prediction accuracy for the labels whose values must be inferred can be improved with knowledge of the correct class labels for related items. For example, it is easier to predict the topic of a webpage if we know the topics of the webpages that link to it. Similarly, the chance of a particular word being a verb increases if we know that the previous word in the sentence is a noun, and knowing the first few characters in a word can make it much easier to identify the remaining characters. Many researchers have proposed techniques that attempt to classify samples in a joint or collective manner, instead of treating each sample in isolation; these techniques have enabled significant gains in classification accuracy.

Example

Consider the task of inferring the political affiliation of users in a social network, where some portion of these affiliations are observed, and the remainder are unobserved. Each user has local features, such as their profile information, and links exist between users who are friends in this social network. An approach that does not collectively classify users will consider each user in the network independently and use their local features to infer party affiliations. An approach which performs collective classification might assume that users who are friends tend to have similar political views, and could then jointly infer all unobserved party affiliations while making use of the rich relational structure of the social network.

Definition

Consider the semi-supervised learning problem of assigning labels to nodes in a network by using knowledge of a subset of the nodes' labels. Specifically, we are given a network represented by a graph $G$ with a set of nodes $V$ and an edge set $E$ representing relationships among nodes. Each node $v_i \in V$ is described by its attributes: a feature vector $x_i \in X$ and its label (or class) $y_i \in Y$.

$V$ can further be divided into two sets of nodes: $L$, the set of nodes for which we know the correct label values (observed variables), and $U$, the nodes whose labels must be inferred. The collective classification task is to label the nodes in $U$ with a label from the label set $Y$.

In such settings, traditional classification algorithms assume that the data is drawn independently and identically from some distribution (iid). This means that the labels inferred for nodes whose label is unobserved are independent of each other. One does not make this assumption when performing collective classification. Instead, there are three distinct types of correlations that can be utilized to determine the classification or label of $v$:

# The correlations between the label of $v$ and the observed attributes of $v$. Traditional iid classifiers which make use of feature vectors are an example of approaches that use this correlation.
# The correlations between the label of $v$ and the observed attributes (including observed labels) of nodes in the neighborhood of $v$.
# The correlations between the label of $v$ and the unobserved labels of objects in the neighborhood of $v$.

Collective classification refers to the combined classification of a set of interlinked objects using the three above types of information.
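As a rough illustration of this setup, the sketch below stores a toy network and separates the three kinds of information available for a node. The node names, feature values, labels, and the helper names `neighbors` and `evidence` are all hypothetical and chosen only for this example.

```python
# Hypothetical toy network; node names, features and labels are made up.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.2, 0.8]}  # x_i
labels = {"a": "red", "b": None, "c": "blue", "d": None}  # y_i, None = unobserved
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")]  # edge set E

labeled = sorted(v for v in labels if labels[v] is not None)  # the set L
unlabeled = sorted(v for v in labels if labels[v] is None)    # the set U


def neighbors(v):
    """Return the nodes that share an edge with v."""
    return sorted({u for edge in edges if v in edge for u in edge if u != v})


def evidence(v):
    """Group the information about v by the three correlation types."""
    nbrs = neighbors(v)
    return {
        # Type 1: the node's own observed attributes.
        "own_features": features[v],
        # Type 2: observed attributes and observed labels of neighbors.
        "neighbor_features": {u: features[u] for u in nbrs},
        "neighbor_observed_labels": {
            u: labels[u] for u in nbrs if labels[u] is not None
        },
        # Type 3: neighbors whose labels are unobserved and must be inferred jointly.
        "neighbor_unobserved": [u for u in nbrs if labels[u] is None],
    }


info = evidence("b")
```

An iid classifier would look only at `own_features`; a collective method also uses the other three entries.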

Methods

There are several existing approaches to collective classification. The two major methods are iterative methods and methods based on probabilistic graphical models.

Iterative methods

The general idea for iterative methods is to iteratively combine and revise individual node predictions so as to reach an equilibrium. When updating the prediction for an individual node is a fast operation, the cost of these methods is dominated by the number of iterations needed for convergence. Though convergence and optimality are not always mathematically guaranteed, in practice these approaches typically converge quickly to a good solution, depending on the graph structure and problem complexity. The methods presented in this section are representative of this iterative approach.

Label propagation

A natural assumption in network classification is that adjacent nodes are likely to have the same label (i.e., contagion or homophily). The predictor for node $v_i$ using the label propagation method is a weighted average of its neighboring labels $Y_{N_i}$, where $N_i$ denotes the neighbors of $v_i$.
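The weighted-average update can be sketched as follows. This is a minimal illustration and not a reference implementation: it assumes labels are represented as probability distributions, observed nodes are clamped to their known label, and updates are applied synchronously for a fixed number of rounds. The function name, the input format, and the toy chain are hypothetical.

```python
def label_propagation(weights, observed, labels, iterations=50):
    """Propagate label distributions over a weighted graph.

    weights:  dict node -> dict neighbor -> non-negative edge weight
    observed: dict node -> known label (these nodes stay fixed)
    labels:   list of possible labels
    """
    # One-hot distribution for observed nodes, uniform for the rest.
    belief = {}
    for v in weights:
        if v in observed:
            belief[v] = {l: 1.0 if l == observed[v] else 0.0 for l in labels}
        else:
            belief[v] = {l: 1.0 / len(labels) for l in labels}

    for _ in range(iterations):
        updated = {}
        for v in weights:
            total = sum(weights[v].values())
            if v in observed or total == 0:
                updated[v] = belief[v]  # clamp observed or isolated nodes
                continue
            # Weighted average of the neighbors' current label distributions.
            updated[v] = {
                l: sum(w * belief[u][l] for u, w in weights[v].items()) / total
                for l in labels
            }
        belief = updated

    # Predict the most probable label for every node.
    return {v: max(labels, key=lambda l: belief[v][l]) for v in weights}


# Toy chain a - b - c - d with unit weights; only the end points are observed.
weights = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
result = label_propagation(weights, {"a": "red", "d": "blue"}, ["red", "blue"])
```

On this chain each unobserved node ends up with the label of the observed end point it is closer to.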

Iterative Classification Algorithms (ICA)

While label propagation is surprisingly effective, it may sometimes fail to capture complex relational dynamics. More sophisticated approaches can use richer predictors. Suppose we have a classifier $h$ that has been trained to classify a node $v_i$ given its features $X_i$ and the features $X_{N_i}$ and labels $Y_{N_i}$ of its neighbors $N_i$. Iterative classification applies this local classifier to each node, using the current predictions and the ground-truth information about the node's neighbors, and iterates until the local predictions converge to a global solution. Iterative classification is an “algorithmic framework,” in that it is agnostic to the choice of predictor; this makes it a very versatile tool for collective classification.
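The loop described above can be sketched as follows, assuming in-place updates in a fixed node order and a bootstrap step that uses only a node's own features. The two hand-written scoring functions are hypothetical stand-ins for a trained classifier $h$; they are not part of any particular ICA variant.

```python
from collections import Counter


def iterative_classification(features, neighbors, observed,
                             bootstrap, local_classifier, max_iters=20):
    """Repeatedly re-predict unobserved labels until nothing changes."""
    current = dict(observed)
    # Bootstrap: initial guess from each node's own features only.
    for v in features:
        if v not in observed:
            current[v] = bootstrap(features[v])

    for _ in range(max_iters):
        changed = False
        for v in features:
            if v in observed:
                continue  # ground-truth labels are never overwritten
            # Relational feature: counts of the neighbors' current labels.
            counts = Counter(current[u] for u in neighbors[v])
            prediction = local_classifier(features[v], counts)
            if prediction != current[v]:
                current[v] = prediction
                changed = True
        if not changed:
            break  # local predictions have converged
    return current


# Hypothetical stand-ins for a trained classifier h.
def bootstrap(x):
    return "pos" if x[0] > 0.5 else "neg"


def local_classifier(x, counts):
    total = sum(counts.values())
    relational = (counts["pos"] - counts["neg"]) / total if total else 0.0
    return "pos" if (x[0] - 0.5) + 0.5 * relational > 0 else "neg"


features = {"a": [0.9], "e": [0.8], "b": [0.45], "c": [0.4]}
neighbors = {"a": ["b"], "e": ["b"], "b": ["a", "e", "c"], "c": ["b"]}
observed = {"a": "pos", "e": "pos"}
result = iterative_classification(
    features, neighbors, observed, bootstrap, local_classifier
)
```

In this toy example, nodes `b` and `c` would be labeled `neg` from their own features alone, but the neighbor information moves both to `pos`.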

Collective classification with graphical models

Another approach to collective classification is to represent the problem with a graphical model and use the learning and inference techniques of graphical modeling to arrive at the classifications. Graphical models are tools for joint, probabilistic inference, making them well suited to collective classification. They are characterized by a graphical representation of a probability distribution $P$, in which random variables are nodes in a graph $G$. Graphical models can be broadly categorized by whether the underlying graph is directed (e.g., Bayesian networks or collections of local classifiers) or undirected (e.g., Markov random fields (MRF)).
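For concreteness, one common way to write an undirected model over the labels is a pairwise MRF. The form below is an illustrative assumption rather than something stated in the text: the node potentials $\phi_i$, edge potentials $\psi_{ij}$, normalizing constant $Z$, and node count $n = |V|$ are notation introduced only for this sketch.

```latex
P(y_1, \dots, y_n \mid x_1, \dots, x_n)
  = \frac{1}{Z}
    \prod_{v_i \in V} \phi_i(y_i, x_i)
    \prod_{(v_i, v_j) \in E} \psi_{ij}(y_i, y_j)
```

Under this assumed form, $\phi_i$ captures the correlation between a node's label and its own features, while $\psi_{ij}$ captures the correlation between the labels of linked nodes.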

Gibbs sampling

Gibbs sampling is a general framework for approximating a distribution. It is a Markov chain Monte Carlo algorithm, in that it iteratively samples from the current estimate of the distribution, constructing a Markov chain that converges to the target (stationary) distribution. The basic idea of Gibbs sampling is to sample a label estimate for $y_i$ given all the values for the nodes in $N_i$ using a local classifier $f$, and to repeat this for a fixed number of (burn-in) iterations. After that, we continue to sample labels for each $y_i \in Y$ and maintain count statistics for the number of times we sampled label $l$ for $y_i$. After collecting a predefined number of such samples, we output the best label assignment for $y_i$ by choosing the label that was assigned the maximum number of times to $y_i$ while collecting samples.
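The procedure can be sketched as below. The conditional distribution used here is a hypothetical homophily model that stands in for the local classifier $f$; the function names, the `strength` parameter, and the toy graph are assumptions made for the example.

```python
import math
import random
from collections import Counter


def gibbs_collective(neighbors, observed, labels, conditional,
                     burn_in=100, num_samples=500, seed=0):
    """Estimate unobserved labels by Gibbs sampling with count statistics."""
    rng = random.Random(seed)
    unobserved = [v for v in neighbors if v not in observed]

    # Random initial assignment for the unobserved labels.
    state = dict(observed)
    for v in unobserved:
        state[v] = rng.choice(labels)

    counts = {v: Counter() for v in unobserved}
    for step in range(burn_in + num_samples):
        for v in unobserved:
            # Resample v's label given the current labels of its neighbors.
            probs = conditional([state[u] for u in neighbors[v]])
            state[v] = rng.choices(labels, weights=[probs[l] for l in labels])[0]
            if step >= burn_in:
                counts[v][state[v]] += 1  # collect only after burn-in

    # Output the label sampled most often for each unobserved node.
    return {v: counts[v].most_common(1)[0][0] for v in unobserved}


LABELS = ["red", "blue"]


def homophily_conditional(neighbor_labels, strength=2.0):
    """Hypothetical stand-in for f: favor labels that neighbors carry."""
    tally = Counter(neighbor_labels)
    scores = {l: math.exp(strength * tally[l]) for l in LABELS}
    z = sum(scores.values())
    return {l: s / z for l, s in scores.items()}


neighbors = {
    "a1": ["b"], "a2": ["b"], "a3": ["b"],
    "b": ["a1", "a2", "a3", "c"],
    "c": ["b"],
}
observed = {"a1": "red", "a2": "red", "a3": "red"}
result = gibbs_collective(neighbors, observed, LABELS, homophily_conditional)
```

Fixing `seed` makes the run repeatable; with a different model or fewer samples, the most frequent label can vary between runs.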

Loopy belief propagation

For certain undirected graphical models, it is possible to efficiently perform exact inference via message passing, or belief propagation, algorithms. These algorithms follow a simple iterative pattern: each variable sends its neighbors messages expressing its "beliefs" about their marginal distributions, then uses the incoming messages about its own value to update its beliefs. Convergence to the true marginals is guaranteed for tree-structured MRFs, but is not guaranteed for MRFs with cycles. Running the same message updates on a graph with cycles is known as loopy belief propagation, which yields approximate marginals.
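A minimal sketch of sum-product message passing on a pairwise MRF is shown below, assuming synchronous updates, normalized messages, and a symmetric neighbor map. The potentials and the three-node chain are hypothetical; on this tree-structured example the resulting beliefs equal the exact marginals.

```python
def belief_propagation(neighbors, labels, unary, pairwise, iterations=20):
    """Sum-product message passing on a pairwise MRF.

    neighbors: dict node -> list of neighbors (must be symmetric)
    unary:     dict node -> dict label -> non-negative potential
    pairwise:  function (label_i, label_j) -> non-negative potential
    """
    # messages[(i, j)][l] is what node i tells node j about j taking label l.
    messages = {
        (i, j): {l: 1.0 for l in labels}
        for i in neighbors for j in neighbors[i]
    }

    for _ in range(iterations):
        new_messages = {}
        for (i, j) in messages:
            msg = {}
            for lj in labels:
                total = 0.0
                for li in labels:
                    # Combine i's evidence with messages from all neighbors but j.
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][li]
                    total += unary[i][li] * pairwise(li, lj) * incoming
                msg[lj] = total
            z = sum(msg.values())
            new_messages[(i, j)] = {l: m / z for l, m in msg.items()}
        messages = new_messages

    # Belief of a node: its own potential times all incoming messages.
    beliefs = {}
    for i in neighbors:
        b = {l: unary[i][l] for l in labels}
        for k in neighbors[i]:
            for l in labels:
                b[l] *= messages[(k, i)][l]
        z = sum(b.values())
        beliefs[i] = {l: value / z for l, value in b.items()}
    return beliefs


labels = ["red", "blue"]
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # chain a - b - c
unary = {
    "a": {"red": 0.9, "blue": 0.1},   # strong local evidence for red
    "b": {"red": 0.5, "blue": 0.5},   # no local evidence
    "c": {"red": 0.5, "blue": 0.5},
}


def pairwise(li, lj):
    # Hypothetical homophily potential: equal labels are twice as compatible.
    return 2.0 if li == lj else 1.0


beliefs = belief_propagation(neighbors, labels, unary, pairwise)
```

The same function can be run unchanged on a graph with cycles, in which case the beliefs are approximations and convergence is not guaranteed.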

Statistical relational learning (SRL) related

Statistical relational learning is often used to address collective classification problems. A variety of SRL methods have been applied to the collective classification setting. These include direct methods such as probabilistic relational models (PRM), coupled conditional models such as link-based classification, and indirect methods such as Markov logic networks (MLN) and Probabilistic Soft Logic (PSL).

Applications

Collective classification is applied in many domains which exhibit relational structure, such as:

* Social network analysis, where collective approaches to node classification tasks such as detecting malicious users can utilize information about relationships between nodes.
* Entity resolution, where one can make use of co-authorship relationships to identify authors of papers.
* Named entity recognition, where some approaches treat this as a text sequence labeling problem and jointly infer the labels of every word in a sentence, typically by using a conditional random field which models a linear chain of dependencies between the labels of adjacent words in the sentence.
* Document classification, where for example inter-document semantic similarities can be collectively utilized as signals that certain documents belong to the same class.
* Computational biology, where graphical models are utilized to jointly infer relations between biological entities such as genes.
* Computer vision, where for example collective classification can be applied to recognizing multiple objects simultaneously.
