In representation learning, knowledge graph embedding (KGE), also called knowledge representation learning (KRL) or multi-relation learning, is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. Leveraging their embedded representation, knowledge graphs (KGs) can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction.

Definition

A knowledge graph $\mathcal{G} = \{E, R, F\}$ is a collection of entities $E$, relations $R$, and facts $F$. A ''fact'' is a triple $(h, r, t) \in F$ that denotes a link $r \in R$ between the head $h \in E$ and the tail $t \in E$ of the triple. Another notation that is often used in the literature to represent a triple (or fact) is $\langle \text{head}, \text{relation}, \text{tail} \rangle$. This notation is called resource description framework (RDF). A knowledge graph represents the knowledge related to a specific domain; leveraging this structured representation, it is possible to infer a piece of new knowledge from it after some refinement steps. However, using knowledge graphs in real-world applications requires dealing with the sparsity of the data and with computational inefficiency.

The embedding of a knowledge graph is a function that translates each entity and each relation into a vector of a given dimension $d$, called the embedding dimension. It is even possible to embed the entities and relations with different dimensions. The embedding vectors can then be used for other tasks.

A knowledge graph embedding is characterized by four aspects:
# Representation space: The low-dimensional space in which the entities and relations are represented.
# Scoring function: A measure of the goodness of a triple's embedded representation.
# Encoding models: The modality in which the embedded representations of the entities and relations interact with each other.
# Additional information: Any additional information coming from the knowledge graph that can enrich the embedded representation. Usually, an ''ad hoc'' scoring function is integrated into the general scoring function for each piece of additional information.
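As a minimal sketch of the definition above, a knowledge graph can be held as a set of triples from which the entity and relation sets are derived, and an embedding as a lookup table from names to vectors. The facts `born_in` and `part_of`, the helper `init_embeddings`, and the chosen dimensions are illustrative assumptions, not part of any particular model.

```python
import random

# Toy facts (h, r, t); the names are made up for illustration.
facts = {
    ("Obama", "president_of", "USA"),
    ("Obama", "born_in", "Hawaii"),
    ("Hawaii", "part_of", "USA"),
}

# E and R are recovered from the facts F, giving G = {E, R, F}.
entities = sorted({h for h, _, _ in facts} | {t for _, _, t in facts})
relations = sorted({r for _, r, _ in facts})


def init_embeddings(names, dim, rng):
    """Map each name to a random vector of length `dim` (hypothetical helper)."""
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


rng = random.Random(0)
d, k = 4, 4  # entity and relation dimensions; some models let them differ
entity_emb = init_embeddings(entities, d, rng)
relation_emb = init_embeddings(relations, k, rng)
```

The random vectors carry no meaning yet; they only become useful after an optimization procedure such as the one described in the next section.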

Embedding procedure

All algorithms for creating a knowledge graph embedding follow the same approach. First, the embedding vectors are initialized to random values. Then, they are iteratively optimized using a training set of triples. In each iteration, a batch of $b$ triples is sampled from the training set, and for each triple in the batch a corrupted triple is sampled, i.e., a triple that does not represent a true fact in the knowledge graph. The corruption of a triple involves substituting the head or the tail (or both) of the triple with another entity that makes the fact false. The original triple and the corrupted triple are added to the training batch, and then the embeddings are updated by optimizing a scoring function. Iteration stops when a stop condition is reached. Usually, the stop condition depends on the overfitting of the training set. At the end, the learned embeddings should have extracted semantic meaning from the training triples and should correctly predict unseen true facts in the knowledge graph.

Pseudocode

The following is the pseudocode for the general embedding procedure.

algorithm Compute entity and relation embeddings
    input: The training set $S = \{(h, r, t)\}$, entity set $E$, relation set $R$, embedding dimension $k$
    output: Entity and relation embeddings
    ''initialization:'' ''the entities'' $e$ ''and relations'' $r$ ''embeddings (vectors) are randomly initialized''
    while stop condition is not met do
        $S_{batch} \leftarrow sample(S, b)$ // Sample a batch from the training set
        $T_{batch} \leftarrow \emptyset$ // Start with an empty set of training pairs
        for each $(h, r, t) \in S_{batch}$ do
            $(h', r, t') \leftarrow sample(S')$ // Sample a corrupted fact
            $T_{batch} \leftarrow T_{batch} \cup \{((h, r, t), (h', r, t'))\}$
        end for
        Update embeddings by minimizing the loss function
    end while
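The procedure can be sketched in plain Python under a few loudly stated assumptions: the score is a TransE-style squared distance, the loss is a margin ranking loss optimized by plain gradient steps, the stop condition is a fixed number of iterations, and no normalization of the embeddings is applied. The toy facts and all function names are hypothetical.

```python
import random

facts = {("Obama", "president_of", "USA"), ("Obama", "born_in", "Hawaii"),
         ("Hawaii", "part_of", "USA")}  # toy data, made up for illustration


def score(h, r, t):
    """Squared L2 distance between h + r and t (lower means more plausible)."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


def corrupt(triple, entities, known, rng):
    """Replace the head or the tail until the result is not a known fact."""
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:
            return candidate


def train(facts, dim=8, batch_size=2, iterations=200, lr=0.05, margin=1.0, seed=0):
    rng = random.Random(seed)
    facts = sorted(facts)
    known = set(facts)
    entities = sorted({x for h, _, t in facts for x in (h, t)})
    relations = sorted({r for _, r, _ in facts})
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    for _ in range(iterations):  # assumed stop condition: fixed iteration count
        batch = rng.sample(facts, batch_size)
        pairs = [(pos, corrupt(pos, entities, known, rng)) for pos in batch]
        for (h, r, t), (h2, _, t2) in pairs:
            # Margin ranking loss: max(0, margin + score(pos) - score(neg))
            loss = margin + score(ent[h], rel[r], ent[t]) - score(ent[h2], rel[r], ent[t2])
            if loss <= 0:
                continue  # the pair is already separated by the margin
            for i in range(dim):
                g_pos = 2 * (ent[h][i] + rel[r][i] - ent[t][i])
                g_neg = 2 * (ent[h2][i] + rel[r][i] - ent[t2][i])
                ent[h][i] -= lr * g_pos   # pull the true triple together
                ent[t][i] += lr * g_pos
                ent[h2][i] += lr * g_neg  # push the corrupted triple apart
                ent[t2][i] -= lr * g_neg
                rel[r][i] -= lr * (g_pos - g_neg)
    return ent, rel


ent, rel = train(facts)
```

A real implementation would typically add validation-based early stopping and mini-batch tensor operations; this version only mirrors the structure of the pseudocode.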

Performance indicators

These indexes are often used to measure the embedding quality of a model. The simplicity of the indexes makes them very suitable for evaluating the performance of an embedding algorithm even on a large scale. Given Q as the set of all ranked predictions of a model, it is possible to define three different performance indexes: Hits@K, MR, and MRR.

Hits@K

Hits@K, or H@K for short, is a performance index that measures the probability of finding the correct prediction among the top K predictions of the model. Usually, $K = 10$ is used. Hits@K reflects the accuracy of an embedding model in predicting the relation between two given entities correctly.

$\text{Hits@K} = \frac{|\{q \in Q : q \leq K\}|}{|Q|} \in [0, 1]$

where each $q \in Q$ is the rank assigned to a correct prediction. Larger values mean better predictive performance.

Mean rank (MR)

Mean rank is the average ranking position of the items predicted by the model among all the possible items.

$MR = \frac{1}{|Q|}\sum_{q \in Q} q$

The smaller the value, the better the model.

Mean reciprocal rank (MRR)

Mean reciprocal rank averages the reciprocal of the rank at which each correct triple is predicted. If the first predicted triple is correct, then 1 is added; if the second is correct, $\frac{1}{2}$ is added; and so on. Mean reciprocal rank is generally used to quantify the effect of search algorithms.

$MRR = \frac{1}{|Q|}\sum_{q \in Q} \frac{1}{q} \in [0, 1]$

The larger the index, the better the model.
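The three indexes can be computed directly from a list of ranks. The sketch below assumes that each element of `ranks` is the 1-based position of the correct answer in the model's ranking; the example ranks are invented.

```python
def hits_at_k(ranks, k=10):
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(1 for q in ranks if q <= k) / len(ranks)


def mean_rank(ranks):
    """Average rank of the correct answers (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (higher is better)."""
    return sum(1.0 / q for q in ranks) / len(ranks)


ranks = [1, 2, 4, 20]  # hypothetical ranks of the correct answer for four queries
print(hits_at_k(ranks, k=10))       # 0.75
print(mean_rank(ranks))             # 6.75
print(mean_reciprocal_rank(ranks))  # roughly 0.45
```

Note how the single badly ranked query (rank 20) dominates MR but barely affects MRR, which is one reason the indexes are usually reported together.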

Applications

Machine learning tasks

Knowledge graph completion (KGC) is a collection of techniques to infer knowledge from an embedded knowledge graph representation. In particular, this technique completes a triple by inferring the missing entity or relation. The corresponding sub-tasks are named link or entity prediction (i.e., guessing an entity from the embedding given the other entity of the triple and the relation), and relation prediction (i.e., forecasting the most plausible relation that connects two entities).

Triple classification is a binary classification problem. Given a triple, the trained model evaluates the plausibility of the triple using the embedding to determine whether the triple is true or false. The decision is made with the model score function and a given threshold.

Clustering is another application that leverages the embedded representation of a sparse knowledge graph to place semantically similar entities close to each other in a 2D space.
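Link prediction and triple classification both reduce to evaluating a score function over candidate triples. The following sketch assumes a translational L1 distance as the score and uses hand-picked 2-dimensional embeddings that are purely illustrative (they are not the output of any trained model); `predict_tail`, `classify`, and the threshold value are hypothetical.

```python
def distance(h, r, t):
    """Translational distance ||h + r - t||_1; lower means more plausible."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


# Hand-picked embeddings for illustration only.
entity_emb = {"Obama": [0.0, 0.0], "USA": [1.0, 1.0], "Hawaii": [1.0, 0.0]}
relation_emb = {"president_of": [1.0, 1.0], "born_in": [1.0, 0.0]}


def predict_tail(head, relation):
    """Link prediction: rank every entity as a candidate tail, best first."""
    h, r = entity_emb[head], relation_emb[relation]
    return sorted(entity_emb, key=lambda e: distance(h, r, entity_emb[e]))


def classify(head, relation, tail, threshold=0.5):
    """Triple classification: accept the triple if its distance is below a threshold."""
    d = distance(entity_emb[head], relation_emb[relation], entity_emb[tail])
    return d < threshold


print(predict_tail("Obama", "president_of"))  # ['USA', 'Hawaii', 'Obama']
print(classify("Obama", "born_in", "Hawaii"))  # True
print(classify("Obama", "born_in", "USA"))     # False
```

In practice the threshold is usually tuned on held-out data, often separately for each relation.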

Real world applications

The use of knowledge graph embedding is increasingly pervasive in many applications. In the case of recommender systems, the use of knowledge graph embedding can overcome the limitations of the usual reinforcement learning, as well as limitations of the conventional collaborative filtering method. Training this kind of recommender system requires a huge amount of information from the users; however, knowledge graph techniques can address this issue by using a graph already constructed over prior knowledge of the item correlation and using the embedding to infer the recommendation from it.

Drug repurposing is the use of an already approved drug for a therapeutic purpose different from the one for which it was initially designed. It is possible to use the task of link prediction to infer a new connection between an already existing drug and a disease by using a biomedical knowledge graph built by leveraging the availability of massive literature and biomedical databases. Knowledge graph embedding can also be used in the domain of social politics.

Models

Given a collection of triples (or facts) $\mathcal{F} = \{(h, r, t)\}$, the knowledge graph embedding model produces, for each entity and relation present in the knowledge graph, a continuous vector representation. $(h, r, t)$ is the corresponding embedding of a triple with $h, t \in \mathbb{R}^{d}$ and $r \in \mathbb{R}^{k}$, where $d$ is the embedding dimension for the entities, and $k$ for the relations. The score function of a given model is denoted by $f_{r}(h, t)$ and measures the distance of the embedding of the head from the embedding of the tail given the embedding of the relation. In other words, it quantifies the plausibility of the embedded representation of a given fact.

Rossi et al. propose a taxonomy of the embedding models and identify three main families of models: tensor decomposition models, geometric models, and deep learning models.

Tensor decomposition model

Tensor decomposition is a family of knowledge graph embedding models that use a multi-dimensional matrix to represent a knowledge graph, which is only partially known because the graph does not describe its domain completely. In particular, these models use a third-order (3D) tensor, which is then factorized into low-dimensional vectors that are the embeddings. A third-order tensor is suitable for representing a knowledge graph because it records only the existence or absence of a relation between entities, and so is simple. There is also no need to know the network structure ''a priori'', which makes this class of embedding models light and easy to train, even if they suffer from high dimensionality and sparsity of data.

Bilinear models

This family of models uses a linear equation to embed the connection between the entities through a relation. In particular, the embedded representation of the relations is a two-dimensional matrix. During the embedding procedure, these models only use the single facts to compute the embedded representation and ignore the other associations to the same entity or relation.
* DistMult: Since the embedding matrix of the relation is a diagonal matrix, the scoring function cannot distinguish asymmetric facts.
* ComplEx: Like DistMult, ComplEx uses a diagonal matrix to represent the relation embedding, but it adds a representation in the complex vector space and the Hermitian product, so it can distinguish symmetric and asymmetric facts. This approach is scalable to a large knowledge graph in terms of time and space cost.
* ANALOGY: This model encodes in the embedding the analogical structure of the knowledge graph to simulate inductive reasoning. Using a differentiable objective function, ANALOGY has good theoretical generality and computational scalability. It is proven that the embedding produced by ANALOGY fully recovers the embedding of DistMult, ComplEx, and HolE.
* SimplE: This model is an improvement of canonical polyadic decomposition (CP), in which an embedding vector for the relation and two independent embedding vectors for each entity are learned, depending on whether the entity is a head or a tail in the knowledge graph fact. SimplE resolves the problem of independent learning of the two entity embeddings by using an inverse relation and averaging the CP scores of $(h, r, t)$ and $(t, r^{-1}, h)$. In this way, SimplE collects the relation between entities while they appear in the role of subject or object inside a fact, and it is able to embed asymmetric relations.
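The difference between DistMult and ComplEx on asymmetric facts can be checked numerically. The sketch below assumes the usual formulations: DistMult as a trilinear product with a diagonal relation matrix, and ComplEx as the real part of the same product taken over complex vectors with the tail conjugated. The vectors are arbitrary illustrative values.

```python
def distmult(h, r, t):
    """DistMult: sum_i h_i * r_i * t_i, with r the diagonal of the relation matrix."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: real part of sum_i h_i * r_i * conj(t_i) over complex vectors."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# Arbitrary real embeddings: swapping head and tail never changes the DistMult score.
h_real, r_real, t_real = [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]
print(distmult(h_real, r_real, t_real), distmult(t_real, r_real, h_real))  # -0.5 -0.5

# Arbitrary complex embeddings: ComplEx can score (h, r, t) and (t, r, h) differently.
h_c, r_c, t_c = [1 + 2j, 0.5 - 1j], [0 + 1j, 1 + 1j], [2 - 1j, 1 + 3j]
print(complex_score(h_c, r_c, t_c), complex_score(t_c, r_c, h_c))  # -5.0 0.0
```

The asymmetry in ComplEx comes entirely from the conjugation of the tail; with purely real vectors the two functions coincide.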

Non-bilinear models

* HolE: HolE uses circular correlation to create an embedded representation of the knowledge graph, which can be seen as a compression of the matrix product but is more computationally efficient and scalable, while keeping the capability to express asymmetric relations since circular correlation is not commutative. HolE links holographic and complex embeddings since, when used together with the Fourier transform, it can be seen as a special case of ComplEx.
* TuckER: TuckER sees the knowledge graph as a tensor that can be decomposed using the Tucker decomposition into a collection of vectors (i.e., the embeddings of entities and relations) with a shared core. The weights of the core tensor are learned together with the embeddings and represent the level of interaction of the entries. Each entity and relation has its own embedding dimension, and the size of the core tensor is determined by the shape of the entities and relations that interact. The embedding of the subject and object of a fact are summed in the same way, making TuckER fully expressive, and other embedding models such as RESCAL, DistMult, ComplEx, and SimplE can be expressed as a special formulation of TuckER.
* MEI: MEI introduces the multi-partition embedding interaction technique with the block term tensor format, which is a generalization of CP decomposition and Tucker decomposition. It divides the embedding vector into multiple partitions and learns the local interaction patterns from data instead of using fixed special patterns as in the ComplEx or SimplE models. This enables MEI to achieve an optimal efficiency-expressiveness trade-off, not just being fully expressive. Previous models such as TuckER, RESCAL, DistMult, ComplEx, and SimplE are suboptimal restricted special cases of MEI.
* MEIM: MEIM goes beyond the block term tensor format to introduce the independent core tensor for ensemble boosting effects and the soft orthogonality for max-rank relational mapping, in addition to multi-partition embedding interaction. MEIM generalizes several previous models such as MEI and its subsumed models, RotatE, and QuatE. MEIM improves expressiveness while still being highly efficient in practice, helping it achieve good results using fairly small model sizes.
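Two of the operations named above are small enough to write out. The sketch assumes the common formulations: HolE scores a triple as the dot product of the relation with the circular correlation of head and tail, and TuckER contracts a core tensor with the head, relation, and tail vectors. The index order of the core (entity, relation, entity) and the helper `superdiagonal_core` are assumptions made for the illustration.

```python
def circular_correlation(a, b):
    """[a * b]_k = sum_i a_i * b_((i + k) mod n); not commutative in general."""
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]


def hole_score(h, r, t):
    """HolE-style score: dot product of r with the circular correlation of h and t."""
    return sum(ri * ci for ri, ci in zip(r, circular_correlation(h, t)))


def tucker_score(core, h, r, t):
    """TuckER-style score: sum over i, j, k of core[i][j][k] * h[i] * r[j] * t[k]."""
    return sum(
        core[i][j][k] * h[i] * r[j] * t[k]
        for i in range(len(h))
        for j in range(len(r))
        for k in range(len(t))
    )


def superdiagonal_core(n):
    """Core with ones where i == j == k; with it, the TuckER score equals DistMult."""
    return [[[1.0 if i == j == k else 0.0 for k in range(n)]
             for j in range(n)] for i in range(n)]


# Swapping head and tail changes the HolE score, so asymmetric relations are expressible.
h, r, t = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]
print(hole_score(h, r, t), hole_score(t, r, h))  # 1.0 0.0

# A fixed superdiagonal core reduces TuckER to the DistMult trilinear product.
print(tucker_score(superdiagonal_core(2), [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]))  # -0.5
```

The second example illustrates, for one simple case, how another model arises as a special choice of the core tensor.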

Geometric models

The geometric space defined by this family of models encodes the relation as a geometric transformation between the head and tail of a fact. For this reason, to compute the embedding of the tail, it is necessary to apply a transformation $\tau$ to the head embedding, and a distance function $\delta$ is used to measure the goodness of the embedding or to score the reliability of a fact.

$f_{r}(h, t) = \delta(\tau(h, r), t)$

Geometric models are similar to the tensor decomposition model, but the main difference between the two is that they have to preserve the applicability of the transformation $\tau$ in the geometric space in which it is defined.

Pure translational models

This class of models is inspired by the idea of translation invariance introduced in word2vec. A pure translational model relies on the fact that the embedding vectors of the entities are close to each other after applying a proper relational translation in the geometric space in which they are defined. In other words, given a fact, the embedding of the head plus the embedding of the relation should equal the embedding of the tail. The closeness of the entity embeddings is given by some distance measure and quantifies the reliability of a fact.
* TransE: Uses a scoring function that forces the embeddings to satisfy a simple vector sum equation in each fact in which they appear: $h + r = t$. The embedding will be exact if each entity and relation appears in only one fact, and so in practice it is poor at representing one-to-many, many-to-one, and asymmetric relations.
* TransH: A modification of TransE for representing types of relations, by using a hyperplane as a geometric space. In TransH, the relation embedding is on a different hyperplane depending on the entities it interacts with. So, to compute, for example, the score function of a fact, the embedded representations of the head and tail need to be projected using a relational projection matrix onto the correct hyperplane of the relation.
* TransR: A modification of TransH that uses different spaces for embedding entities and relations, thus separating the semantic spaces of entities and relations. TransR also uses a relational projection matrix to translate the embedding of the entities to the relation space.
* TransD: In TransR, the head and the tail of a given fact could belong to two different types of entities. For example, in the fact $(Obama, president\_of, USA)$, ''Obama'' is a person and ''USA'' is a country. Matrix multiplication is an expensive procedure in TransR to compute the projection. In this context, TransD uses two vectors for each entity-relation pair to compute a dynamic mapping that substitutes the projection matrix while reducing the dimensional complexity. The first vector is used to represent the semantic meaning of the entities and relations, the second to compute the mapping matrix.
* TransA: All the translational models define a score function in their representation space, but they oversimplify this metric loss. Since the vector representation of the entities and relations is not perfect, a pure translation of $h + r$ could be distant from $t$, and a spherical equipotential Euclidean distance makes it hard to distinguish which is the closest entity. TransA, instead, introduces an adaptive Mahalanobis distance to weight the embedding dimensions, together with elliptical surfaces to remove the ambiguity.
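The effect of the hyperplane projection on a one-to-many relation can be shown with a tiny numeric sketch. It assumes an L2 distance for TransE and, for TransH, a hyperplane described by a unit normal vector `w` (an assumption of this example; the projection could equally be written as a matrix). All vectors are invented.

```python
import math


def transe_score(h, r, t):
    """L2 norm of h + r - t; 0 means the translation is exact."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def project_to_hyperplane(v, w):
    """Project v onto the hyperplane with unit normal w: v - (w . v) w."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    return [vi - dot * wi for vi, wi in zip(v, w)]


def transh_score(h, r, t, w):
    """TransE-style distance after projecting h and t onto the relation hyperplane."""
    return transe_score(project_to_hyperplane(h, w), r, project_to_hyperplane(t, w))


# One head, one relation, two different tails (a one-to-many situation).
h, r = [0.0, 0.0], [1.0, 0.0]
t1, t2 = [1.0, 1.0], [1.0, -1.0]
w = [0.0, 1.0]  # unit normal of the relation's hyperplane (illustrative)

print(transe_score(h, r, t1), transe_score(h, r, t2))        # 1.0 1.0
print(transh_score(h, r, t1, w), transh_score(h, r, t2, w))  # 0.0 0.0
```

With TransE, `h + r` can coincide with at most one of the two tails; after projection, both tails collapse onto the same point of the hyperplane and both facts are satisfied exactly.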

Translational models with additional embeddings

It is possible to associate additional information to each element in the knowledge graph and their common representation facts. Each entity and relation can be enriched with text descriptions, weights, constraints, and others in order to improve the overall description of the domain with a knowledge graph. During the embedding of the knowledge graph, this information can be used to learn specialized embeddings for these characteristics together with the usual embedded representation of entities and relations, at the cost of learning a larger number of vectors.
* STransE: This model is the result of the combination of TransE and of the structure embedding, in such a way that it is able to better represent one-to-many, many-to-one, and many-to-many relations. To do so, the model involves two additional independent matrices $W_{r}^{h}$ and $W_{r}^{t}$ for each embedded relation $r$ in the KG. Each additional matrix is used depending on whether the specific relation interacts with the head or the tail of the fact. In other words, given a fact $(h, r, t)$, before applying the vector translation, the head $h$ is multiplied by $W_{r}^{h}$ and the tail is multiplied by $W_{r}^{t}$.
* CrossE: Crossover interactions can be used for related information selection, and could be very useful for the embedding procedure. Crossover interactions provide two distinct contributions in the information selection: interactions from relations to entities and interactions from entities to relations. This means that a relation, e.g., 'president_of', automatically selects the types of entities that connect the subject to the object of a fact. In a similar way, the entity of a fact indirectly determines which inference path has to be chosen to predict the object of a related triple. To do so, CrossE learns an additional interaction matrix $C$ and uses the element-wise product to compute the interaction between $h$ and $r$. Even though CrossE does not rely on a neural network architecture, it is shown that this methodology can be encoded in such an architecture.

Roto-translational models

In addition to or in place of a translation, this family of models employs a rotation-like transformation.
* TorusE: The regularization term of TransE forces the entity embeddings onto a sphere, and consequently the translation properties of the geometric space are lost. To address this problem, TorusE uses a compact Lie group, which in this specific case is the n-dimensional torus, and avoids the use of regularization. TorusE defines distance functions that substitute the L1 and L2 norms of TransE.
* RotatE: RotatE is inspired by Euler's identity and uses the Hadamard product to represent a relation $r$ as a rotation from the head $h$ to the tail $t$ in the complex space. For each element of the triple, the complex part of the embedding describes a counterclockwise rotation with respect to an axis, which can be described with Euler's identity, whereas the modulus of the relation vector is 1. It is shown that the model is capable of embedding symmetric, asymmetric, inversion, and composition relations from the knowledge graph.
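The rotation view of a relation can be sketched with Python's built-in complex numbers. The example assumes that each relation component is stored as a phase angle, so that the corresponding complex number has modulus 1, and that the score is the L1 norm of the difference between the rotated head and the tail; the embeddings are invented.

```python
import cmath


def rotate_score(h, phases, t):
    """Distance sum_i |h_i * r_i - t_i| with r_i = exp(i * phase_i), so |r_i| = 1."""
    r = [cmath.exp(1j * p) for p in phases]
    return sum(abs(hi * ri - ti) for hi, ri, ti in zip(h, r, t))


# A rotation by pi maps h to -h and -h back to h: a symmetric relation.
h = [1 + 2j, 0.5 - 1j]
t = [-1 - 2j, -0.5 + 1j]
symmetric = [cmath.pi, cmath.pi]
print(rotate_score(h, symmetric, t) < 1e-9, rotate_score(t, symmetric, h) < 1e-9)

# A rotation by pi/2 is not its own inverse: an asymmetric relation.
quarter = [cmath.pi / 2]
print(rotate_score([1 + 0j], quarter, [1j]) < 1e-9)  # the forward direction fits
print(rotate_score([1j], quarter, [1 + 0j]) > 1.0)   # the reverse direction does not
```

Composition follows from the same picture: applying two rotations in sequence is a rotation by the sum of their phases.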

Deep learning models

This group of embedding models uses deep neural networks to learn patterns from the knowledge graph, which is the input data. These models have the generality to distinguish the type of entity and relation, temporal information, path information, and underlying structured information, and they resolve the limitations of distance-based and semantic-matching-based models in representing all the features of a knowledge graph. The use of deep learning for knowledge graph embedding has shown good predictive performance, even if these models are more expensive in the training phase, data-hungry, and often require a pre-trained embedding representation of the knowledge graph coming from a different embedding model.

Convolutional neural networks

This family of models, instead of using fully connected layers, employs one or more convolutional layers that convolve the input data by applying a low-dimensional filter capable of embedding complex structures with few parameters by learning nonlinear features.
* ConvE: ConvE is an embedding model that represents a good tradeoff between the expressiveness of deep learning models and computational cost; in fact, it is shown that it uses 8x fewer parameters when compared to DistMult. ConvE uses a one-dimensional $d$-sized embedding to represent the entities and relations of a knowledge graph. To compute the score function of a triple, ConvE applies a simple procedure: first it concatenates and merges the embeddings of the head of the triple and the relation into a single matrix $[h; r]$, then this matrix is used as input for the 2D convolutional layer. The result is then passed through a dense layer that applies a linear transformation parameterized by the matrix $\mathcal{W}$ and, at the end, is matched with the embedding of the tail through the inner product. ConvE is also particularly efficient in the evaluation procedure: using 1-N scoring, the model matches, given a head and a relation, all the tails at the same time, saving a lot of evaluation time when compared to the 1-1 evaluation procedure of the other models.
* ConvR: ConvR is an adaptive convolutional network aimed at deeply representing all the possible interactions between the entities and the relations. For this task, ConvR computes a convolutional filter for each relation and, when required, applies these filters to the entity of interest to extract convolved features. The procedure to compute the score of a triple is the same as in ConvE.
* ConvKB: To compute the score function of a given triple $(h, r, t)$, ConvKB produces an input $[h; r; t]$ of dimension $d \times 3$ without reshaping and passes it to a series of convolutional filters of size $1 \times 3$. This result feeds a dense layer with only one neuron that produces the final score. The single final neuron makes this architecture a binary classifier in which the fact could be true or false. A difference from ConvE is that the dimensionality of the entities is not changed.
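The ConvKB pipeline described above is simple enough to sketch without a deep learning framework. The version below assumes a ReLU activation after the convolution and omits bias terms; the filter and weight values are hand-picked rather than learned, and `convkb_score` is a hypothetical name.

```python
def convkb_score(h, r, t, filters, weights):
    """Score a triple with 1 x 3 filters over the d x 3 input [h; r; t].

    Each filter slides over the d rows and yields a d-sized feature map; the
    feature maps are concatenated and fed to a single output neuron.
    """
    features = []
    for f in filters:  # each filter has three weights, one per column
        for hi, ri, ti in zip(h, r, t):  # one row of the d x 3 input at a time
            features.append(max(0.0, f[0] * hi + f[1] * ri + f[2] * ti))  # ReLU
    return sum(w * x for w, x in zip(weights, features))


h, r, t = [1.0, 2.0], [1.0, 0.0], [2.0, 2.0]

# A filter (1, 1, -1) computes h + r - t row by row, as a translational model would.
print(convkb_score(h, r, t, filters=[[1.0, 1.0, -1.0]], weights=[1.0, 1.0]))  # 0.0

# A filter (1, 0, 0) simply passes the head through.
print(convkb_score(h, r, t, filters=[[1.0, 0.0, 0.0]], weights=[1.0, 1.0]))  # 3.0
```

Because every filter sees one row of the input at a time, the entity and relation vectors keep their original dimensionality throughout, in line with the difference from ConvE noted above.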

Capsule neural networks

This family of models uses capsule neural networks to create a more stable representation that is able to recognize a feature in the input without losing spatial information. The network is composed of convolutional layers, but they are organized in capsules, and the overall result of a capsule is sent to a higher-level capsule chosen by a dynamic routing process.
* CapsE: CapsE implements a capsule network to model a fact $(h, r, t)$. As in ConvKB, the triple elements are concatenated to build a matrix $[h; r; t]$, which is fed to a convolutional layer to extract the convolutional features. These features are then redirected to a capsule to produce a continuous vector: the longer the vector, the more plausible the fact.

Recurrent neural networks

This class of models leverages the use of recurrent neural networks. The advantage of this architecture is that it memorizes a sequence of facts, rather than just processing single events.
* RSN: During the embedding procedure, it is commonly assumed that similar entities have similar relations. In practice, this type of information is not leveraged, because the embedding is computed only on the fact at hand rather than on a history of facts. Recurrent skipping networks (RSN) use a recurrent neural network to learn relational paths using random walk sampling.

Model performance

The machine learning task most often used to evaluate the embedding accuracy of the models is link prediction. Rossi et al. produced an extensive benchmark of the models, and other surveys produce similar results. The benchmark involves five datasets: FB15k, WN18, FB15k-237, WN18RR, and YAGO3-10. More recently, it has been discussed that these datasets are far from real-world applications, and that other datasets should be integrated as a standard benchmark.



External links

* Open Graph Benchmark - Stanford
* WordNet - Princeton