machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

, tree kernels are the application of the more general concept of

positive-definite kernel In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving ...

to tree structures. They find applications in

natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to proc ...

, where they can be used for machine-learned

parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...

or classification of sentences.

Motivation

In natural language processing, it is often necessary to compare tree structures (e.g.

parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in comp ...

s) for similarity. Such comparisons can be performed by computing

dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an alg ...

s of vectors of features of the trees, but these vectors tend to be very large: NLP techniques have come to a point where a simple dependency relation over two words is encoded with a vector of several millions of features. It can be impractical to represent complex structures such as trees with features vectors. Well-designed kernels allow computing similarity over trees without explicitly computing the feature vectors of these trees. Moreover,

kernel methods In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example c ...

have been widely used in machine learning tasks (e.g. SVM), and thus plenty of algorithms are working natively with kernels, or have an extension that handles

kernelization In computer science, a kernelization is a technique for designing efficient algorithms that achieve their efficiency by a preprocessing stage in which inputs to the algorithm are replaced by a smaller input, called a "kernel". The result of solvi ...

. An example application is classification of sentences, such as different types of questions.

Examples

Here are presented two examples of tree kernel applied to the constituency trees of the sentences "A cat eats a mouse." and "A mouse eats a cat.". In this example "A" and "a" are the same words, and in most of the NLP applications they would be represented with the same token. The interest of these two kernels is that they show very different granularity (the subset tree kernel being far more fine-grained than the subtree kernel), for the same computation complexity. Both can be computed recursively in time ''O(, T₁, ., T₂, )''.

Subtree kernel

In the case of constituency tree, a subtree is defined as a node and all its children (e.g., P [D [A [N [mouse">_[A.html" ;"title="P [D [A">P [D [A [N [mouse">_[A">P_[D_[A<_a>_[N_[mouse.html" ;"title="_[A.html" ;"title="P [D [A">P [D [A [N [mouse">_[A.html" ;"title="P [D [A">P [D [A [N [mouseis a subtree of the two trees). Terminals are not considered subtree (e.g. [a] is not a subtree). The subtree kernel count the number of common subtrees between two given trees. In this example, there are seven common subtrees: :[NP [D [a [N [cat], :[NP [D [a [mouse">ouse.html" ;"title=" [mouse"> [mouse : [mouse, :[N [cat, :[V [eats, :[D [a (counted twice as it appears twice).

Subset tree kernel

A subset tree is a more general structure than a subtree. The basic definition is the same, but in the case of subset trees, leaves need not be terminals (e.g., P [V[NP">.html" ;"title="P [V">P [V[NP is a subset tree of both trees), but here too single nodes are not considered as trees. Because of this more general definition, there are more subset trees than subtrees, and more common subset trees than common subtrees. In this example, there are 54 common subset trees. The seven common subtrees plus among others: :[NP [D] [N (counted twice), :[VP [V [eats [NP...

Notes

{{reflist

References

*Jun Sun, Min Zhang and Chew Lim Tan. ''Tree Sequence Kernel for Natural Language'' *Alessandro Moschitti. ''Making Tree Kernels practical for Natural Language Learning''

External links

http://disi.unitn.it/moschitti/Tree-Kernel.htm
-- Application of tree kernel to SVM, on Alessandro Moschitti web-page. Operator theory Hilbert spaces