Scholia comparison page for seven people named Li Li as of 2019-12-02 (cropped)

Author name disambiguation is a type of disambiguation and record linkage applied to the names of individual people. The process could, for example, distinguish between individuals named "John Smith". It may be applied to scholarly documents, where the goal is to find all mentions of the same author and cluster them together. Authors of scholarly documents often share names, which makes it hard to tell each author's work apart. Author name disambiguation therefore aims to find all publications that belong to a given author and to distinguish them from publications by other authors who share the same name.

Methods

Considerable research has addressed author name disambiguation. Typical approaches rely on information about the authors, such as their affiliations, email addresses, years of publication, co-authors, and topic information, to distinguish between authors. This information can be used to train a machine learning classifier to decide whether two author mentions refer to the same author. Many research works regard name disambiguation as a clustering problem, i.e., partitioning documents into clusters, each of which represents an author. Others regard it as a classification problem. Some works construct a document graph and use the graph topology to learn document similarity. More recently, several works have aimed to learn low-dimensional document representations by employing network embedding methods.
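The pairwise-decision and clustering views described above can be combined in a small sketch. The example below is illustrative only: the `Mention` schema, the scoring rule, and the threshold are assumptions made for this example, and the hand-set score merely stands in for a trained classifier. Mentions whose score reaches the threshold are merged with a simple single-link (union-find) clustering.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """One occurrence of an author name on a paper (hypothetical schema)."""
    paper_id: str
    name: str
    affiliation: str
    coauthors: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_author_score(m1, m2):
    """Hand-set score standing in for a trained pairwise classifier."""
    score = jaccard(m1.coauthors, m2.coauthors)
    if m1.affiliation and m1.affiliation == m2.affiliation:
        score += 0.5  # assumed bonus for an identical, non-empty affiliation
    return score


def cluster_mentions(mentions, threshold=0.5):
    """Single-link clustering: merge any pair scoring at or above threshold."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if same_author_score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m.paper_id)
    return sorted(sorted(c) for c in clusters.values())


# Invented toy data: four papers signed "Li Li".
mentions = [
    Mention("p1", "Li Li", "Univ A", frozenset({"Wei Zhang", "Ana Costa"})),
    Mention("p2", "Li Li", "Univ A", frozenset({"Wei Zhang"})),
    Mention("p3", "Li Li", "Univ B", frozenset({"Tom Baker"})),
    Mention("p4", "Li Li", "", frozenset({"Tom Baker", "Eva Novak"})),
]
print(cluster_mentions(mentions))
```

On this toy data the four papers fall into two clusters, one per presumed author. Single-link merging is only one possible choice; it is easy to follow but can chain unrelated authors together when a single pair scores too high.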

Applications

Scholia page for missing information related to an author profile as of 2019-12-02 at 20

Author names can be ambiguous for several reasons. Individuals may publish under multiple names because of differing transliterations, misspellings, name changes due to marriage, or the use of nicknames, middle names, and initials. Motivations for disambiguating individuals include identifying inventors from patents. Name disambiguation is also a cornerstone of author-centric academic search and mining systems, such as ArnetMiner (also known as AMiner).
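Some of the name variation described above, such as initials, name order, and diacritics, can be reduced before any comparison by mapping each name string to a coarse key. The following is a minimal sketch of one such key (surname plus first initial); the function names and the key format are assumptions for illustration, not the method of any particular system. A shared key only marks names as candidates for comparison and does not establish that they refer to the same person. Misspellings, nicknames, and name changes are not handled.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text):
    """Drop combining marks, e.g. 'Müller' becomes 'Muller'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def blocking_key(raw_name):
    """Return 'surname initial' as a coarse key (illustrative format)."""
    name = strip_diacritics(raw_name).lower().replace(".", " ")
    if "," in name:
        # Turn 'surname, given names' into 'given names surname'.
        last, _, given = name.partition(",")
        name = f"{given} {last}"
    parts = name.split()
    if not parts:
        return ""
    return f"{parts[-1]} {parts[0][0]}"


def group_by_key(names):
    """Group raw name strings that share a blocking key."""
    blocks = defaultdict(list)
    for n in names:
        blocks[blocking_key(n)].append(n)
    return dict(blocks)


names = ["John Smith", "Smith, John A.", "J. Smith", "Jane Müller", "Muller, J."]
print(group_by_key(names))
```

Note that the key also groups "John Smith" with any "Jane Smith": it deliberately over-merges so that a later, finer comparison can separate the candidates.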

Similar issues

Author name disambiguation is only one of the record linkage problems in the scholarly data domain. Closely related and potentially mutually beneficial problems include organisation (affiliation) disambiguation and conference or publication venue disambiguation, since data publishers often use different names or aliases for these entities.

Resources

Several well-known benchmarks to evaluate author name disambiguation are listed below, each of which provides publications with some ambiguous names and their ground truths.
AMiner name disambiguation dataset
CiteSeerX name disambiguation dataset
Semantic Scholar Author Name Disambiguation (S2AND) dataset

Source Codes

Beard
Name disambiguation in AMiner
