LabelMe is a project created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) that provides a dataset of digital images with annotations. The dataset is dynamic, free to use, and open to public contribution. The most applicable use of LabelMe is in computer vision research. As of October 31, 2010, LabelMe had 187,240 images, 62,197 annotated images, and 658,992 labeled objects.

Motivation

The motivation behind creating LabelMe comes from the history of publicly available data for computer vision researchers. Most available data was tailored to a specific research group's problems, so new researchers had to collect additional data to solve their own problems. LabelMe was created to address several common shortcomings of available data. The following qualities distinguish LabelMe from previous work:

* Designed for recognition of a class of objects instead of single instances of an object. For example, a traditional dataset may have contained images of dogs, each of the same size and orientation. In contrast, LabelMe contains images of dogs in multiple angles, sizes, and orientations.
* Designed for recognizing objects embedded in arbitrary scenes instead of images that are cropped, normalized, and/or resized to display a single object.
* Complex annotation: Instead of labeling an entire image (which also limits each image to containing a single object), LabelMe allows annotation of multiple objects within an image by specifying a polygon bounding box that contains the object.
* Contains a large number of object classes and allows the creation of new classes easily.
* Diverse images: LabelMe contains images from many different scenes.
* Provides non-copyrighted images and allows public additions to the annotations. This creates a free environment.

Annotation Tool

The LabelMe annotation tool provides a means for users to contribute to the project. The tool can be accessed anonymously or by logging into a free account. To access the tool, users must have a compatible web browser with JavaScript support. When the tool is loaded, it chooses a random image from the LabelMe dataset and displays it on the screen. If the image already has object labels associated with it, they are overlaid on top of the image in polygon format. Each distinct object label is displayed in a different color.

If the image is not completely labeled, the user can use the mouse to draw a polygon containing an object in the image. For example, if a person were standing in front of a building in the image, the user could click on a point on the border of the person and continue clicking along the outside edge until returning to the starting point. Once the polygon is closed, a bubble pops up on the screen that allows the user to enter a label for the object. The user can choose whatever label best describes the object. A user who disagrees with the previous labeling of the image can click on the outline polygon of an object and either delete the polygon completely or edit the text label to give it a new name.

As soon as the user changes an image, the changes are saved and openly available for anyone to download from the LabelMe dataset. In this way, the data is always changing due to contributions by the community of users who use the tool. Once the user is finished with an image, the ''Show me another image'' link can be clicked and another random image will be selected for display.
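The editing operations described above (add a labeled polygon, rename it, delete it) can be pictured with a small in-memory model. This is only an illustrative sketch: the class names, fields, and the file name `street.jpg` are assumptions made for the example, not the storage format or API used by LabelMe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LabeledPolygon:
    """One annotated object: a free-text label plus the clicked outline."""
    label: str
    points: List[Tuple[float, float]]  # vertices in click order, in pixels


@dataclass
class AnnotatedImage:
    """An image together with every polygon users have drawn on it."""
    filename: str
    objects: List[LabeledPolygon] = field(default_factory=list)

    def add(self, label, points):
        # A closed outline needs at least three distinct clicks.
        if len(points) < 3:
            raise ValueError("a polygon needs at least three vertices")
        self.objects.append(LabeledPolygon(label, list(points)))

    def relabel(self, index, new_label):
        # Corresponds to editing the text label of an existing polygon.
        self.objects[index].label = new_label

    def delete(self, index):
        # Corresponds to deleting a polygon the user disagrees with.
        del self.objects[index]


# Hypothetical session: two objects are outlined, one renamed, one removed.
image = AnnotatedImage("street.jpg")
image.add("person", [(40, 30), (60, 30), (60, 110), (40, 110)])
image.add("building", [(0, 0), (200, 0), (200, 150), (0, 150)])
image.relabel(0, "pedestrian")
image.delete(1)
```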

Problems with the data

The LabelMe dataset has some problems. Some are inherent in the data, such as the objects in the images not being uniformly distributed with respect to size and image location. This is because the images are primarily taken by humans, who tend to focus the camera on interesting objects in a scene. However, cropping and rescaling the images randomly can simulate a uniform distribution. Other problems are caused by the amount of freedom given to the users of the annotation tool. Some problems that arise are:

* The user can choose which objects in the scene to outline. Should an occluded person be labeled? Should an occluded part of an object be included when outlining the object? Should the sky be labeled?
* The user has to describe the shape of the object themselves by outlining a polygon. Should the fingers of a hand on a person be outlined in detail? How much precision must be used when outlining objects?
* The user chooses what text to enter as the label for the object. Should the label be ''person'', ''man'', or ''pedestrian''?

The creators of LabelMe decided to leave these decisions up to the annotator. The reason for this is that they believe people will tend to annotate the images according to what they think is the natural labeling of the images. This also provides some variability in the data, which can help researchers tune their algorithms to account for this variability.
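The remark about random cropping and rescaling can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the square crop, the crop-size range (half to full of the shorter image side), and the 256-pixel output size are all choices made for the example, and only the polygon coordinates are transformed (no pixels are read, and vertices outside the crop are not clipped).

```python
import random


def random_crop_and_rescale(width, height, polygon, out_size=256, rng=random):
    """Pick a random square crop and map polygon vertices into the rescaled crop.

    Returns the crop box (left, top, side) and the transformed vertices.
    Vertices that fall outside the crop are kept as they are (no clipping).
    """
    min_side = min(width, height)
    side = rng.randint(min_side // 2, min_side)   # random crop scale
    left = rng.randint(0, width - side)           # random crop position
    top = rng.randint(0, height - side)
    scale = out_size / side                       # rescale crop to out_size
    moved = [((x - left) * scale, (y - top) * scale) for x, y in polygon]
    return (left, top, side), moved


# Hypothetical 640x480 image with one triangular annotation.
rng = random.Random(0)
triangle = [(300, 200), (340, 200), (340, 260)]
box, pts = random_crop_and_rescale(640, 480, triangle, rng=rng)
```

Repeating the call with different random draws places the same object at varying positions and scales in the output frame, which is the effect the paragraph above describes.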

Extending the data

Using WordNet

Since the text labels for objects provided in LabelMe come from user input, there is a lot of variation in the labels used (as described above). Because of this, analysis of objects can be difficult. For example, a picture of a dog might be labeled as ''dog'', ''canine'', ''hound'', ''pooch'', or ''animal''. Ideally, when using the data, the object class ''dog'' at the abstract level should incorporate all of these text labels.

WordNet is a database of words organized in a structured way. It allows assigning a word to a category, or in WordNet language, a sense. Sense assignment is not easy to do automatically. When the authors of LabelMe tried automatic sense assignment, they found that it was prone to a high rate of error, so instead they assigned words to senses manually. At first, this may seem like a daunting task, since new labels are added to the LabelMe project continuously. However, a comparison of the growth of polygons with the growth of words (descriptions) shows that the growth of words is small compared with the continuous growth of polygons, so the assignment is easy enough for the LabelMe team to keep up to date manually.

Once WordNet assignment is done, searches in the LabelMe database are much more effective. For example, a search for ''animal'' might bring up pictures of ''dogs'', ''cats'' and ''snakes''. However, since the assignment was done manually, a picture of a computer mouse labeled as ''mouse'' would not show up in a search for ''animals''. Also, if objects are labeled with more complex terms like ''dog walking'', WordNet still allows a search for ''dog'' to return these objects as results. WordNet makes the LabelMe database much more useful.
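The search behaviour described above can be illustrated with a toy hierarchy. The sketch below does not use the real WordNet database or any WordNet library: the sense names (such as `dog.animal`), the two lookup tables, and the functions are hypothetical stand-ins for a manual sense assignment followed by a walk up the hierarchy.

```python
# Hypothetical hand-made fragment of a sense hierarchy: sense -> parent sense.
HYPERNYM = {
    "dog.animal": "animal",
    "cat.animal": "animal",
    "snake.animal": "animal",
    "mouse.animal": "animal",
    "mouse.device": "device",
}

# Manual sense assignment: free-text label -> sense, chosen by a person.
LABEL_TO_SENSE = {
    "dog": "dog.animal",
    "pooch": "dog.animal",
    "hound": "dog.animal",
    "dog walking": "dog.animal",
    "cat": "cat.animal",
    "snake": "snake.animal",
    "mouse": "mouse.device",   # the annotated object was a computer mouse
}


def is_a(sense, query):
    """True if `query` equals `sense` or is one of its ancestors."""
    while sense is not None:
        if sense == query:
            return True
        sense = HYPERNYM.get(sense)
    return False


def search(labels, query):
    """Return the labels whose assigned sense falls under `query`."""
    return [
        label for label in labels
        if label in LABEL_TO_SENSE and is_a(LABEL_TO_SENSE[label], query)
    ]


labels = ["dog", "pooch", "dog walking", "cat", "snake", "mouse", "car"]
animals = search(labels, "animal")      # excludes the computer mouse
dogs = search(labels, "dog.animal")     # includes "pooch" and "dog walking"
```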

Object-part hierarchy

Having a large dataset of objects where overlap is allowed provides enough data to try to categorize objects as being a part of another object. For example, most of the labels assigned ''wheel'' are probably part of objects assigned to other labels like ''car'' or ''bicycle''. These are called part labels. To determine if label P is a part label for label O:

* Let $I_O$ denote the set of images containing an object (e.g. car).
* Let $I_P$ denote the set of images containing a part (e.g. wheel).
* Let the overlap score between object O and part P, $S_{O,P}$, be defined as the ratio of the intersection area to the area of the part polygon, i.e. $S_{O,P} = \frac{A(O \cap P)}{A(P)}$.
* Let $I_{O,P} \subseteq I_P$ denote the images where object and part polygons have $S_{O,P} > \beta$, where $\beta$ is some threshold value. The authors of LabelMe use $\beta = 0.5$.
* The object-part score for a candidate label is $\frac{N_{O,P}}{N_P + \alpha}$, where $N_{O,P}$ and $N_P$ are the number of images in $I_{O,P}$ and $I_P$, respectively, and $\alpha$ is a concentration parameter. The authors of LabelMe use $\alpha = 5$.

This algorithm allows the automatic classification of parts of an object when the part objects are frequently contained within the outer object.
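The scoring steps above can be sketched in a few lines. To keep the example short, it assumes axis-aligned rectangles `(x0, y0, x1, y1)` in place of general polygons; the function names and the sample data are hypothetical, while the threshold 0.5 and concentration parameter 5 are the values quoted above.

```python
def area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); empty boxes give 0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def overlap_score(obj, part):
    """Overlap score: intersection area divided by the area of the part."""
    ix0, iy0 = max(obj[0], part[0]), max(obj[1], part[1])
    ix1, iy1 = min(obj[2], part[2]), min(obj[3], part[3])
    part_area = area(part)
    return area((ix0, iy0, ix1, iy1)) / part_area if part_area else 0.0


def object_part_score(images, beta=0.5, alpha=5):
    """Compute N_OP / (N_P + alpha).

    `images` is a list of (object_boxes, part_boxes) pairs, one per image.
    """
    n_p = n_op = 0
    for object_boxes, part_boxes in images:
        if not part_boxes:
            continue                  # image does not contain the part
        n_p += 1
        if any(overlap_score(o, p) > beta
               for o in object_boxes for p in part_boxes):
            n_op += 1                 # part overlaps the object enough
    return n_op / (n_p + alpha)


# Hypothetical data: 20 images where a wheel lies inside a car,
# and 5 images with a wheel but no car.
car, wheel = (0, 0, 100, 50), (10, 30, 30, 50)
images = [([car], [wheel])] * 20 + [([], [wheel])] * 5
score = object_part_score(images)     # 20 / (25 + 5)
```

The concentration parameter keeps the score low for labels seen in only a handful of images: a part that appears once, inside the object, scores 1/6 rather than 1.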

Object depth ordering

Another instance of object overlap is when one object is actually on top of the other. For example, an image might contain a person standing in front of a building. The person is not a part label as above, since the person is not part of the building. Instead, they are two separate objects that happen to overlap. To automatically determine which object is the foreground and which is the background, the authors of LabelMe propose several options:

* If an object is completely contained within another object, then the inner object must be in the foreground. Otherwise, it would not be visible in the image. The only exception is with transparent or translucent objects, but these occur rarely.
* One of the objects could be labeled as something that cannot be in the foreground. Examples are ''sky'', ''ground'', or ''road''.
* The object with more polygon points inside the intersecting area is most likely the foreground. The authors tested this hypothesis and found it to be highly accurate.
* Histogram intersection can be used. To do this, a color histogram in the intersecting areas is compared to the color histogram of the two objects. The object with the closer color histogram is assigned as the foreground. This method is less accurate than counting the polygon points.
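The last two heuristics in the list above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a vertex is counted as lying in the intersecting area when it falls inside the other polygon (tested by ray casting), and the histogram comparison uses a normalised variant of histogram intersection in which both histograms are first scaled to sum to 1. All names and sample values are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test for whether `point` lies inside `polygon`."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                  # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foreground_by_vertices(poly_a, poly_b):
    """Return 'a' or 'b' for the polygon with more vertices inside the other.

    Returns None when the counts are equal.
    """
    a_in_b = sum(point_in_polygon(p, poly_b) for p in poly_a)
    b_in_a = sum(point_in_polygon(p, poly_a) for p in poly_b)
    if a_in_b == b_in_a:
        return None
    return "a" if a_in_b > b_in_a else "b"


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima after scaling each histogram to sum to 1."""
    s1, s2 = sum(h1), sum(h2)
    if not s1 or not s2:
        return 0.0
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


# Hypothetical person outline overlapping a building outline.
person = [(40, 60), (50, 50), (60, 60), (60, 170), (40, 170)]
building = [(0, 0), (200, 0), (200, 150), (0, 150)]
front = foreground_by_vertices(person, building)

# Hypothetical 3-bin color histograms (pixel counts per bin).
overlap_hist, person_hist, building_hist = [8, 2, 0], [10, 3, 1], [1, 4, 20]
sim_person = histogram_intersection(overlap_hist, person_hist)
sim_building = histogram_intersection(overlap_hist, building_hist)
```

In this made-up example both heuristics agree: more of the person's vertices fall inside the building than the reverse, and the overlap region's colors are closer to the person's histogram.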

Matlab Toolbox

The LabelMe project provides a set of tools for using the LabelMe dataset from Matlab. Since research is often done in Matlab, this allows the dataset to be integrated with existing computer vision tools. The entire dataset can be downloaded and used offline, or the toolbox can download content dynamically on demand.

References

Bibliography

* Swain, Michael J.; Ballard, Dana H. (1991). "Color indexing". International Journal of Computer Vision. 7: 11–32. doi:10.1007/BF00130487. S2CID 8167136.

External links

* http://labelme.csail.mit.edu/ - LabelMe - The open annotation tool