In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms or applications that produce such descriptions. They describe elementary characteristics such as shape, color, texture and motion, among others.

Introduction

As a result of new communication technologies and the widespread use of the Internet, the amount of audio-visual information available in digital format is increasing considerably. It has therefore become necessary to design systems that describe the content of several types of multimedia information so that it can be searched and classified. Audio-visual descriptors are in charge of describing this content. They encode knowledge of the objects and events found in a video, image or audio recording, and they allow quick and efficient searches of audio-visual content. Such a system can be compared to search engines for textual content. While it is relatively easy to find text with a computer, it is much more difficult to find specific audio and video segments. For instance, imagine somebody searching for a scene of a happy person: happiness is a feeling, and how it translates into shape, color and texture in an image is not evident. Describing audio-visual content is not a trivial task, and it is essential for the effective use of this type of archive. The standard that deals with audio-visual descriptors is MPEG-7 (''Moving Picture Experts Group - 7'').

Types

Descriptors are the first step in finding the connection between the pixels contained in a digital image and what humans recall some minutes after having observed an image or a group of images. Visual descriptors are divided into two main groups:
* General information descriptors: low-level descriptors that describe color, shape, regions, textures and motion.
* Specific domain information descriptors: descriptors that give information about objects and events in the scene. A concrete example is face recognition.

General information descriptors

General information descriptors consist of a set of descriptors that cover basic and elementary features such as color, texture, shape, motion and location, among others. This description is generated automatically by means of signal processing.

Color

Color is the most basic quality of visual content. Five tools are defined to describe color. The first three represent the color distribution, and the remaining ones describe the color relationship between sequences or groups of images:
* ''Dominant color descriptor (DCD)''
* ''Scalable color descriptor (SCD)''
* ''Color structure descriptor (CSD)''
* ''Color layout descriptor (CLD)''
* ''Group of frame (GoF)'' or ''group-of-pictures (GoP)''
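As a rough illustration of how a color descriptor can summarize the spatial distribution of color, the following Python sketch averages the color inside each cell of a grid laid over the image. It covers only a grid-based representative color step; any later transform or quantization stage is omitted, and it is not a conformant implementation of any MPEG-7 descriptor. The function name `grid_representative_colors`, the 2x2 grid and the toy image are assumptions made for this example.

```python
def grid_representative_colors(image, grid=2):
    """Average the RGB color inside each cell of a grid x grid partition.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    The image is assumed to be at least `grid` pixels wide and tall.
    """
    height, width = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        # Row range covered by this grid cell.
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        row = []
        for gx in range(grid):
            # Column range covered by this grid cell.
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(pixels)
            # Representative color = per-channel mean of the cell's pixels.
            row.append(tuple(sum(p[c] for p in pixels) / n for c in range(3)))
        cells.append(row)
    return cells


# Hypothetical 4x4 toy image: left half red, right half blue.
RED, BLUE = (255, 0, 0), (0, 0, 255)
toy_image = [[RED, RED, BLUE, BLUE] for _ in range(4)]
layout = grid_representative_colors(toy_image, grid=2)
```

Here `layout` is a 2x2 grid of averaged colors, so two images with the same overall colors but a different arrangement would yield different grids.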

Texture

Texture is also an important quality for describing an image. Texture descriptors characterize image textures or regions. They capture the homogeneity of regions and the histograms of region borders. The set of descriptors is formed by:
* ''Homogeneous texture descriptor (HTD)''
* ''Texture browsing descriptor (TBD)''
* ''Edge histogram descriptor (EHD)''
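To make the idea of a histogram of edges more concrete, here is a minimal Python sketch that classifies each 2x2 block of a grayscale image by its strongest directional response and counts the results. The five edge types and the filter coefficients follow values commonly quoted for edge histogram descriptors, but they should be read as assumptions; the threshold, the single global histogram and the function name `edge_histogram` are simplifications chosen for this example, not the normative procedure.

```python
import math

# 2x2 filter coefficients, ordered (top-left, top-right, bottom-left, bottom-right).
# These values are an assumption for illustration.
EDGE_FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (math.sqrt(2), 0, 0, -math.sqrt(2)),
    "diagonal_135": (0, math.sqrt(2), -math.sqrt(2), 0),
    "non_directional": (2, -2, -2, 2),
}


def edge_histogram(gray, threshold=10.0):
    """Count the dominant edge type of every non-overlapping 2x2 block.

    `gray` is a list of rows of intensity values. Blocks whose strongest
    response is below `threshold` are treated as edge-free and not counted.
    """
    counts = {name: 0 for name in EDGE_FILTERS}
    height, width = len(gray), len(gray[0])
    for y in range(0, height - 1, 2):
        for x in range(0, width - 1, 2):
            block = (gray[y][x], gray[y][x + 1], gray[y + 1][x], gray[y + 1][x + 1])
            strengths = {
                name: abs(sum(c * v for c, v in zip(coeffs, block)))
                for name, coeffs in EDGE_FILTERS.items()
            }
            best = max(strengths, key=strengths.get)
            if strengths[best] >= threshold:
                counts[best] += 1
    return counts


# Hypothetical toy image made of alternating dark and bright columns.
striped = [[0, 255, 0, 255] for _ in range(4)]
hist = edge_histogram(striped)
```

For the striped toy image every block responds most strongly to the vertical filter, so all four blocks are counted as vertical edges.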

Shape

Shape carries important semantic information, owing to the human ability to recognize objects by their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system performs. Such a segmentation system is not yet available, but there exists a series of algorithms that are considered good approximations. These descriptors describe regions, contours and shapes for 2D images and 3D volumes. The shape descriptors are the following:
* ''Region-based shape descriptor (RSD)''
* ''Contour-based shape descriptor (CSD)''
* ''3-D shape descriptor (3-D SD)''
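A region can be summarized numerically once it has been segmented. The Python sketch below assumes a binary mask is already available and computes its area, centroid and second-order central moments. This is a generic moment-based summary used here only to illustrate the idea of a region-based description; it is not the transform used by the MPEG-7 region-based shape descriptor, and the name `region_moments` is hypothetical.

```python
def region_moments(mask):
    """Return area, centroid and normalized second-order central moments.

    `mask` is a list of rows of 0/1 values; nonzero entries belong to the region.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(points)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")
    cx = sum(x for x, _ in points) / area
    cy = sum(y for _, y in points) / area
    # Central moments describe how the region spreads around its centroid.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / area
    mu02 = sum((y - cy) ** 2 for _, y in points) / area
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / area
    return {"area": area, "centroid": (cx, cy), "mu20": mu20, "mu02": mu02, "mu11": mu11}


# Hypothetical toy mask: a 2x2 square in the middle of a 4x4 image.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
shape_features = region_moments(toy_mask)
```

Because the moments are taken relative to the centroid, translating the region inside the image leaves `mu20`, `mu02` and `mu11` unchanged.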

Motion

Motion is defined by four different descriptors that describe motion in a video sequence. Motion relates both to the motion of objects in the sequence and to the motion of the camera. Camera motion information is provided by the capture device, whereas the rest is obtained by means of image processing. The descriptor set is the following:
* ''Motion activity descriptor (MAD)''
* ''Camera motion descriptor (CMD)''
* ''Motion trajectory descriptor (MTD)''
* ''Warping and parametric motion descriptor (WMD and PMD)''
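One simple way to quantify how active a scene is uses the spread of its motion vector magnitudes. The Python sketch below assumes that per-block motion vectors have already been estimated, for example by a video encoder, and returns the standard deviation of their magnitudes. The function name `motion_intensity` and the toy vectors are assumptions for this example, and the quantization into discrete activity levels that a real descriptor would apply is left out.

```python
import math
import statistics


def motion_intensity(motion_vectors):
    """Standard deviation of motion vector magnitudes for one frame.

    `motion_vectors` is a non-empty list of (dx, dy) displacements, one per block.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    return statistics.pstdev(magnitudes)


# Hypothetical frames: one with no motion, one where half the blocks move.
static_scene = [(0, 0)] * 4
mixed_scene = [(0, 0), (3, 4), (0, 0), (3, 4)]

static_activity = motion_intensity(static_scene)
mixed_activity = motion_intensity(mixed_scene)
```

A frame in which nothing moves yields zero, while a frame that mixes still and moving blocks yields a larger value.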

Location

The location of elements in the image is used to describe them in the spatial domain. Elements can also be located in the temporal domain:
* ''Region locator descriptor (RLD)''
* ''Spatio-temporal locator descriptor (STLD)''

Specific domain information descriptors

These descriptors, which give information about objects and events in the scene, are not easy to extract, especially when the extraction must be done automatically. Nevertheless, they can be produced manually. As mentioned before, face recognition is a concrete example of an application that tries to obtain this information automatically.

Descriptor applications

Among all applications, the most important ones are:
* Search engines and classifiers for multimedia documents.
* Digital libraries: visual descriptors allow a very detailed and specific search of any video or image by means of different search parameters, for instance searching for films in which a known actor appears, or for videos showing Mount Everest.
* Personalized electronic news services.
* Automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area.
* Control and filtering of specific audio-visual content, such as violent or pornographic material, as well as authorization for some multimedia content.
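Most of these applications reduce to comparing descriptors. As a minimal sketch, assuming each item in a collection has already been reduced to a fixed-length numeric descriptor, the following Python code ranks the items by L1 distance to a query descriptor. The file names, the three-value descriptors and the choice of L1 distance are all hypothetical; real systems typically use a matching measure suited to each descriptor type.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query, library):
    """Return item names ordered from most to least similar to `query`."""
    return sorted(library, key=lambda name: l1_distance(query, library[name]))


# Hypothetical collection: each descriptor is a coarse (red, green, blue) share.
toy_library = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg": [0.0, 0.1, 0.9],
}
ranking = rank_by_similarity([0.7, 0.2, 0.1], toy_library)
```

The query is mostly red, so the reddish item ranks first and the bluish item last.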

References

* B.S. Manjunath, Philippe Salembier, and Thomas Sikora (editors): ''Introduction to MPEG-7: Multimedia Content Description Interface''. Wiley & Sons, April 2002. ISBN 0-471-48678-7.