An audio search engine is a web-based search engine which crawls the web for audio content. The information can consist of web pages, images, audio files, or another type of document. Several techniques exist for searching on these engines.

Types of search

Audio search from text

Text entered into a search bar by the user is compared with the search engine's database. Matching results are accompanied by a brief description of the audio file and its characteristics, such as sampling frequency, bit rate, file type, duration, or coding type. The user is given the option of downloading the resulting files.
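The text comparison described above can be sketched with a small inverted index over file titles. This is a minimal illustration, not the method of any particular engine: the catalogue entries, the field names and the rule that every query word must match are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical catalogue; field names and values are illustrative only.
CATALOGUE = [
    {"id": 1, "title": "Rain on a tin roof", "type": "wav",
     "sample_rate_hz": 44100, "duration_s": 62},
    {"id": 2, "title": "Thunder and heavy rain", "type": "mp3",
     "sample_rate_hz": 48000, "duration_s": 120},
    {"id": 3, "title": "Church bells", "type": "wav",
     "sample_rate_hz": 44100, "duration_s": 35},
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_index(catalogue):
    """Map each title word to the set of file ids whose title contains it."""
    index = defaultdict(set)
    for entry in catalogue:
        for word in tokenize(entry["title"]):
            index[word].add(entry["id"])
    return index


def search(query, index, catalogue):
    """Return the entries whose titles contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    ids = set.intersection(*(index.get(w, set()) for w in words))
    by_id = {entry["id"]: entry for entry in catalogue}
    return [by_id[i] for i in sorted(ids)]


INDEX = build_index(CATALOGUE)
for hit in search("rain", INDEX, CATALOGUE):
    # Show the brief description that accompanies each result.
    print(hit["title"], hit["type"], hit["sample_rate_hz"], hit["duration_s"])
```

A real engine would also index descriptions and surrounding page text and would rank the results rather than return them in id order.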

Audio search from image

The Query by Example (QBE) system is a search algorithm that uses content-based image retrieval (CBIR). Keywords are generated from the analysed image and are then used to search for audio files in the database. The results are displayed according to the user's preferences regarding file type (wav, mp3, aiff…) or other characteristics.

Audio search from audio

In audio search from audio, the user plays the audio of a song to the computer's microphone, either with a music player or by singing or humming. A sound pattern, A, is then derived from the audio waveform, and a frequency representation is derived from its Fourier transform. This pattern is matched against a pattern, B, corresponding to the waveform and transform of the sound files in the database. All audio files in the database whose patterns are similar to the search pattern are displayed as search results.
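The matching of pattern A against pattern B can be sketched as a comparison of magnitude spectra. The example below is only an illustration under stated assumptions: it uses a naive discrete Fourier transform, cosine similarity as the measure of "similar", and a hypothetical two-entry database of synthetic tones. The text does not say which similarity measure real engines use.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform; magnitudes of the first n // 2 bins."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2)
    ]


def cosine_similarity(a, b):
    """1.0 for spectra with the same shape, near 0.0 for unrelated spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tone(freq_hz, rate_hz=800, n=64, amplitude=1.0, phase=0.0):
    """Synthetic test signal: a single sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / rate_hz + phase)
            for t in range(n)]


# Hypothetical database of stored sounds (the source of each pattern B).
DATABASE = {"low_tone": tone(100), "high_tone": tone(300)}


def best_match(query, database):
    """Compare the query's pattern A with every stored pattern B."""
    pattern_a = magnitude_spectrum(query)
    scores = {name: cosine_similarity(pattern_a, magnitude_spectrum(samples))
              for name, samples in database.items()}
    return max(scores, key=scores.get), scores


# A quieter, phase-shifted recording of the low tone.
query = tone(100, amplitude=0.5, phase=0.7)
name, scores = best_match(query, DATABASE)
print(name)
```

Because only the magnitudes of the transform are compared, the quieter and time-shifted query still matches the stored tone of the same frequency.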

Design and algorithms

Audio search has evolved slowly through several basic search formats which exist today, all of which use keywords. The keywords for each search can be found in the title of the media, in any text attached to the media and in linked web pages; they may also be defined by the authors and users of video-hosted resources. Some search engines can search recorded speech such as podcasts, though this can be difficult if there is background noise. Around 40 phonemes exist in every language, with about 400 in all spoken languages. Rather than applying a text search algorithm after speech-to-text processing is completed, some engines use a phonetic search algorithm to find results within the spoken word. Others work by listening to the entire podcast and creating a text transcription. Applications such as Munax use several independent ranking algorithms, which combine the inverted index with hundreds of search parameters to produce the final ranking for each document.

Others, such as Shazam, work by analysing the captured sound and seeking a match based on an acoustic fingerprint in a database of more than 11 million songs. Shazam identifies songs using an audio fingerprint based on a time-frequency graph called a spectrogram, and stores a catalogue of audio fingerprints in a database. The user tags a song for 10 seconds and the application creates an audio fingerprint of the sample. Shazam then searches the database for matches. If there is a match, it returns the information to the user; otherwise it returns a "song not known" dialogue. Shazam can identify prerecorded music being broadcast from any source, such as a radio, television, cinema or music in a club, provided that the background noise level is not high enough to prevent an acoustic fingerprint being taken, and that the song is present in the software's database.
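The fingerprint lookup described above can be sketched in a heavily simplified form. This is not Shazam's actual algorithm: the frame size, the choice of one spectral peak per frame, the pairing of consecutive peaks and the offset voting are all assumptions chosen to keep the example short, and the "songs" are synthetic sequences of tones.

```python
import cmath
import math
import random
from collections import Counter, defaultdict

FRAME = 32  # samples per spectrogram column (illustrative value)


def peak_bins(samples):
    """Strongest frequency bin of each non-overlapping frame."""
    peaks = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        mags = [
            abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / FRAME)
                    for t in range(FRAME)))
            for k in range(1, FRAME // 2)
        ]
        peaks.append(1 + mags.index(max(mags)))
    return peaks


def fingerprint(samples):
    """Pairs of consecutive peaks, each tagged with its frame position."""
    peaks = peak_bins(samples)
    return [((peaks[i], peaks[i + 1]), i) for i in range(len(peaks) - 1)]


def build_database(songs):
    """Map each peak pair to the songs and positions where it occurs."""
    db = defaultdict(list)
    for name, samples in songs.items():
        for key, offset in fingerprint(samples):
            db[key].append((name, offset))
    return db


def identify(clip, db):
    """Vote for (song, time shift); return the best song or None."""
    votes = Counter()
    for key, clip_offset in fingerprint(clip):
        for name, song_offset in db.get(key, []):
            votes[(name, song_offset - clip_offset)] += 1
    if not votes:
        return None  # corresponds to a "song not known" result
    (name, _), _ = votes.most_common(1)[0]
    return name


def melody(bins):
    """Synthetic song: one pure tone per frame, given as DFT bin numbers."""
    return [math.sin(2 * math.pi * b * t / FRAME)
            for b in bins for t in range(FRAME)]


SONGS = {
    "song_a": melody([3, 5, 7, 5, 3, 9, 11, 9]),
    "song_b": melody([4, 4, 6, 8, 10, 8, 6, 2]),
}
DB = build_database(SONGS)

# A short excerpt of song_a with mild background noise added.
rng = random.Random(0)
clip = [s + rng.uniform(-0.2, 0.2)
        for s in SONGS["song_a"][2 * FRAME:6 * FRAME]]
unknown = melody([12, 13, 14, 15])

print(identify(clip, DB))
print(identify(unknown, DB))
```

Voting on the time shift between clip and song, rather than on the song alone, means a match is only reported when the peak pairs occur in the same order and spacing as in the stored recording.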

Notable engines

Deep audio search

Picsearch Audio Search has been licensed to search portals since 2006. Picsearch is a search technology provider that powers image, video and audio search for over 100 major search engines around the world.

For smartphones

* SoundHound (previously known as Midomi) is a company and an application of the same name that lets users search using audio. It offers an audio-based artificial intelligence service as well as services for finding songs and details about them by singing, humming or recording them.
* Shazam is an app for smartphones and Mac best known for its music identification capabilities. It uses the device's built-in microphone to gather a brief sample of the audio being played, creates an acoustic fingerprint based on the sample, and compares it against a central database for a match. If it finds a match, it sends information such as the artist, song title, and album back to the user.
* Doreso identifies a song from a melody hummed or sung into a microphone, or from the name of a song or singer entered directly. The app gives information about the song title and its singer, and allows the user to purchase the song.
* Munax (defunct) was a company that released the first version of its all-content search engine in 2005. Its PlayAudioVideo multimedia search engine, created in July 2007, was the first true search engine for multimedia, providing web search for images, video and audio in the same search engine and allowing users to preview them on the same page. Munax has since shut down.
