SpeechBot was a web search engine for streaming media content developed at the research laboratories of Compaq (later HP) in Cambridge, MA and Australia. Compaq launched the website at Streaming Media West 1999 in San Jose, CA. The internet radio shows indexed by SpeechBot included The Motley Fool, Fresh Air, Talk of the Nation, The Dr. Laura Program, and Dreamland with Art Bell. By June 2003, the service had indexed over 17,000 hours of multimedia content. The website was taken offline in 2005, after HP closed its Cambridge research lab.

The SpeechBot indexing workflow involved a farm of Windows workstations that retrieved the streaming content, and a Linux cluster running speech recognition to transcribe the spoken audio. The web server, search index and metadata library were hosted on AlphaServers running Tru64 UNIX.

If transcripts were already available, they were aligned to the audio stream; otherwise, an approximate transcript was produced using speech recognition. The recognizer used, Calista, was derived from Sphinx-3. Due to the low quality of streaming audio at the time, the word error rate was quite high, but most searches were still able to retrieve relevant hits. The search results linked to the offset in the stream that corresponded to the search phrase, so that users did not need to listen to the entire program to find the section of interest.
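
Linking a search hit to an offset in the stream can be illustrated with a small time-stamped inverted index. The following is a minimal sketch, not SpeechBot's actual implementation: the `OffsetIndex` and `Hit` names, the program identifier, and the exact-phrase matching rule are all assumptions made for illustration, and it assumes each transcript word comes with a start time in seconds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    program: str     # hypothetical program identifier
    offset_s: float  # start time of the matched phrase, in seconds


class OffsetIndex:
    """Toy inverted index mapping words to time-stamped positions."""

    def __init__(self):
        # word -> {(program, word position): start time in seconds}
        self._postings = defaultdict(dict)

    def add_transcript(self, program, timed_words):
        """Index (word, start_seconds) pairs given in spoken order."""
        for position, (word, start_s) in enumerate(timed_words):
            self._postings[word.lower()][(program, position)] = start_s

    def search_phrase(self, phrase):
        """Return one Hit per place where the words occur consecutively."""
        words = phrase.lower().split()
        if not words:
            return []
        hits = []
        first = self._postings.get(words[0], {})
        for (program, position), start_s in first.items():
            # Every later word must appear at the next position in the
            # same program for the phrase to match here.
            if all(
                (program, position + i) in self._postings.get(word, {})
                for i, word in enumerate(words[1:], start=1)
            ):
                hits.append(Hit(program, start_s))
        return hits
```

For example, after indexing `[("welcome", 0.0), ("to", 0.4), ("the", 0.5), ("stock", 0.7), ("market", 1.1)]` under the made-up identifier `"show-001"`, `search_phrase("stock market")` returns a single hit at 0.7 seconds, which a player could use as its starting position. Exact phrase matching is brittle when the transcript contains recognition errors, so a real system would probably also score partial matches.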
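
Word error rate is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below implements that standard definition; the source does not describe how SpeechBot's error rate was measured, so the function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

With the reference `"a b c d"` and the hypothesis `"a x c"`, one substitution and one deletion are needed, so the function returns 0.5. Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.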

