HOME

TheInfoList



OR:

SpeechBot was a
web search engine A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
for
streaming media Streaming media refers to multimedia delivered through a Computer network, network for playback using a Media player (disambiguation), media player. Media is transferred in a ''stream'' of Network packet, packets from a Server (computing), ...
content developed at
Compaq Compaq Computer Corporation was an American information technology, information technology company founded in 1982 that developed, sold, and supported computers and related products and services. Compaq produced some of the first IBM PC compati ...
's (later HP) research laboratories in Cambridge, MA and
Australia Australia, officially the Commonwealth of Australia, is a country comprising mainland Australia, the mainland of the Australia (continent), Australian continent, the island of Tasmania and list of islands of Australia, numerous smaller isl ...
. Compaq launched the website at Streaming Media West 1999 in San Jose, CA. The internet radio shows indexed by SpeechBot included The Motley Fool, Fresh Air, Talk of the Nation, The Dr. Laura Program, and Dreamland with Art Bell. By June 2003, the service had indexed over 17,000 hours of multimedia content. The website was taken offline in 2005, after HP closed their Cambridge research lab. The SpeechBot indexing workflow involved a farm of
Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
workstations that retrieved the streaming content; and a
Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
cluster running speech recognition to transcribe the spoken audio. The
web server A web server is computer software and underlying Computer hardware, hardware that accepts requests via Hypertext Transfer Protocol, HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, co ...
, search index and metadata library were hosted on AlphaServers running
Tru64 UNIX Tru64 UNIX is a discontinued 64-bit UNIX operating system for the DEC Alpha, Alpha instruction set architecture (ISA), currently owned by Hewlett-Packard (HP). Previously, Tru64 UNIX was a product of Compaq, and before that, Digital Equipment Corp ...
. If transcripts were already available, then these were aligned to the audio stream; otherwise, an approximate transcript was produced using speech recognition. The Calista recognizer that was used was derived from Sphinx-3. Due to the low quality of streaming audio at the time, the word error rate was quite high, but most searches were still able to retrieve relevant hits. The search results linked to the offset in the stream that corresponded to the search phrase, so that users did not need to listen to the entire program to find the section of interest.


References


Further reading

* * * * * * * * Hewlett-Packard Defunct internet search engines 1999 software Internet properties established in 1999 Internet properties disestablished in 2005 {{web-software-stub