WebFountain is an Internet analytical engine implemented by IBM for the study of unstructured data on the World Wide Web. IBM describes WebFountain as:
. . . a set of research technologies that collect, store and analyze massive amounts of unstructured and semi-structured text. It is built on an open, extensible platform that enables the discovery of trends, patterns and relationships from data.
The project represents one of the first comprehensive attempts to catalog and interpret the unstructured data of the Web continuously. To this end, the IBM researchers supporting the project have investigated new systems for the precise retrieval of subsets of the information on the Web, for real-time trend analysis, and for meta-level analysis of the information available on the Web.
Factiva, an information retrieval company then jointly owned by Dow Jones and Reuters, licensed WebFountain in September 2003 and built software that used the WebFountain engine to gauge corporate reputation. Factiva reportedly offered yearly subscriptions to the service for $200,000. Factiva later decided to explore other technologies and severed its relationship with WebFountain.
WebFountain is developed at IBM's Almaden research campus in the Bay Area of California.
IBM has also developed software called UIMA (Unstructured Information Management Architecture) that can be used to analyze unstructured information. It may help perform trend analysis across documents, determine the theme and gist of documents, and allow fuzzy searches on unstructured documents.
IBM Open Sources WebFountain (UIMA) – Unstructured Text Analysis software.
References
External links
WebFountain overview
WebFountain on John Battelle's Searchblog
ZDNet article
"Drinking from the Fire Hydrant"
IBM sets out to make sense of the Web, February 5, 2004
IBM Joins Corporate Monitoring Space with Release of Public Image Monitoring Solution, Search Engine Watch, November 9, 2005