A vertical search engine is distinct from a general
web search engine
A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
, in that it focuses on a specific segment of online content. They are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search and travel. Examples of vertical search engines include the
Library of Congress
The Library of Congress (LOC) is the research library that officially serves the United States Congress and is the ''de facto'' national library of the United States. It is the oldest federal cultural institution in the country. The librar ...
,
Mocavo,
Nuroa,
Trulia, and
Yelp
Yelp Inc. is an American company that develops the Yelp.com website and the Yelp mobile app, which publish crowd-sourced reviews about businesses. It also operates Yelp Guest Manager, a table reservation service. It is headquartered in San F ...
.
In contrast to general web search engines, which attempt to
index
Index (or its plural form indices) may refer to:
Arts, entertainment, and media Fictional entities
* Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index''
* The Index, an item on a Halo megastru ...
large portions of the
World Wide Web
The World Wide Web (WWW), commonly known as the Web, is an information system enabling documents and other web resources to be accessed over the Internet.
Documents and downloadable media are made available to the network through web se ...
using a
web crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spi ...
, vertical search engines typically use a
focused crawler which attempts to index only relevant web pages to a pre-defined topic or set of topics. Some vertical search sites focus on individual verticals, while other sites include multiple vertical searches within one search engine.
Benefits
Vertical search offers several potential benefits over general search engines:
* Greater precision due to limited scope,
* Leverage domain knowledge including
taxonomies and
ontologies,
* Support of specific unique user tasks.
Vertical search can be viewed as similar to
enterprise search
Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.
"Enterprise search" is used to describe the software of search information within an ente ...
where the domain of focus is the enterprise, such as a company, government or other organization. In 2013, consumer price comparison websites with integrated vertical search engines such as
FindTheBest drew large rounds of venture capital funding, indicating a growth trend for these applications of vertical search technology.
Domain-specific search
Domain-specific verticals focus on a specific topic.
John Battelle describes this in his book ''The Search'' (2005):
Domain-specific search solutions focus on one area of knowledge, creating customized search experiences, that because of the domain's limited corpus and clear relationships between concepts, provide extremely relevant results for searchers.
Any general search engine would be indexing all the pages and searches in a breadth-first manner to collect documents. The spidering in domain-specific search engines more efficiently searches a small subset of documents by focusing on a particular set. Spidering accomplished with a reinforcement-learning framework has been found to be three times more efficient than
breadth-first search
Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next d ...
.
DARPA's Memex program
In early 2014, the Defense Advanced Research Projects Agency (
DARPA
The Defense Advanced Research Projects Agency (DARPA) is a research and development agency of the United States Department of Defense responsible for the development of emerging technologies for use by the military.
Originally known as the Ad ...
) released a statement on their website outlining the preliminary details of the "Memex program", which aims at developing new search technologies overcoming some limitations of text-based search.
DARPA wants the Memex technology developed in this research to be usable for search engines that can search for information on the
Deep Web – the part of the Internet that is largely unreachable by commercial search engines like
Google
Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...
or
Yahoo
Yahoo! (, styled yahoo''!'' in its logo) is an American web services provider. It is headquartered in Sunnyvale, California and operated by the namesake company Yahoo! Inc. (2017–present), Yahoo Inc., which is 90% owned by investment funds ma ...
. DARPA's website describes that "The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests". As reported in a 2015 ''
Wired
''Wired'' (stylized as ''WIRED'') is a monthly American magazine, published in print and online editions, that focuses on how emerging technologies affect culture, the economy, and politics. Owned by Condé Nast, it is headquartered in San Fran ...
'' article, the search technology being developed in the Memex program "aims to shine a light on the
dark web
The dark web is the World Wide Web content that exists on '' darknets'': overlay networks that use the Internet but require specific software, configurations, or authorization to access. Through the dark web, private computer networks can commu ...
and uncover patterns and relationships in online data to help law enforcement and others track illegal activity". DARPA intends for the program to replace the centralized procedures used by commercial search engines, stating that the "creation of a new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content".
In their description of the program, DARPA explains the program's name as a tribute to Bush's original Memex invention, which served as an inspiration.
In April 2015, it was announced parts of Memex would be open sourced.
Modules were available for download.
References
{{Reflist
Internet search engines