A vertical search engine is distinct from a general web search engine in that it focuses on a specific segment of online content. Vertical search engines are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search, and travel. Examples of vertical search engines include the Library of Congress, Mocavo, Nuroa, Trulia, and Yelp.
In contrast to general web search engines, which attempt to index large portions of the World Wide Web using a web crawler, vertical search engines typically use a focused crawler, which attempts to index only web pages relevant to a predefined topic or set of topics. Some vertical search sites focus on individual verticals, while others include multiple vertical searches within one search engine.
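
The relevance decision at the heart of a focused crawler can be sketched in a few lines of Python. This is only an illustration under simple assumptions: relevance is measured here as keyword overlap, whereas real focused crawlers often rely on trained classifiers. The vocabulary `TOPIC_TERMS`, the helpers `relevance` and `should_index`, and the threshold of 0.1 are all hypothetical.

```python
import re

# Hypothetical topic vocabulary for a real-estate vertical.
TOPIC_TERMS = {"house", "apartment", "rent", "mortgage", "listing", "bedroom"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def relevance(text: str, topic_terms: set[str] = TOPIC_TERMS) -> float:
    """Fraction of the page's tokens that belong to the topic vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_terms)
    return hits / len(tokens)

def should_index(text: str, threshold: float = 0.1) -> bool:
    """Index a page only if its relevance reaches the assumed threshold."""
    return relevance(text) >= threshold

on_topic = "Two bedroom apartment for rent, new listing near downtown"
off_topic = "The football team won the championship after extra time"
print(should_index(on_topic), should_index(off_topic))
```

A general crawler would index both sample pages; the focused variant keeps only the first one.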
Benefits
Vertical search offers several potential benefits over general search engines:
* Greater precision due to limited scope
* Leverage of domain knowledge, including taxonomies and ontologies
* Support for specific, unique user tasks
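
One way domain knowledge might be put to use is query expansion: a query for a broad term is extended with the narrower terms a taxonomy lists beneath it. The sketch below assumes a tiny, acyclic, hand-written taxonomy for a medical vertical; the `TAXONOMY` contents and the `expand_query` helper are hypothetical and not taken from any particular search engine.

```python
# Hypothetical taxonomy: each term maps to its narrower terms.
# Assumed to be acyclic; a cycle would cause unbounded recursion.
TAXONOMY = {
    "heart disease": ["coronary artery disease", "arrhythmia"],
    "arrhythmia": ["atrial fibrillation"],
}

def expand_query(term: str, taxonomy: dict[str, list[str]]) -> list[str]:
    """Return the term followed by all of its narrower terms, depth-first."""
    expanded = [term]
    for narrower in taxonomy.get(term, []):
        for candidate in expand_query(narrower, taxonomy):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query("heart disease", TAXONOMY))
```

A term that is absent from the taxonomy is returned unchanged, so the expansion degrades gracefully for queries outside the modelled domain.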
Vertical search can be viewed as similar to enterprise search, where the domain of focus is the enterprise, such as a company, government, or other organization. In 2013, consumer price comparison websites with integrated vertical search engines, such as FindTheBest, drew large rounds of venture capital funding, indicating a growth trend for these applications of vertical search technology.
Domain-specific search
Domain-specific verticals focus on a specific topic. John Battelle describes this in his book ''The Search'' (2005):
"Domain-specific search solutions focus on one area of knowledge, creating customized search experiences, that because of the domain's limited corpus and clear relationships between concepts, provide extremely relevant results for searchers."
A general search engine indexes all the pages it can reach, crawling in a breadth-first manner to collect documents. The spidering in a domain-specific search engine is more efficient because it targets a small, topically relevant subset of documents. Spidering accomplished with a reinforcement-learning framework has been found to be three times more efficient than breadth-first search.
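
The difference between the two crawl orders can be illustrated on a toy link graph. The sketch below does not implement reinforcement learning; it substitutes a simpler best-first heuristic in which the frontier is ordered by a relevance score. The graph `LINKS`, the scores in `SCORE`, and the set `TARGETS` are hypothetical, and the sketch assumes that a link's relevance can be estimated before the page is fetched (for example, from anchor text).

```python
import heapq
from collections import deque

# Hypothetical link graph: page -> outgoing links.
LINKS = {
    "home": ["sports", "news", "homes"],
    "sports": ["scores", "teams"],
    "news": ["politics", "weather"],
    "homes": ["rentals", "mortgages"],
    "scores": [], "teams": [], "politics": [], "weather": [],
    "rentals": [], "mortgages": [],
}
# Hypothetical relevance estimates in [0, 1] for a real-estate vertical.
SCORE = {
    "home": 0.5, "sports": 0.0, "news": 0.1, "homes": 0.9,
    "scores": 0.0, "teams": 0.0, "politics": 0.0, "weather": 0.1,
    "rentals": 1.0, "mortgages": 0.9,
}
TARGETS = {"homes", "rentals", "mortgages"}

def breadth_first_fetches(start, targets):
    """Count pages fetched by a breadth-first crawl until all targets are found."""
    queue, seen, found, fetched = deque([start]), {start}, set(), 0
    while queue and found != targets:
        page = queue.popleft()
        fetched += 1
        if page in targets:
            found.add(page)
        for link in LINKS[page]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

def best_first_fetches(start, targets):
    """Count pages fetched when the frontier is ordered by relevance score."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier, seen, found, fetched = [(-SCORE[start], start)], {start}, set(), 0
    while frontier and found != targets:
        _, page = heapq.heappop(frontier)
        fetched += 1
        if page in targets:
            found.add(page)
        for link in LINKS[page]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-SCORE[link], link))
    return fetched

print(breadth_first_fetches("home", TARGETS), best_first_fetches("home", TARGETS))
```

On this toy graph the breadth-first crawl fetches all 10 pages before it has found the three relevant ones, while the score-ordered crawl fetches 4. The numbers depend entirely on the made-up graph and scores and say nothing about real-world efficiency.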
DARPA's Memex program
In early 2014, the Defense Advanced Research Projects Agency (DARPA) released a statement on its website outlining the preliminary details of the "Memex program", which aims to develop new search technologies that overcome some limitations of text-based search.
DARPA wants the Memex technology developed in this research to be usable for search engines that can search for information on the deep web, the part of the Internet that is largely unreachable by commercial search engines like Google or Yahoo. DARPA's website states that "The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests". As reported in a 2015 ''Wired'' article, the search technology being developed in the Memex program "aims to shine a light on the dark web and uncover patterns and relationships in online data to help law enforcement and others track illegal activity". DARPA intends for the program to replace the centralized procedures used by commercial search engines, stating that the "creation of a new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content".
In its description of the program, DARPA explains the program's name as a tribute to Vannevar Bush's original Memex invention, which served as an inspiration.
In April 2015, it was announced that parts of Memex would be open-sourced, and modules were made available for download.