A search engine is a software system that is designed to carry out web searches. Search engines search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of links to web pages, images, videos, infographics, articles, research papers, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Internet content that cannot be searched by a web search engine is generally described as the deep web.


History


Pre-1990s

In 1945, Vannevar Bush described a system for locating published information, intended to overcome the ever-increasing difficulty of finding information in ever-growing centralized indices of scientific work. He wrote an article in The Atlantic Monthly titled "As We May Think" in which he envisioned libraries of research with connected annotations, not unlike modern hyperlinks. Link analysis would eventually become a crucial component of search engines through algorithms such as Hyper Search and PageRank.


1990s: Birth of search engines

The first internet search engines predate the debut of the Web in December 1990: WHOIS user search dates back to 1982, and the Knowbot Information Service multi-network user search was first implemented in 1989. The first well-documented search engine that searched content files, namely FTP files, was Archie, which debuted on 10 September 1990.
Prior to September 1993, the World Wide Web was entirely indexed by hand. There was a list of web servers edited by Tim Berners-Lee and hosted on the CERN web server. One snapshot of the list in 1992 remains, but as more and more web servers went online the central list could no longer keep up. On the NCSA site, new servers were announced under the title "What's New!"
The first tool used for searching content (as opposed to users) on the Internet was Archie ("Internet History - Search Engines", from Search Engine Watch, Universiteit Leiden, Netherlands, September 2001). The name stands for "archive" without the "v". It was created by Alan Emtage, a computer science student at McGill University in Montreal, Quebec, Canada. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites, since the amount of data was so limited that it could be readily searched manually.
The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers. While the name of the search engine "Archie" was not a reference to the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their predecessor.
In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand.
Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993. In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called "Wandex". The purpose of the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.
JumpStation (created in December 1993 by Jonathon Fletcher) used a web robot to find web pages and to build its index, and used a web form as the interface to its query program. It was thus the first WWW resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing, and searching) as described below. Because of the limited resources available on the platform it ran on, its indexing and hence searching were limited to the titles and headings found in the web pages the crawler encountered.
One of the first "all text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it allowed users to search for any word in any web page, which has since become the standard for all major search engines. It was also among the first search engines to become widely known to the public. Also in 1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial endeavor.
The first popular search engine on the Web was Yahoo! Search. The first product from Yahoo!, founded by Jerry Yang and David Filo in January 1994, was a web directory called Yahoo! Directory. In 1995, a search function was added, allowing users to search Yahoo! Directory. It became one of the most popular ways for people to find web pages of interest, but its search function operated on its web directory rather than on full-text copies of web pages. Information seekers could also browse the directory instead of doing a keyword-based search.
Soon after, a number of search engines appeared and vied for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista.
In 1996, Robin Li developed the RankDex site-scoring algorithm for search engine results page ranking ("About: RankDex", rankdex.com) and received a US patent for the technology. It was the first search engine that used hyperlinks to measure the quality of the websites it was indexing, predating the very similar algorithm patent filed by Google two years later in 1998.
Larry Page referenced Li's work in some of his U.S. patents for PageRank. Li later used his RankDex technology for the Baidu search engine, which he founded in China and launched in 2000.
In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured search engine on Netscape's web browser. There was so much interest that Netscape instead struck deals with five of the major search engines: for $5 million a year, each search engine would be in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite.
Google adopted the idea of selling search terms in 1998 from a small search engine company named goto.com. This move had a significant effect on the search engine business, which went from struggling to one of the most profitable businesses on the Internet.
Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s. Several companies entered the market spectacularly, receiving record gains during their initial public offerings. Some have taken down their public search engine and are marketing enterprise-only editions, such as Northern Light. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom that peaked in March 2000.


2000s–present: Post dot-com bubble

Around 2000, Google's search engine rose to prominence. The company achieved better results for many searches with an algorithm called PageRank, as was explained in the paper "Anatomy of a Search Engine" written by Sergey Brin and Larry Page, the later founders of Google. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others. Larry Page's patent for PageRank cites Robin Li's earlier RankDex patent as an influence. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal. In fact, the Google search engine became so popular that spoof engines emerged, such as Mystery Seeker.
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! switched to Google's search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.
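The link-based ranking idea behind PageRank, as described earlier in this section (a page scores highly when many pages, or highly ranked pages, link to it), can be illustrated with a minimal sketch. This is not Google's implementation: the function name, the four-page toy graph, the damping value of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    damping=0.85 and iterations=50 are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three pages, "d" by none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ("c") ends up with the highest score and the page nobody links to ("d") with the lowest, which matches the premise that desirable pages are linked to more than others.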
Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart, blended with results from Inktomi. For a short time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot). Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
As of 2019, active search engine crawlers include those of Google, Petal, Sogou, Baidu, Bing, Gigablast, Mojeek, DuckDuckGo and Yandex.


Approach

A search engine maintains the following processes in near real time:

# Web crawling
# Indexing
# Searching

Web search engines get their information by web crawling from site to site. The "spider" checks for the standard filename ''robots.txt'', addressed to it. The robots.txt file contains directives for search spiders, telling them which pages to crawl and which pages not to crawl. After checking for robots.txt and either finding it or not, the spider sends certain information back to be indexed depending on many factors, such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), headings, or its metadata in HTML meta tags. After a certain number of pages crawled, amount of data indexed, or time spent on the website, the spider stops crawling and moves on. "[N]o web crawler may actually crawl the entire reachable web. Due to infinite websites, spider traps, spam, and other exigencies of the real web, crawlers instead apply a crawl policy to determine when the crawling of a site should be deemed sufficient. Some websites are crawled exhaustively, while others are crawled only partially".

Indexing means associating words and other definable tokens found on web pages to their domain names and HTML-based fields. The associations are made in a public database, made available for web search queries. A query from a user can be a single word, multiple words or a sentence. The index helps find information relating to the query as quickly as possible. Some of the techniques for indexing and caching are trade secrets, whereas web crawling is a straightforward process of visiting all sites on a systematic basis.

Between visits by the ''spider'', the cached version of the page (some or all the content needed to render it) stored in the search engine working memory is quickly sent to an inquirer.
If a visit is overdue, the search engine can just act as a web proxy instead. In this case, the page may differ from the search terms indexed. The cached page holds the appearance of the version whose words were previously indexed, so a cached version of a page can be useful to the website when the actual page has been lost, but this problem is also considered a mild form of linkrot.

Typically, when a user enters a query into a search engine, it is a few keywords. The index already has the names of the sites containing the keywords, and these are instantly obtained from the index. The real processing load is in generating the web pages that are the search results list: every page in the entire list must be weighted according to information in the indexes. Then the top search result item requires the lookup, reconstruction, and markup of the ''snippets'' showing the context of the keywords matched. These are only part of the processing each search results web page requires, and further pages (next to the top) require more of this post-processing.

Beyond simple keyword lookups, search engines offer their own GUI- or command-driven operators and search parameters to refine the search results. These provide the necessary controls for the user engaged in the feedback loop users create by ''filtering'' and ''weighting'' while refining the search results, given the initial pages of the first search results. For example, from 2007 the Google.com search engine has allowed one to ''filter'' by date by clicking "Show search tools" in the leftmost column of the initial search results page, and then selecting the desired date range. It is also possible to ''weight'' by date because each page has a modification time.

Most search engines support the use of the boolean operators AND, OR and NOT to help end users refine the search query.
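As a minimal sketch of how keyword lookups and the boolean operators AND, OR and NOT might be answered from an inverted index, consider the following. The page texts, URLs, and function names are hypothetical, and the whitespace tokenization is an assumption made for brevity; production engines use far more elaborate tokenization, storage, and weighting.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase token to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            index[token].add(url)
    return index

def search_and(index, *terms):
    """Pages containing every term (boolean AND)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Pages containing at least one term (boolean OR)."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))

def search_not(index, all_urls, term):
    """Pages that do not contain the term (boolean NOT)."""
    return set(all_urls) - index.get(term.lower(), set())

# Hypothetical miniature "web" used only for illustration.
PAGES = {
    "example.com/a": "web crawler visits pages",
    "example.com/b": "search engine index of pages",
    "example.com/c": "web search engine",
}
INDEX = build_index(PAGES)
```

Each lookup is a set operation on postings that already exist in the index, which is why the matching sites can be obtained quickly; ranking and snippet generation are separate, more expensive steps that this sketch leaves out.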
Boolean operators are for literal searches that allow the user to refine and extend the terms of the search. The engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords. There is also concept-based searching, where the research involves using statistical analysis on pages containing the words or phrases searched for.

The usefulness of a search engine depends on the relevance of the ''result set'' it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.

There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing texts it locates. The second form relies much more heavily on the computer itself to do the bulk of the work.

Most Web search engines are commercial ventures supported by advertising revenue, and thus some of them allow advertisers to have their listings ranked higher in search results for a fee. Search engines that do not accept money for their search results make money by running search-related ads alongside the regular search engine results. The search engines make money every time someone clicks on one of these ads.
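To illustrate the crawling step described earlier in this section, the following sketch uses Python's standard urllib.robotparser module to honour robots.txt directives and to stop after a fixed page budget, a very simple stand-in for a crawl policy. The robots.txt content, the examplebot agent name, the URLs, and the plan_crawl helper are all assumptions for illustration; real crawl policies weigh many more factors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def make_parser(robots_txt):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def plan_crawl(parser, agent, candidate_urls, max_pages):
    """Return the URLs the agent may fetch, stopping at the page budget."""
    planned = []
    for url in candidate_urls:
        if len(planned) >= max_pages:
            break  # crawl policy: the site is deemed sufficiently crawled
        if parser.can_fetch(agent, url):
            planned.append(url)
    return planned

PARSER = make_parser(ROBOTS_TXT)
CANDIDATES = [
    "https://example.com/",
    "https://example.com/private/report.html",
    "https://example.com/a",
    "https://example.com/b",
]
PLANNED = plan_crawl(PARSER, "examplebot", CANDIDATES, max_pages=2)
```

Here the disallowed path is skipped and the crawl stops once two pages have been planned, mirroring the idea that a spider moves on after a set number of pages.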


Local search

Local search is the process of optimizing the visibility of local businesses in search results. These businesses focus on keeping their information consistent across all searches. This matters because many people decide where to go and what to buy based on their searches.


Market share

Google is by far the world's most used search engine, with a market share of 91.66%, and the world's other most used search engines are Bing, Baidu, Yahoo!, Yandex, and DuckDuckGo.


Russia and East Asia

In Russia, Yandex has a market share of 61.9%, compared to Google's 28.3%. In China, Baidu is the most popular search engine. South Korea's homegrown search portal, Naver, is used for 70% of online searches in the country. Yahoo! Japan and Yahoo! Taiwan are the most popular avenues for Internet searches in Japan and Taiwan, respectively. China is one of the few countries where Google is not in the top three web search engines for market share. Google was previously a top search engine in China, but withdrew after a disagreement with the government over censorship, and a cyberattack.


Europe

Most countries' markets in the European Union are dominated by Google, except for the Czech Republic, where Seznam is a strong competitor. The search engine Qwant is based in Paris, France, from where it attracts most of its 50 million monthly registered users.


Search engine bias

Although search engines are programmed to rank websites based on some combination of their popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide and the underlying assumptions about the technology. These biases can be a direct result of economic and commercial processes (e.g., companies that advertise with a search engine can also become more popular in its organic search results), and political processes (e.g., the removal of search results to comply with local laws). For example, Google will not surface certain neo-Nazi websites in France and Germany, where Holocaust denial is illegal. Biases can also be a result of social processes, as search engine algorithms are frequently designed to exclude non-normative viewpoints in favor of more "popular" results. Indexing algorithms of major search engines skew towards coverage of U.S.-based sites, rather than websites from non-U.S. countries. Google bombing is one example of an attempt to manipulate search results for political, social or commercial reasons. Several scholars have studied the cultural changes triggered by search engines, and the representation of certain controversial topics in their results, such as terrorism in Ireland, climate change denial, and conspiracy theories.


Customized results and filter bubbles

Many search engines such as Google and Bing provide customized results based on the user's activity history. This leads to an effect that has been called a filter bubble. The term describes a phenomenon in which websites use algorithms to selectively guess what information a user would like to see, based on information about the user (such as location, past click behaviour and search history). As a result, websites tend to show only information that agrees with the user's past viewpoint. This puts the user in a state of intellectual isolation without contrary information. Prime examples are Google's personalized search results and Facebook's personalized news stream. According to Eli Pariser, who coined the term, users get less exposure to conflicting viewpoints and are isolated intellectually in their own informational bubble. Pariser related an example in which one user searched Google for "BP" and got investment news about British Petroleum while another searcher got information about the Deepwater Horizon oil spill, and noted that the two search results pages were "strikingly different". The bubble effect may have negative implications for civic discourse, according to Pariser. Since this problem has been identified, competing search engines have emerged that seek to avoid it by not tracking or "bubbling" users, such as DuckDuckGo. Other scholars do not share Pariser's view, finding the evidence in support of his thesis unconvincing.


Religious search engines

The global growth of the Internet and electronic media in the Arab and Muslim world during the last decade has encouraged Islamic adherents in the Middle East and Asian sub-continent to attempt their own search engines, their own filtered search portals that would enable users to perform safe searches. More than usual ''safe search'' filters, these Islamic web portals categorize websites as either "halal" or "haram", based on interpretation of Shariah, the "Law of Islam". ImHalal came online in September 2011. Halalgoogling came online in July 2013. These use haram filters on the collections from Google and Bing (and others). While lack of investment and slow pace in technologies in the Muslim world has hindered progress and thwarted success of an Islamic search engine targeting Islamic adherents as its main consumers, projects like Muxlim, a Muslim lifestyle site, did receive millions of dollars from investors like Rite Internet Ventures, and it also faltered. Other religion-oriented search engines are Jewogle, the Jewish version of Google, and SeekFind.org, which is Christian. SeekFind filters sites that attack or degrade their faith.


Search engine submission

Web search engine submission is a process in which a webmaster submits a website directly to a search engine. While search engine submission is sometimes presented as a way to promote a website, it generally is not necessary because the major search engines use web crawlers that will eventually find most web sites on the Internet without assistance. A webmaster can either submit one web page at a time or submit the entire site using a sitemap, but it is normally only necessary to submit the home page of a web site, as search engines are able to crawl a well designed website. There are two remaining reasons to submit a web site or web page to a search engine: to add an entirely new web site without waiting for a search engine to discover it, and to have a web site's record updated after a substantial redesign. Some search engine submission software not only submits websites to multiple search engines, but also adds links to websites from their own pages. This could appear helpful in increasing a website's ranking, because external links are one of the most important factors determining a website's ranking. However, John Mueller of Google has stated that this "can lead to a tremendous number of unnatural links for your site" with a negative impact on site ranking.


Further reading

* Bing Liu (2007), ''Web Data Mining: Exploring Hyperlinks, Contents and Usage Data'', Springer.
* Bar-Ilan, J. (2004). The use of Web search engines in information science research. ARIST, 38, 231–288.

