Enterprise search is software technology for searching data sources internal to a company, typically
intranet
An intranet is a computer network for sharing information, easier communication, collaboration tools, operational systems, and other computing services within an organization, usually to the exclusion of access by outsiders. The term is used in ...
and
database
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
content. The search is generally offered only to users internal to the company.
Enterprise search can be contrasted with
web search, which applies search technology to documents on the open web, and
desktop search, which applies search technology to the content on a single computer.
Enterprise search systems index data and documents from a variety of sources such as:
file systems,
intranets,
document management systems,
e-mail
Electronic mail (usually shortened to email; alternatively hyphenated e-mail) is a method of transmitting and receiving Digital media, digital messages using electronics, electronic devices over a computer network. It was conceived in the ...
, and
databases
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and ana ...
. Many enterprise search systems integrate structured and
unstructured data
Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically plain text, text-heavy, but may contain data such ...
in their collections. Enterprise search systems also use access controls to enforce a security policy on their users.
Enterprise search can be seen as a type of
vertical search of an enterprise.
Components of an enterprise search system
In an enterprise search system, content goes through various phases from source repository to search results:
Content awareness
Content awareness (or "content collection") is usually either a push or pull model. In the push model, a source system is integrated with the search engine in such a way that it connects to it and pushes new content directly to its
API
An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
s. This model is used when real-time indexing is important. In the pull model, the software gathers content from sources using a connector such as a
web crawler
Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spider ...
or a
database
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
connector. The connector typically polls the source with certain intervals to look for new, updated or deleted content.
Content processing and analysis
Content from different sources may have many different formats or document types, such as XML, HTML, Office document formats or plain text. The content processing phase processes the incoming documents to plain text using document filters. It is also often necessary to normalize content in various ways to improve
recall or
precision. These may include
stemming,
lemmatization,
synonym
A synonym is a word, morpheme, or phrase that means precisely or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are a ...
expansion,
entity extraction,
part of speech
In grammar, a part of speech or part-of-speech ( abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are ...
tagging.
As part of processing and analysis,
tokenization is applied to split the content into
tokens which is the basic matching unit. It is also common to normalize tokens to lower case to provide case-insensitive search, as well as to normalize accents to provide better recall.
Indexing
The resulting text is stored in an
index
Index (: indexes or indices) may refer to:
Arts, entertainment, and media Fictional entities
* Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index''
* The Index, an item on the Halo Array in the ...
, which is optimized for quick lookups without storing the full text of the document. The index may contain the dictionary of all unique words in the corpus as well as information about ranking and
term frequency.
Query processing
Using a web page, the user issues a
query to the system. The query consists of any terms the user enters as well as navigational actions such as
faceting and paging information.
Matching
The processed query is then compared to the stored index, and the search system returns results (or "hits") referencing source documents that match. Some systems are able to present the document as it was indexed.
See also
*
Collaborative search engine
*
Data defined storage
*
Enterprise bookmarking
*
Enterprise information access
*
Faceted search
*
Information extraction
*
Knowledge management
Knowledge management (KM) is the set of procedures for producing, disseminating, utilizing, and overseeing an organization's knowledge and data. It alludes to a multidisciplinary strategy that maximizes knowledge utilization to accomplish organ ...
*
List of search engines
Search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites have a search facility for online databases.
By content/topic
General ...
*
Text mining
*
Vertical search
References
{{DEFAULTSORT:Enterprise Search
Information retrieval genres
Enterprise search