
Desktop search tools search within a user's own computer files as opposed to searching the Internet. These tools are designed to find information on the user's PC, including web browser history, e-mail archives, text documents, sound files, images, and video. A variety of desktop search programs are now available; see the list of desktop search engines for examples. Most desktop search programs are standalone applications. Desktop search products are software alternatives to the search software included in the operating system, helping users sift through desktop files, emails, attachments, and more.

Desktop search emerged as a concern for large firms for two main reasons: untapped productivity and security. According to analyst firm Gartner, up to 80% of some companies' data is locked up inside unstructured data: the information stored on a user's PC, the directories (folders) and files they've created on a network, documents stored in repositories such as corporate intranets, and a multitude of other locations. Moreover, many companies have structured or unstructured information stored in older file formats to which they don't have ready access.

The sector attracted considerable attention from late 2004 to early 2005 because of the struggle between Microsoft and Google. According to market analysts, both companies were attempting to leverage their monopolies (of web browsers and search engines, respectively) to strengthen their dominance. Following Google's complaint that users of Windows Vista could not choose a competitor's desktop search program over the built-in one, the US Justice Department and Microsoft reached an agreement that Windows Vista Service Pack 1 would let users choose between the built-in and other desktop search programs and select which one is the default. In September 2011, Google discontinued Google Desktop.


Technologies

Most desktop search engines build and maintain an index database to improve performance when searching large amounts of data. Indexing usually takes place when the computer is idle, and most search applications can be set to suspend indexing while a portable computer is running on batteries, in order to save power. There are notable exceptions, however: Voidtools' Everything Search Engine, which searches only file names, not contents, is able to build its index from scratch in just a few seconds. Another exception is Vegnos Desktop Search Engine, which searches file names and file contents without building any indices.

An index may not be up to date when a query is performed. In this case, the results returned will not be accurate: a hit may be shown for a file that no longer exists, and a file that should match may be missing from the results. Some products have sought to remedy this disadvantage by building a real-time indexing function into the software. Not indexing has disadvantages of its own: a query can take a significant time to complete and can be resource-intensive.

Desktop search tools typically collect three types of information about files:
* file and folder names
* metadata, such as titles, authors, and comments in file types such as MP3, PDF and JPEG
* file content, for the types of documents supported by the tool

Long-term goals for desktop search include the ability to search the contents of image files, sound files and video by context.
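The indexing approach described above can be illustrated with a minimal Python sketch. It is not the design of any product named in this article: the `DesktopIndex` class, its method names, and the use of file modification times to spot an out-of-date entry are all assumptions made for illustration. It indexes the three kinds of information listed above (file name, caller-supplied metadata, and text content) and assumes the files are readable as text.

```python
import os
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DesktopIndex:
    """Toy inverted index mapping each token to the files that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of file paths
        self.mtimes = {}                  # path -> mtime seen at indexing time

    def add_file(self, path, metadata=None):
        """Index an existing text file by name, supplied metadata and content."""
        tokens = set(tokenize(os.path.basename(path)))   # 1. file name
        for value in (metadata or {}).values():          # 2. metadata
            tokens.update(tokenize(str(value)))
        with open(path, encoding="utf-8", errors="ignore") as handle:
            tokens.update(tokenize(handle.read()))       # 3. file content
        for token in tokens:
            self.postings[token].add(path)
        self.mtimes[path] = os.path.getmtime(path)

    def search(self, query):
        """Return the paths that contain every word of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits

    def is_stale(self, path):
        """True if the file was deleted or modified after it was indexed."""
        try:
            return os.path.getmtime(path) != self.mtimes.get(path)
        except OSError:
            return True
```

A query only consults the in-memory postings, which is why it is fast, and also why `search` can still return a file that has since been deleted. A caller would need to check `is_stale` or re-index to avoid the inaccurate results described above.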


Platforms & their histories


Windows

Indexing Service, "a base service that extracts content from files and constructs an indexed catalog to facilitate efficient and rapid searching", was originally released in August 1996. It was built to speed up manual searches for files on personal desktops and corporate computer networks. Indexing Service used Microsoft web servers to index files on the desired hard drives. Indexing was done by file format. A search matched the terms that users provided against the data within the file formats. The largest issue that Indexing Service faced was that every time a file was added, it had to be indexed. Because the service also cached the entire index in RAM, hardware became a major limitation: indexing large numbers of files required extremely powerful hardware and very long wait times.

In 2003, Windows Desktop Search (WDS) replaced Microsoft Indexing Service. Instead of only matching terms to the details of the file format and file names, WDS brought content indexing to all Microsoft files and text-based formats such as e-mail and text files. That is, WDS looked into the files and indexed their content. Thus, when a user searched for a term, WDS matched not just information such as file format types and file names, but also terms and values stored within those files. WDS also brought "instant searching": the user could type a character and the query would immediately start running, with the results updating as the user typed more characters. Windows Desktop Search apparently used a lot of processing power, as it would only run when it was directly queried or while the PC was idle. Even so, indexing the entire hard drive still took hours. The index would be around 10% of the size of all the files that it indexed; for example, if the indexed files amounted to around 100 GB, the index would be around 10 GB.

With the release of Windows Vista came Windows Search 3.1. Unlike its predecessors, WDS and Windows Search 3.0, version 3.1 could search through both indexed and non-indexed locations seamlessly. Also, the RAM and CPU requirements were greatly reduced, cutting indexing times immensely. Windows Search 4.0 runs on all PCs with Windows 7 and later.
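The "instant searching" behaviour described above, where results are refreshed after every keystroke, can be sketched in Python as a prefix lookup over a sorted list of indexed terms. This is an illustration only and does not reflect how Windows Search is implemented; `PrefixIndex` and `instant_search` are hypothetical names.

```python
import bisect


class PrefixIndex:
    """Sorted list of indexed terms that supports fast prefix lookups."""

    def __init__(self, terms):
        # Store each term once, lower-cased, in sorted order.
        self.terms = sorted({term.lower() for term in terms})

    def complete(self, prefix, limit=10):
        """Return up to `limit` terms that start with `prefix`."""
        prefix = prefix.lower()
        # Binary search finds the first term that is >= prefix.
        start = bisect.bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix) or len(matches) >= limit:
                break
            matches.append(term)
        return matches


def instant_search(index, typed):
    """Yield (query_so_far, matches) after each simulated keystroke."""
    for end in range(1, len(typed) + 1):
        query = typed[:end]
        yield query, index.complete(query)
```

Because the terms are kept sorted, each keystroke costs one binary search plus a short scan, so the result list can be narrowed as the user types without rescanning the whole index.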


Mac OS

In 1994 the AppleSearch search engine was introduced, allowing users to fully search all documents within their Macintosh computer, including file format types, metadata on those files, and content within the files. AppleSearch was a client/server application, and as such required a server separate from the main device in order to function. The biggest issue with AppleSearch was its large resource requirements: "AppleSearch requires at least a 68040 processor and 5MB of RAM." At the time, a Macintosh computer with these specifications was priced at approximately $1400, equivalent to $2050 in 2015. On top of this, the software itself cost an additional $1400 for a single license.

In 1998, Sherlock was released alongside Mac OS 8.5. Sherlock (named after the famous fictional detective Sherlock Holmes) was integrated into Mac OS's file browser, Finder. Sherlock extended the desktop search function to the World Wide Web, allowing users to search both locally and externally. Adding functions such as internet access to Sherlock was relatively simple, as this was done through plugins written as plain text files. Sherlock was included in every release of Mac OS from Mac OS 8.5 until it was deprecated and replaced by Spotlight and Dashboard in Mac OS X 10.4 Tiger. It was officially removed in Mac OS X 10.5 Leopard.

Spotlight was released in 2005 as part of Mac OS X 10.4 Tiger. It is a selection-based search tool, which means the user invokes a query using only the mouse. Spotlight allows the user to search the Internet for more information about any keyword or phrase contained within a document or webpage, and uses a built-in calculator and the Oxford American Dictionary to offer quick access to small calculations and word definitions. While Spotlight initially has a long startup time, this decreases as the hard disk is indexed. As files are added by the user, the index is constantly updated in the background using minimal CPU and RAM resources.


Linux

There is a wide range of desktop search options for Linux users, depending on the skill level of the user and on whether they prefer desktop tools that integrate tightly into their desktop environment, command-shell functionality (often with advanced scripting options), or browser-based user interfaces to locally running software. In addition, many users build their own indexing setup from a variety of indexing packages (e.g. one that extracts and indexes PDF/DOC/DOCX/ODT documents well, and another search engine that works with vCard, LDAP, and other directory/contact databases), alongside the conventional find and locate commands.


Ubuntu

Ubuntu Linux didn't have desktop search until the 7.04 release (Feisty Fawn). Built on the Tracker desktop search tool, the feature was very similar to Mac OS's AppleSearch and Sherlock. In addition to the basic features of file format sorting and metadata matching, it added support for searching through emails and instant messages. In 2014 Recoll was added to Linux distributions, working with other search programs such as Tracker and Beagle to provide efficient full-text search. This greatly increased the types of queries and file types that Linux desktop searches could handle. A major advantage of Recoll is that it allows for greater customization of what is indexed; Recoll will index the entire hard disk by default, but can be made to index only selected directories, omitting directories that will never need to be searched.
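The idea of indexing only selected directories while omitting others can be sketched as a filtered directory walk. The following Python example is a generic illustration, not Recoll's configuration format or API; the function name `files_to_index` and its parameters are hypothetical.

```python
import os


def files_to_index(roots, skip_dirs=()):
    """Yield every file under the chosen roots, pruning skipped directories."""
    skipped = {os.path.abspath(path) for path in skip_dirs}
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Editing dirnames in place stops os.walk from descending
            # into directories that should never be indexed.
            dirnames[:] = [
                name for name in dirnames
                if os.path.abspath(os.path.join(dirpath, name)) not in skipped
            ]
            for name in filenames:
                yield os.path.join(dirpath, name)
```

Pruning at walk time means a skipped directory costs nothing to index, however many files it contains. That is the practical benefit of restricting the indexed set rather than filtering results at query time.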


openSUSE

Starting with KDE4, NEPOMUK was introduced. It provided the ability to index a wide range of desktop content and email, and used semantic web technologies (e.g. RDF) to annotate the database. The introduction suffered from a few glitches, many of which seemed to stem from the triplestore. Performance improved (at least for queries) when the backend was switched to a stripped-down version of the Virtuoso Open Source Edition; however, indexing remained a common user complaint. Based on user feedback, the Nepomuk indexing and search has been replaced with the Baloo framework, which is based on Xapian.
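To make the triplestore idea concrete, the following Python sketch stores annotations as (subject, predicate, object) triples and answers simple pattern queries. It is a toy model only: it does not use NEPOMUK's ontologies, RDF syntax, or the Virtuoso backend, and the class name, predicate names and file identifiers are invented for illustration.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        """Record one annotation, e.g. a tag attached to a file."""
        self.triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            triple for triple in self.triples
            if (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        )
```

For example, after adding `("file:report.odt", "hasTag", "budget")`, the call `match(p="hasTag", o="budget")` returns every file carrying that tag. Each query here scans all stored triples; a real triplestore would index them, and the cost of maintaining such indexes is in keeping with the indexing complaints mentioned above.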


See also

* List of desktop search engines


