YaCy

YaCy (pronounced "ya see") is a free distributed search engine built on the principles of peer-to-peer (P2P) networks, created by Michael Christen in 2003. The engine is written in Java and is distributed across several hundred computers, the so-called YaCy-peers. Each YaCy-peer independently crawls through the Internet, analyzes and indexes the web pages it finds, and stores the indexing results in a common database (the so-called index), which is shared with other YaCy-peers using peer-to-peer principles. This decentralized approach removes the need for a central server and is intended to protect user privacy. Unlike semi-distributed search engines, the YaCy network has a fully distributed architecture: all YaCy-peers are equal and no central server exists. YaCy can be run either in a crawling mode or as a local proxy server, indexing the web pages visited by the person running YaCy on their computer. Several mechanisms are provided to protect the user's privacy. Search functions are accessed through a locally run web server, which provides a search box for entering search terms and returns search results in a format similar to that of popular search engines.


System components

The YaCy search engine is based on four elements:

* Crawler: A search robot that traverses web pages and analyzes their content. The crawler is responsible for fetching web pages from the Internet. Each peer in the YaCy network can crawl and index websites. The crawling process involves:
** Discovery: Finding new web pages to index by following links.
** Fetching: Downloading the content of web pages.
** Parsing: Extracting relevant information such as text, metadata, and links from the downloaded pages.
* Indexer: Creates a reverse word index (RWI), in which each word has its own list of relevant URLs and ranking information. Words are saved as word hashes.
* Search and administration interface: A web interface provided by a local HTTP servlet with a servlet engine.
* Data storage: Stores the reverse word index database using a distributed hash table.
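
The parsing and indexing steps described above can be illustrated with a minimal Python sketch. It is not YaCy's implementation (YaCy is written in Java): the names `PageParser`, `word_hash`, and `index_page` are hypothetical, a truncated SHA-256 digest stands in for YaCy's own word hash, and a plain occurrence count stands in for its ranking information.

```python
import hashlib
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outgoing links and text content from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Discovery: resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text_parts.append(data)


def word_hash(word):
    # Assumption: a truncated SHA-256 digest, used here only for illustration.
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()[:12]


def index_page(rwi, url, html):
    """Parse one page, add its words to the reverse word index, return its links."""
    parser = PageParser(url)
    parser.feed(html)
    words = re.findall(r"\w+", " ".join(parser.text_parts).lower())
    for word in words:
        postings = rwi[word_hash(word)]
        # word hash -> {url: occurrence count}; the count is a toy ranking signal.
        postings[url] = postings.get(url, 0) + 1
    return parser.links


rwi = defaultdict(dict)
links = index_page(
    rwi,
    "http://example.org/",
    '<p>Peer search</p><a href="/about">about peer</a>',
)
print(links)                   # URLs discovered for later crawling
print(rwi[word_hash("peer")])  # pages containing the word "peer"
```

Keying the index by word hash rather than by the word itself means that a lookup only needs the hash of the search term, which is also what allows index entries to be spread over a distributed hash table.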


Search-engine technology

* YaCy is a complete search appliance with user interface, index, administration, and monitoring.
* YaCy harvests web pages with a web crawler. Documents are then parsed and indexed, and the search index is stored locally. If your peer is part of a peer network, your local search index is also merged into the shared index for that network.
** When a search is started, results from the local index are combined with results from the global search index held by peers in the YaCy search network.
* The YaCy Grid is a second-generation implementation of the YaCy peer-to-peer search. A YaCy Grid installation comprises microservices that communicate using the Master Connect Program (MCP).
* The YaCy Parser is a microservice that can be deployed using Docker. When the Parser component is started, it searches for and connects to an MCP. By default, the local host is searched for an MCP, but a different one can be configured.
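
How a search might combine hits from the local index with hits returned by peers can be sketched as follows. This is only an illustration under stated assumptions: `merge_results` is a hypothetical helper, each hit is assumed to be a `(url, score)` pair, and keeping the best score per URL is a simplification rather than YaCy's actual ranking.

```python
def merge_results(local, remote_lists, limit=10):
    """Combine local hits with hits from peers, keeping the best score per URL."""
    best = {}
    for hits in [local, *remote_lists]:
        for url, score in hits:
            if url not in best or score > best[url]:
                best[url] = score
    # Highest score first; ties are broken by URL so the order is deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


local_hits = [("http://a.example/", 0.9), ("http://b.example/", 0.4)]
peer_hits = [
    [("http://b.example/", 0.7), ("http://c.example/", 0.5)],
    [("http://a.example/", 0.2)],
]
print(merge_results(local_hits, peer_hits))
```

A URL reported by several peers appears only once in the merged list, so duplicate entries across the network do not crowd out other results.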


YaCy platform architecture

YaCy uses a combination of techniques for networking, administration, and maintenance of the search index, including blacklisting, moderation, and communication with the community. These operations are organized as follows:

* Community components
*# Web forum
*# Statistics
*# XML API
* Maintenance
*# Web server
*# Indexing
*# Crawler with balancer
*# Peer-to-peer server communication
* Content organization
*# Blacklisting and filtering
*# Search interface
*# Bookmarks
*# Monitoring search results


Distribution

YaCy is available in packages for Linux, Windows, and Macintosh, and also as a Docker image. It can also be installed on other operating systems, either by building it manually or by using a tarball. YaCy requires Java 11; Temurin 11 is recommended. The Debian package can be installed from a repository available at a subdomain of the project's website, but it is not yet maintained in the official Debian package repository.


See also

* Dooble – an open-source web browser with an integrated YaCy Search Engine Tool Widget
* List of search engines
* Comparison of search engines
* Seeks


Further reading

YaCy at LinuxReviews

