HOME

TheInfoList



OR:

Apache Lucene is a
free and open-source Free and open-source software (FOSS) is software available under a Software license, license that grants users the right to use, modify, and distribute the software modified or not to everyone free of charge. FOSS is an inclusive umbrella term ...
search engine A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
software library In computing, a library is a collection of resources that can be leveraged during software development to implement a computer program. Commonly, a library consists of executable code such as compiled functions and classes, or a library can ...
, originally written in
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
by Doug Cutting. It is supported by the
Apache Software Foundation The Apache Software Foundation ( ; ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open-source software projects. The ASF was formed from a group of developers of the ...
and is released under the Apache Software License. Lucene is widely used as a standard foundation for production search applications. Lucene has been ported to other programming languages including
Object Pascal Object Pascal is an extension to the programming language Pascal (programming language), Pascal that provides object-oriented programming (OOP) features such as Class (computer programming), classes and Method (computer programming), methods. T ...
,
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
, C#, C++, Python,
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
and PHP.


History

Doug Cutting originally wrote Lucene in 1999. Lucene was his fifth search engine. He had previously written two while at Xerox PARC, one at
Apple An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
, and a fourth at Excite. It was initially available for download from its home at the
SourceForge SourceForge is a web service founded by Geoffrey B. Jeffery, Tim Perdue, and Drew Streib in November 1999. SourceForge provides a centralized software discovery platform, including an online platform for managing and hosting open-source soft ...
web site. It joined the Apache Software Foundation's
Jakarta Jakarta (; , Betawi language, Betawi: ''Jakartè''), officially the Special Capital Region of Jakarta (; ''DKI Jakarta'') and formerly known as Batavia, Dutch East Indies, Batavia until 1949, is the capital and largest city of Indonesia and ...
family of open-source Java products in September 2001 and became its own top-level Apache project in February 2005. The name Lucene is Doug Cutting's wife's middle name and her maternal grandmother's first name. Lucene formerly included a number of sub-projects, such as Lucene.NET, Mahout, Tika and Nutch. These three are now independent top-level projects. In March 2010, the Apache Solr search server joined as a Lucene sub-project, merging the developer communities. Version 4.0 was released on October 12, 2012. In March 2021, Lucene changed its logo, and Apache Solr became a top level Apache project again, independent from Lucene.


Features and common use

While suitable for any application that requires full text indexing and searching capability, Lucene is recognized for its utility in the implementation of Internet search engines and local, single-site searching. Lucene includes a feature to perform a fuzzy search based on
edit distance In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two String (computing), strings (e.g., words) are to one another, that is measured by counting the minimum number of opera ...
. Lucene has also been used to implement recommendation systems. For example, Lucene's 'MoreLikeThis' Class can generate recommendations for similar documents. In a comparison of the term vector-based similarity approach of 'MoreLikeThis' with citation-based document similarity measures, such as co-citation and co-citation proximity analysis, Lucene's approach excelled at recommending documents with very similar structural characteristics and more narrow relatedness.M. Schwarzer, M. Schubotz, N. Meuschke, C. Breitinger, V. Markl, and B. Gipp, https://www.gipp.com/wp-content/papercite-data/pdf/schwarzer2016.pdf "Evaluating Link-based Recommendations for Wikipedia" in Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), New York, NY, USA, 2016, pp. 191-200. In contrast, citation-based document similarity measures tended to be more suitable for recommending more broadly related documents, meaning citation-based approaches may be more suitable for generating serendipitous recommendations, as long as documents to be recommended contain in-text citations.


Lucene-based projects

Lucene itself is just an indexing and search library and does not contain crawling and HTML
parsing Parsing, syntax analysis, or syntactic analysis is a process of analyzing a String (computer science), string of Symbol (formal), symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal gramm ...
functionality. However, several projects extend Lucene's capability: * Apache Nutch – provides web crawling and HTML parsing * Apache Solr – an enterprise search server * CrateDB – open source, distributed SQL database built on Lucene * DocFetcher – a multiplatform desktop search application *
Elasticsearch Elasticsearch is a Search engine (computing), search engine based on Apache Lucene, a free and open-source search engine. It provides a distributed, Multitenancy, multitenant-capable full-text search engine with an HTTP web interface and schema ...
– an enterprise search server released in 2010 * Kinosearch – a search engine written in
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
and C and a loose
port A port is a maritime facility comprising one or more wharves or loading areas, where ships load and discharge cargo and passengers. Although usually situated on a sea coast or estuary, ports can also be found far inland, such as Hamburg, Manch ...
of Lucene. The Socialtext wiki software uses this search engine, and so does the MojoMojo wiki. It is also used by the Human Metabolome Database (HMDB) and the Toxin and Toxin-Target Database (T3DB). *
MongoDB MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database product, MongoDB uses JSON-like documents with optional database schema, schemas. Released in February 2009 by 10gen (now MongoDB ...
Atlas Search – a cloud-native enterprise search application based on MongoDB and Apache Lucene * OpenSearch – an open source enterprise search server based on a fork of Elasticsearch 7 * Swiftype – an enterprise search startup based on Lucene


See also

* Enterprise search * Information extraction *
Information retrieval Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
* Text mining


References


Bibliography

* *


External links

* {{Authority control Lucene Free search engine software Java (programming language) libraries C Sharp libraries Cross-platform software Software using the Apache license Search engine software Pascal (programming language) software 1999 software