HOME

TheInfoList



OR:

Solr (pronounced "solar") is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
enterprise-search platform, written in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. Its major features include
full-text search In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts ...
, hit highlighting,
faceted search Faceted search is a technique that involves augmenting traditional search techniques with a faceted navigation system, allowing users to narrow down search results by applying multiple filters based on faceted classification of the items. It is som ...
, real-time indexing, dynamic clustering, database integration,
NoSQL A NoSQL (originally referring to "non- SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed ...
features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and
fault tolerance Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the ...
. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases. Solr runs as a standalone full-text search server. It uses the
Lucene Apache Lucene is a free and open-source search engine software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene is widely used as a ...
Java search library at its core for full-text indexing and search, and has
REST Rest or REST may refer to: Relief from activity * Sleep ** Bed rest * Kneeling * Lying (position) * Sitting * Squatting position Structural support * Structural support ** Rest (cue sports) ** Armrest ** Headrest ** Footrest Arts and entert ...
-like
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, ...
/
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
and
JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
APIs that make it usable from most popular programming languages. Solr's external configuration allows it to be tailored to many types of applications without Java coding, and it has a plugin architecture to support more advanced customization. Apache Solr is developed in an open, collaborative manner by the Apache Solr project at the
Apache Software Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the A ...
.


History

In 2004, Solr was created by Yonik Seeley at
CNET Networks ''CNET'' (short for "Computer Network") is an American media website that publishes reviews, news, articles, blogs, podcasts, and videos on technology and consumer electronics globally. ''CNET'' originally produced content for radio and televi ...
as an in-house project to add search capability for the company website. In January 2006, CNET Networks decided to openly publish the source code by donating it to the
Apache Software Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the A ...
. Like any new Apache project, it entered an incubation period which helped solve organizational, legal, and financial issues. In January 2007, Solr graduated from incubation status into a standalone top-level project (TLP) and grew steadily with accumulated features, thereby attracting users, contributors, and committers. Although quite new as a public project, it powered several high-traffic websites. In September 2008, Solr 1.3 was released including distributed search capabilities and performance enhancements among many others. In January 2009, Yonik Seeley along with Grant Ingersoll and Erik Hatcher joined
Lucidworks Lucidworks, a San Francisco, California-based company that specializes in commerce, customer service, and workplace applications. Lucidworks was founded in 2007 under the name Lucid Imagination and launched in 2009. The company was later rena ...
(formerly Lucid Imagination), the first company providing commercial support and training for Apache Solr search technologies. Since then, support offerings around Solr have been abundant. In November 2009, saw the release of Solr 1.4. This version introduced enhancements in indexing, searching and faceting along with many other improvements such as rich document processing (
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
,
Word A word is a basic element of language that carries an semantics, objective or pragmatics, practical semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of w ...
,
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
), Search Results clustering based on Carrot2 and also improved database integration. The release also features many additional plug-ins. In March 2010, the
Lucene Apache Lucene is a free and open-source search engine software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene is widely used as a ...
and Solr projects merged. Separate downloads continued, but the products were now jointly developed by a single set of committers. In 2011, the Solr version number scheme was changed in order to match that of Lucene. After Solr 1.4, the next release of Solr was labeled 3.1, in order to keep Solr and Lucene on the same version number. In October 2012, Solr version 4.0 was released, including the new SolrCloud feature. 2013 and 2014 saw a number of Solr releases in the 4.x line, steadily growing the feature set and improving reliability. In February 2015, Solr 5.0 was released, the first release where Solr is packaged as a standalone application, ending official support for deploying Solr as a
war War is an intense armed conflict between states, governments, societies, or paramilitary groups such as mercenaries, insurgents, and militias. It is generally characterized by extreme violence, destruction, and mortality, using regular o ...
. Solr 5.3 featured a built-in pluggable Authentication and Authorization framework. In April 2016, Solr 6.0 was released. Added support for executing Parallel SQL queries across SolrCloud collections. Includes StreamExpression support and a new JDBC Driver for the SQL Interface. In September 2017, Solr 7.0 was released. This release among other things, added support multiple replica types, auto-scaling, and a Math engine. In March 2019, Solr 8.0 was released including many bugfixes and component updates. Solr nodes can now listen and serve HTTP/2 requests. Be aware that by default, internal requests are also sent by using HTTP/2. Furthermore, an admin UI login was added with support for BasicAuth and Kerberos. And plotting math expressions in Apache Zeppelin is now possible. In November 2020, Bloomberg donated th
Solr Operator
to the Lucene/Solr project. The Solr Operator helps deploy and run Solr in
Kubernetes Kubernetes (, commonly stylized as K8s) is an open-source container orchestration system for automating software deployment, scaling, and management. Google originally designed Kubernetes, but the Cloud Native Computing Foundation now maintains ...
. In February 2021, Solr was established as a separate Apache project (TLP), independent from Lucene. In May 2022, Solr 9.0 was released, as the first release independent from Lucene, requiring Java 11, and with highlights such as KNN "Neural" search, better modularization, more security plugins and more.


Operations

In order to search a document, Apache Solr performs the following operations in sequence: # Indexing: converts the documents into a machine-readable format. # Querying: understanding the terms of a query asked by the user. These terms can be images or keywords, for example. # Mapping: Solr maps the user query to the documents stored in the database to find the appropriate result. # Ranking: as soon as the engine searches the indexed documents, it ranks the outputs by their relevance.


Community

Solr has both individuals and companies who contribute new features and bug fixes.


Integrating Solr

Solr is bundled as the built-in search in many applications such as
content management system A content management system (CMS) is computer software used to manage the creation and modification of digital content (content management).''Managing Enterprise Content: A Unified Content Strategy''. Ann Rockley, Pamela Kostur, Steve Manning. New ...
s and
enterprise content management Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval and distribution. Systems using ECM generally provide a secure ...
systems.
Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...
distributions from
Cloudera Cloudera, Inc. is an American software company providing enterprise data management systems that make significant use of Apache Hadoop. As of January 31, 2021, the company had approximately 1,800 customers. History Cloudera, Inc. was formed on J ...
,
Hortonworks Hortonworks was a data software company based in Santa Clara, California that developed and supported open-source software (primarily around Apache Hadoop) designed to manage big data and associated processing. Hortonworks software was used to b ...
and
MapR MapR was a business software company headquartered in Santa Clara, California. MapR software provides access to a variety of data sources from a single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a dist ...
all bundle Solr as the search engine for their products marketed for
big data Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
.
DataStax DataStax, Inc. is a real-time data company based in Santa Clara, California. Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache ...
DSE integrates Solr as a search engine with
Cassandra Cassandra or Kassandra (; Ancient Greek: Κασσάνδρα, , also , and sometimes referred to as Alexandra) in Greek mythology was a Trojan priestess dedicated to the god Apollo and fated by him to utter true prophecies but never to be believe ...
. Solr is supported as an end point in various data processing frameworks and
Enterprise integration Enterprise integration is a technical field of enterprise architecture, which is focused on the study of topics such as system interconnection, electronic data interchange, product data exchange and distributed computing environments. It is a c ...
frameworks. Solr exposes industry standard
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, ...
REST-like
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
s with both
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
and
JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
support, and will integrate with any system or programming language supporting these standards. For ease of use there are also client libraries available for
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, C#,
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
and most other popular programming languages.


See also

* Open Semantic Framework * Search oriented architecture *
List of information retrieval libraries This is a list of free information retrieval libraries, which are libraries used in software development for performing it retrieval functions. It is not a complete list of such libraries, but is instead a list of free information retrieval lib ...


References


Bibliography

* * * * * * * *


External links

*
Ansible role to install SolrCloud in a Debian environment
{{Apache Software Foundation
Solr Solr (pronounced "solar") is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features ...
Free software programmed in Java (programming language) Free search engine software Search engine software Database-related software for Linux NoSQL