Sphinx (search engine)

Sphinx is a full-text search engine that provides text search functionality to client applications.


Overview

Sphinx can be used either as a stand-alone server or as a storage engine ("SphinxSE") for the MySQL family of databases. When run as a stand-alone server, Sphinx operates similarly to a DBMS and can communicate with MySQL, MariaDB and PostgreSQL through their native protocols, or with any ODBC-compliant DBMS via ODBC. MariaDB, a fork of MySQL, is distributed with SphinxSE.


SphinxAPI

If Sphinx is run as a stand-alone server, applications can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party plugins and modules, are also available. Other data sources can be indexed by piping them to Sphinx in a custom XML format.
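The custom XML format is known as xmlpipe2. The Python sketch below emits a document stream in that style: a `sphinx:schema` block declaring the full-text fields and attributes, followed by one `sphinx:document` element per document. The element names and attribute type names reflect our understanding of xmlpipe2 and should be checked against the Sphinx documentation for the version in use; the helper `to_xmlpipe2` and the sample field names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(fields, attrs, docs):
    """Serialize documents as an xmlpipe2-style XML stream (sketch).

    fields: list of full-text field names, e.g. ["title", "content"]
    attrs:  dict of attribute name -> type name, e.g. {"price": "float"}
    docs:   list of dicts, each with a unique integer "id" key
    Assumption: field and attribute names are valid XML element names.
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    for name in fields:
        lines.append(f"<sphinx:field name={quoteattr(name)}/>")
    for name, attr_type in attrs.items():
        lines.append(f"<sphinx:attr name={quoteattr(name)} type={quoteattr(attr_type)}/>")
    lines.append("</sphinx:schema>")
    for doc in docs:
        lines.append(f'<sphinx:document id="{int(doc["id"])}">')
        for name in [*fields, *attrs]:
            if name in doc:
                # escape() turns &, < and > into entities so free text stays well-formed
                lines.append(f"<{name}>{escape(str(doc[name]))}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    docs = [{"id": 1, "title": "Laptop <14 inch>", "content": "Light & fast", "price": 999.0}]
    # The indexer would read this stream from the script's standard output.
    print(to_xmlpipe2(["title", "content"], {"price": "float"}, docs))
```

Writing to standard output keeps the exporter independent of Sphinx itself: any program that can print XML can act as a data source.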


SphinxQL

The Sphinx search daemon supports the MySQL binary network protocol and can be accessed with the regular MySQL API and clients. Sphinx supports a subset of SQL known as SphinxQL. It supports querying all index types with SELECT and modifying real-time (RT) indexes with INSERT, REPLACE and DELETE, among other statements.
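Because the daemon speaks the MySQL protocol, a SphinxQL statement is simply a string sent through an ordinary MySQL client. The sketch below only builds such a statement; it assumes an index named `products` (hypothetical) and uses SphinxQL's `MATCH()` clause for the full-text condition. The escaping is an assumption modeled on MySQL string literals (backslash and single quote) and does not neutralize Sphinx's own full-text query operators, which would need separate handling.

```python
def escape_sphinxql_string(value: str) -> str:
    """Escape a value for a single-quoted SphinxQL string literal.

    Assumption: backslash and single quote are escaped with a backslash,
    as in MySQL string literals.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_search(index: str, keywords: str, limit: int = 20) -> str:
    """Build a full-text SELECT statement against a Sphinx index (sketch)."""
    # Index names cannot be quoted as string literals, so reject anything odd.
    if not index.isidentifier():
        raise ValueError(f"unexpected index name: {index!r}")
    return (
        f"SELECT id FROM {index} "
        f"WHERE MATCH('{escape_sphinxql_string(keywords)}') "
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # The resulting string would be executed through any MySQL client library.
    print(build_search("products", "laptop"))
```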


SphinxSE

Sphinx can also provide a special storage engine for MariaDB and MySQL databases. This allows MySQL and MariaDB servers to communicate with Sphinx's searchd daemon to run queries and obtain results. Sphinx indices are treated like regular SQL tables. The SphinxSE storage engine is shipped with MariaDB.


Full-text fields and indexing

Sphinx is configured to examine a data set via its Indexer. The Indexer process creates a full-text index (a special data structure that enables quick keyword searches) from the given data or text. Full-text fields are the parts of the content that Sphinx indexes; they can be searched quickly for keywords. Fields are named, and searches can be limited to a single field (e.g. "title" only) or to a subset of fields (e.g. "title" and "abstract" only). Sphinx's index format generally supports up to 256 fields. Note that the original data is not stored in the Sphinx index but is discarded during indexing; Sphinx assumes that those contents are stored elsewhere.
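Named fields, field-restricted search and the discarding of source text can be illustrated with a toy inverted index. This Python sketch is a conceptual model only, not Sphinx's actual index format; the class `ToyFieldIndex`, its tokenizer and the sample field names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of word characters.
TOKEN_RE = re.compile(r"\w+")
MAX_FIELDS = 256  # field limit mentioned in the text


class ToyFieldIndex:
    """Toy inverted index with named full-text fields.

    Maps (field, keyword) -> set of document ids. Like a Sphinx index, it
    keeps only the postings and discards the original text.
    """

    def __init__(self, field_names):
        if len(field_names) > MAX_FIELDS:
            raise ValueError("too many full-text fields")
        self.field_names = list(field_names)
        self.postings = defaultdict(set)

    def add(self, doc_id, doc):
        for name in self.field_names:
            for token in TOKEN_RE.findall(doc.get(name, "").lower()):
                self.postings[(name, token)].add(doc_id)
        # `doc` is not stored anywhere: only the postings survive indexing.

    def search(self, keyword, fields=None):
        """Return sorted ids of documents containing `keyword` in any of `fields`."""
        fields = self.field_names if fields is None else fields
        hits = set()
        for name in fields:
            hits |= self.postings.get((name, keyword.lower()), set())
        return sorted(hits)


if __name__ == "__main__":
    idx = ToyFieldIndex(["title", "abstract", "body"])
    idx.add(1, {"title": "Sphinx search", "abstract": "A full-text engine"})
    idx.add(2, {"title": "Egypt travel", "body": "The Great Sphinx of Giza"})
    print(idx.search("sphinx"))                    # [1, 2]
    print(idx.search("sphinx", fields=["title"]))  # [1]
```

Keying postings by field as well as keyword is what makes a "title only" search cheap: it reads one posting list instead of filtering every match afterwards.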


Attributes

Attributes are additional values associated with each document that can be used for extra filtering and sorting during a search. Attributes are named, and attribute names are case-insensitive. Attributes are not full-text indexed; they are stored in the index as is. Currently supported attribute types are:
* unsigned integers (1-bit to 32-bit wide);
* UNIX timestamps;
* floating-point values (32-bit, IEEE 754 single precision);
* string ordinals (specially computed integers);
* strings (since 1.10-beta);
* JSON (since 2.1.1-beta);
* MVA, multi-value attributes (variable-length lists of 32-bit unsigned integers).
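Conceptually, full-text matching yields candidate documents and attribute values then narrow and order them. The Python sketch below models that with a float attribute, a UNIX-timestamp attribute and an MVA; the names `price`, `added_at`, `tags` and `filter_and_sort` are hypothetical, and this is an illustration rather than how searchd evaluates filters internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Attrs:
    """Per-document attributes, stored as-is rather than full-text indexed."""
    price: float                                   # 32-bit float attribute
    added_at: int                                  # UNIX timestamp attribute
    tags: List[int] = field(default_factory=list)  # MVA: list of unsigned ints


def filter_and_sort(
    matched_ids: Iterable[int],
    attrs: Dict[int, Attrs],
    max_price: Optional[float] = None,
    tag: Optional[int] = None,
    newest_first: bool = True,
) -> List[int]:
    """Narrow full-text matches by attribute values, then order by timestamp."""
    kept = []
    for doc_id in matched_ids:
        a = attrs[doc_id]
        if max_price is not None and a.price > max_price:
            continue
        # An MVA filter passes if any value in the list equals the wanted tag.
        if tag is not None and tag not in a.tags:
            continue
        kept.append(doc_id)
    return sorted(kept, key=lambda d: attrs[d].added_at, reverse=newest_first)


if __name__ == "__main__":
    attrs = {1: Attrs(10.0, 100, [3, 5]), 2: Attrs(50.0, 300, [5]), 3: Attrs(20.0, 200, [7])}
    print(filter_and_sort([1, 2, 3], attrs, max_price=25.0))  # [3, 1]
```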


JSON attributes in Sphinx

Sphinx, like classic SQL databases, works with a so-called fixed schema, that is, a set of predefined attribute columns. This works well when most of the stored data actually has values, but mapping sparse data to static columns can be cumbersome. Assume, for example, that you're running a price comparison or auction site with many different product categories. Some attributes, like the price or the vendor, are common to all goods. But beyond those, for laptops you also need to store the weight, screen size, HDD type, RAM size, etc. And for shovels you probably want to store the color, the handle length, and so on. This is manageable within a single category, but the distinct fields needed for all goods across all categories are legion. A JSON attribute can be used to overcome this. Inside a JSON attribute there is no fixed structure: documents can have various keys, which may or may not be present in all of them. When you filter on one of these keys, Sphinx ignores documents that don't have the key in the JSON attribute and works only with the documents that have it.
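The key-skipping behavior can be illustrated with a small Python sketch using the laptop and shovel example. Each document's JSON attribute is kept as text here, and a predicate is applied to a single key; documents without that key are left out rather than raising an error. Sphinx evaluates such filters inside searchd, so `filter_on_json_key` and the sample keys are hypothetical illustrations, not Sphinx's API.

```python
import json
from typing import Any, Callable, Dict, List


def filter_on_json_key(
    docs: Dict[int, str], key: str, predicate: Callable[[Any], bool]
) -> List[int]:
    """Return sorted ids of documents whose JSON has `key` and satisfies `predicate`.

    Documents whose JSON lacks the key are skipped, not treated as errors.
    """
    hits = []
    for doc_id, raw in sorted(docs.items()):
        obj = json.loads(raw)
        if isinstance(obj, dict) and key in obj and predicate(obj[key]):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    products = {
        1: '{"price": 999, "screen_size": 14, "ram_gb": 16}',          # laptop
        2: '{"price": 25, "color": "red", "handle_length_cm": 120}',   # shovel
        3: '{"price": 1299, "screen_size": 16}',                       # laptop
    }
    # The shovel has no "screen_size" key, so it is silently skipped.
    print(filter_on_json_key(products, "screen_size", lambda v: v >= 15))  # [3]
```

Shared keys such as `price` still work across every category, while category-specific keys only ever see the documents that carry them.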


License

Up until version 3, Sphinx was dual-licensed under either:
# the GNU General Public License version 2, or
# a proprietary license.