Poliqarp

Poliqarp is an open-source search engine designed to process text corpora, including the National Corpus of Polish created at the Institute of Computer Science, Polish Academy of Sciences.


Features

* Custom query language
* Two-level regular expressions:
** operating at the level of characters in words
** operating at the level of words in statements/paragraphs
* Good performance
* Compact corpus representation (compared to similar projects)
* Portability across operating systems: Linux/BSD/Win32
* Lack of portability across endianness (current release works only on little-endian devices)
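
The idea of two-level regular expressions can be illustrated with a small sketch. It is written in Python and does not use Poliqarp's actual query syntax or engine: the function names `match_at` and `find_all`, the quantifier codes ("1", "?", "*") and the sample sentence are all assumptions made for illustration. Each element of the word-level pattern is itself a character-level regular expression that must match one whole token.

```python
import re
from typing import List, Tuple

# Hypothetical representation: a word-level pattern is a list of
# (character-level regex, quantifier) pairs. Quantifiers:
#   "1" exactly one token, "?" zero or one token, "*" zero or more tokens.
WordPattern = List[Tuple[str, str]]


def match_at(tokens: List[str], start: int, pattern: WordPattern) -> int:
    """Return the end index of a match beginning at tokens[start], or -1."""
    if not pattern:
        return start
    char_regex, quant = pattern[0]
    rest = pattern[1:]
    # Character level: the regex must match the whole current token.
    here = start < len(tokens) and re.fullmatch(char_regex, tokens[start]) is not None
    if quant == "1":
        return match_at(tokens, start + 1, rest) if here else -1
    if quant == "?":
        if here:
            end = match_at(tokens, start + 1, rest)
            if end != -1:
                return end
        return match_at(tokens, start, rest)
    if quant == "*":
        if here:
            # Greedy: consume this token and try the same element again.
            end = match_at(tokens, start + 1, pattern)
            if end != -1:
                return end
        return match_at(tokens, start, rest)
    raise ValueError("unknown quantifier: " + quant)


def find_all(tokens: List[str], pattern: WordPattern) -> List[List[str]]:
    """Word level: collect non-overlapping, leftmost matches in a token list."""
    hits = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i, pattern)
        if end > i:
            hits.append(tokens[i:end])
            i = end
        else:
            i += 1
    return hits


if __name__ == "__main__":
    sentence = "the very very old corpus and a new corpus".split()
    # Any number of "very", then "old" or "new", then "corpus" or "corpora".
    query = [(r"very", "*"), (r"old|new", "1"), (r"corp(us|ora)", "1")]
    for hit in find_all(sentence, query):
        print(" ".join(hit))
```

In this sketch the inner expressions (`old|new`, `corp(us|ora)`) operate on characters within a word, while the outer list with its quantifiers operates on words within a sentence. A real corpus engine would work on an indexed, compact representation rather than on a Python list of strings.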




External links


* Polish corpus website (in English)
* Project website on SourceForge
* Search plugin for Firefox