The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or ''tracks.'' It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity (part of the Office of the Director of National Intelligence), and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale ''evaluation'' of text retrieval methodologies, and to increase the speed of lab-to-product transfer of technology.

TREC's evaluation protocols have improved many search technologies. A 2010 study estimated that "without TREC, U.S. Internet users would have spent up to 3.15 billion additional hours using web search engines between 1999 and 2009." Hal Varian, the Chief Economist at Google, wrote that "The TREC data revitalized research on information retrieval. Having a standard, widely available, and carefully constructed set of data laid the groundwork for further innovation in this field."

Each track has a challenge wherein NIST provides participating groups with data sets and test problems. Depending on the track, test problems might be questions, topics, or target extractable features. Uniform scoring is performed so the systems can be fairly evaluated. After evaluation of the results, a workshop provides a place for participants to collect their thoughts and ideas and present current and future research work. When it started in 1992, TREC was funded by DARPA (the U.S. Defense Advanced Research Projects Agency) and run by NIST.


Goals

* Encourage research in information retrieval based on large text collections
* Increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas
* Speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems
* Increase the availability of appropriate evaluation techniques for use by industry and academia, including the development of new evaluation techniques more applicable to current systems

TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a set of documents and questions. Participants run their own retrieval systems on the data and return to NIST a list of their top-ranked retrieved documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences.


Relevance judgments in TREC

TREC uses a binary relevance criterion: a document is either relevant or not relevant. Because the TREC collections are large, it is impossible to calculate absolute recall for each query. Instead, to assess the relevance of documents in relation to a query, TREC uses a method called pooling to calculate relative recall. For each query, the relevant documents that occur in the top 100 documents of each participating system are combined to produce a pool of relevant documents. Recall is then the proportion of this pool of relevant documents that a single system retrieved for the query topic.
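The pooling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NIST's actual tooling: the function names (`build_pool`, `relevant_in_pool`, `relative_recall`), the document IDs, and the judgments are all hypothetical; the pool depth is shrunk from 100 to 3 so the example stays small; documents with no judgment are treated as not relevant; and recall counts every document in a system's returned run.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int = 100) -> Set[str]:
    """Union of the top-`depth` document IDs from every system's ranked run (one query)."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def relevant_in_pool(pool: Set[str], judgments: Dict[str, bool]) -> Set[str]:
    """Pooled documents an assessor marked relevant (binary judgments).

    Assumption: documents with no judgment are treated as not relevant.
    """
    return {doc for doc in pool if judgments.get(doc, False)}


def relative_recall(ranking: List[str], relevant: Set[str]) -> float:
    """Fraction of the pool's relevant documents that one system retrieved."""
    if not relevant:
        return 0.0
    return len(set(ranking) & relevant) / len(relevant)


# Hypothetical runs for a single query; depth=3 instead of 100 to keep it small.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d6", "d1"],
}
pool = build_pool(runs, depth=3)  # d4 is outside sysA's top 3, so it is never judged
# Hypothetical binary assessor judgments for the pooled documents.
judgments = {"d1": True, "d2": False, "d3": True, "d5": True, "d6": False}
relevant = relevant_in_pool(pool, judgments)

for system, ranking in runs.items():
    print(system, round(relative_recall(ranking, relevant), 3))
```

Here sysB retrieves every known relevant document (relative recall 1.0) while sysA retrieves two of the three (about 0.667). Because recall is measured against the pool rather than the whole collection, a relevant document that no system ranked near the top cannot affect any system's score.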


Various TRECs

In 1992, TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry. It demonstrated a wide range of different approaches to the retrieval of text from large document collections. TREC-1 also revealed that automatic construction of queries from natural-language query statements seems to work, and that techniques based on natural language processing were no better and no worse than those based on vector or probabilistic approaches.

TREC-2 took place in August 1993, with 31 groups of researchers participating. Two types of retrieval were examined: retrieval using an 'ad hoc' query and retrieval using a 'routing' query. In TREC-3, a small group of experiments worked with a Spanish-language collection, and others dealt with interactive query formulation in multiple databases. In TREC-4, the topics were made even shorter to investigate the problems posed by very short user statements. TREC-5 included both short and long versions of the topics, with the goal of investigating more deeply which types of techniques work well on topics of various lengths.

In TREC-6, three new tracks were introduced: speech, cross-language, and high-precision information retrieval. The goal of cross-language information retrieval is to facilitate research on systems that are able to retrieve relevant documents regardless of the language of the source document. TREC-7 contained seven tracks, two of which were new: the Query track and the Very Large Corpus track. The goal of the Query track was to create a large query collection. TREC-8 contained seven tracks, two of which, question answering and web, were new. The objective of the QA track is to explore the possibilities of providing answers to specific natural language queries. TREC-9 included seven tracks. In TREC-10, the Video track was introduced to promote research in content-based retrieval from digital video. In TREC-11, the Novelty track was introduced. The goal of the Novelty track is to investigate systems' abilities to locate relevant and new information within the ranked set of documents returned by a traditional document retrieval system. TREC-12, held in 2003, added three new tracks: the Genomics track, the Robust Retrieval track, and HARD (High Accuracy Retrieval from Documents).


Tracks


Current tracks

''New tracks are added as new research needs are identified; this list is current for TREC 2018.''
CENTRE Track
- Goal: to run in parallel with CLEF 2018, NTCIR-14, and TREC 2018 to develop and tune an IR reproducibility evaluation protocol (new track for 2018).
Common Core Track
- Goal: an ad hoc search task over news documents.
Complex Answer Retrieval (CAR)
- Goal: to develop systems capable of answering complex information needs by collating information from an entire corpus.
Incident Streams Track
- Goal: to research technologies to automatically process social media streams during emergency situations (new track for TREC 2018).
The News Track
- Goal: a partnership with ''The Washington Post'' to develop test collections in a news environment (new for 2018).
Precision Medicine Track
- Goal: a specialization of the Clinical Decision Support track to focus on linking oncology patient data to clinical trials.
Real-Time Summarization Track (RTS)
- Goal: to explore techniques for constructing real-time update summaries from social media streams.


Past tracks

Chemical Track
- Goal: to develop and evaluate technology for large-scale search in chemistry-related documents, including academic papers and patents, to better meet the needs of professional searchers, and specifically patent searchers and chemists.
Clinical Decision Support Track
- Goal: to investigate techniques for linking medical cases to information relevant for patient care.
Contextual Suggestion Track
- Goal: to investigate search techniques for complex information needs that are highly dependent on context and user interests.
Crowdsourcing Track
- Goal: to provide a collaborative venue for exploring crowdsourcing methods both for evaluating search and for performing search tasks.
Genomics Track
- Goal: to study the retrieval of genomic data, not just gene sequences but also supporting documentation such as research papers, lab reports, etc. Last ran in TREC 2007.
Dynamic Domain Track
- Goal: to investigate domain-specific search algorithms that adapt to the dynamic information needs of professional users as they explore in complex domains.
Enterprise Track
- Goal: to study search over the data of an organization to complete some task. Last ran in TREC 2008.
Entity Track
- Goal: to perform entity-related search on Web data. These search tasks (such as finding entities and properties of entities) address common information needs that are not well modeled as ad hoc document search.
Cross-Language Track
- Goal: to investigate the ability of retrieval systems to find documents topically regardless of source language. After 1999, this track spun off into CLEF.
FedWeb Track
- Goal: to select the best resources to forward a query to, and merge the results so that the most relevant are on top.
Federated Web Search Track
- Goal: to investigate techniques for the selection and combination of search results from a large number of real online web search services.
Filtering Track
- Goal: to make a binary decision on whether to retrieve each new incoming document, given a stable information need.
HARD Track
- Goal: to achieve High Accuracy Retrieval from Documents by leveraging additional information about the searcher and/or the search context.
Interactive Track
- Goal: to study user interaction with text retrieval systems.
Knowledge Base Acceleration (KBA) Track
- Goal: to develop techniques to dramatically improve the efficiency of (human) knowledge base curators by having the system suggest modifications/extensions to the KB based on its monitoring of the data streams. The track was organized by Diffeo.
Legal Track
- Goal: to develop search technology that meets the needs of lawyers to engage in effective discovery in digital document collections.
LiveQA Track
- Goal: to generate answers to real questions originating from real users via a live question stream, in real time.
Medical Records Track
- Goal: to explore methods for searching unstructured information found in patient medical records.
Microblog Track
- Goal: to examine the nature of real-time information needs and their satisfaction in the context of microblogging environments such as Twitter.
Natural Language Processing Track
- Goal: to examine how specific tools developed by computational linguists might improve retrieval.
Novelty Track
- Goal: to investigate systems' abilities to locate new (i.e., non-redundant) information.
OpenSearch Track
- Goal: to explore an evaluation paradigm for IR that involves real users of operational search engines. For the first year of the track, the task was ad hoc Academic Search.
Question Answering Track
- Goal: to achieve more information retrieval than just document retrieval by answering factoid, list and definition-style questions.
Real-Time Summarization Track
- Goal: to explore techniques for constructing real-time update summaries from social media streams in response to users' information needs.
Robust Retrieval Track
- Goal: to focus on individual topic effectiveness.
Relevance Feedback Track
- Goal: to further the in-depth evaluation of relevance feedback processes.
Session Track
- Goal: to develop methods for measuring multiple-query sessions where information needs drift or become more or less specific over the session.
Spam Track
- Goal: to provide a standard evaluation of current and proposed spam filtering approaches.
Tasks Track
- Goal: to test whether systems can induce the possible tasks users might be trying to accomplish given a query.
Temporal Summarization Track
- Goal: to develop systems that allow users to efficiently monitor the information associated with an event over time.
Terabyte Track
- Goal: to investigate whether/how the IR community can scale traditional IR test-collection-based evaluation to significantly larger collections.
Total Recall Track
- Goal: to evaluate methods to achieve very high recall, including methods that include a human assessor in the loop.
Video Track
- Goal: to promote research in automatic segmentation, indexing, and content-based retrieval of digital video. In 2003, this track became its own independent evaluation named TRECVID.
Web Track
- Goal: to explore information-seeking behaviors common in general web search.


Related events

In 1997, a Japanese counterpart of TREC was launched (first workshop in 1999), called NTCIR (NII Test Collection for IR Systems), and in 2000, CLEF, a European counterpart specifically oriented towards the study of cross-language information retrieval, was launched. The Forum for Information Retrieval Evaluation (FIRE) started in 2008 with the aim of building a South Asian counterpart for TREC, CLEF, and NTCIR.


Conference contributions to search effectiveness

NIST claims that within the first six years of the workshops, the effectiveness of retrieval systems approximately doubled. The conference was also the first to hold large-scale evaluations of non-English documents, speech, video, and retrieval across languages. Additionally, the challenges have inspired a large body of …

Technology first developed in TREC is now included in many of the world's commercial search engines. An independent report by RTI International found that "about one-third of the improvement in web search engines from 1999 to 2009 is attributable to TREC. Those enhancements likely saved up to 3 billion hours of time using web search engines. ... Additionally, the report showed that for every $1 that NIST and its partners invested in TREC, at least $3.35 to $5.07 in benefits were accrued to U.S. information retrieval researchers in both the private sector and academia."

While one study suggests that the state of the art for ad hoc search did not advance substantially in the decade preceding 2009, it refers only to search for topically relevant documents in small news and web collections of a few gigabytes. There have been advances in other types of ad hoc search. For example, test collections created for known-item web search found improvements from the use of anchor text, title weighting, and URL length, which were not useful techniques on the older ad hoc test collections. In 2009, a new billion-page web collection was introduced, and spam filtering was found to be a useful technique for ad hoc web search, unlike in past test collections.

The test collections developed at TREC are useful not just for (potentially) helping researchers advance the state of the art, but also for allowing developers of new (commercial) retrieval products to evaluate their effectiveness on standard tests. In the past decade, TREC has created new tests for enterprise e-mail search, genomics search, spam filtering, e-Discovery, and several other retrieval domains.

TREC systems often provide a baseline for further research. Examples include:
* Hal Varian, Chief Economist at Google, says "Better data makes for better science. The history of information retrieval illustrates this principle well," and describes TREC's contribution.
* TREC's Legal track has influenced the e-Discovery community both in research and in the evaluation of commercial vendors.
* The IBM research team building IBM Watson (aka DeepQA), which beat the world's best ''Jeopardy!'' players, used data and systems from TREC's QA Track as baseline performance measurements.


Participation

The conference is made up of a varied, international group of researchers and developers. In 2003, there were 93 groups from both academia and industry from 22 countries participating.


See also

* List of computer science awards



External links


* TREC website at NIST
* TIPSTER
* The TREC book (at Amazon)