Geographic information retrieval

Geographic information retrieval (GIR) or geographical information retrieval systems are search tools for the Web, enterprise documents, and mobile local search that combine traditional text-based queries with location querying, such as a map or placenames. Like traditional information retrieval systems, GIR systems index text and information from structured and unstructured documents, and also augment those indices with geographic information. The development and engineering of GIR systems aim to build systems that can reliably answer queries that include a geographic dimension, such as "What wars were fought in Greece?" or "restaurants in Beirut". Semantic similarity and word-sense disambiguation are important components of GIR. To identify place names, GIR systems often rely on natural language processing or other metadata to associate text documents with locations. Such georeferencing, geotagging, and geoparsing tools often need databases of location names, known as gazetteers.
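As a rough illustration of how a gazetteer supports place-name identification, the following minimal sketch looks up each word of a query in a tiny in-memory gazetteer. The gazetteer contents, the approximate coordinates, and the function name `find_place_mentions` are assumptions made for this example; real geoparsers use far larger gazetteers and handle multi-word names and context.

```python
import re

# Hypothetical toy gazetteer: lowercase place name -> candidate (lat, lon) pairs.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "greece": [(39.0, 22.0)],
    "beirut": [(33.89, 35.50)],
    "paris": [(48.86, 2.35), (33.66, -95.56)],  # Paris, France and Paris, Texas
}


def find_place_mentions(text):
    """Return (surface form, candidate coordinates) for each word found in the gazetteer."""
    mentions = []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group(0)
        candidates = GAZETTEER.get(token.lower())
        if candidates:
            mentions.append((token, candidates))
    return mentions


if __name__ == "__main__":
    print(find_place_mentions("restaurants in Beirut"))
    print(find_place_mentions("hotels in Paris"))
```

A name such as "Paris" returns more than one candidate, which is where disambiguation becomes necessary.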


GIR architecture

GIR involves extracting and resolving the meaning of locations in unstructured text. This is known as geoparsing. After identifying mentions of places and locations in text, a GIR system indexes this information for search and retrieval. GIR systems can commonly be broken down into the following stages: geoparsing, text and geographic indexing, data storage, geographic relevance ranking with respect to a geographic query, and browsing of results, commonly with a map interface. Some GIR systems separate text indexing from geographic indexing, which enables the use of generic database joins or multi-stage filtering; others combine them for efficiency. GIR must manage several forms of uncertainty, including semantic ambiguity of mentions of places in natural language text and the precision of positions.
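The separation of text indexing from geographic indexing, with multi-stage filtering and geographic relevance ranking, can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular GIR system: the corpus `DOCS`, its approximate coordinates, and the function names are hypothetical, each document is assumed to already carry a single point footprint from a geoparsing step, and distance to the query location stands in for geographic relevance.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: doc id -> (text, (lat, lon) footprint from a geoparsing step).
DOCS = {
    "d1": ("seafood restaurants on the corniche", (33.90, 35.48)),
    "d2": ("family restaurants near the old port", (34.12, 35.65)),
    "d3": ("restaurants with a view of the acropolis", (37.97, 23.73)),
    "d4": ("history of the national museum", (33.88, 35.51)),
}


def build_text_index(docs):
    """Build an inverted index: term -> set of doc ids containing that term."""
    index = defaultdict(set)
    for doc_id, (text, _footprint) in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def search(term, center, radius_km, docs=DOCS):
    """Two-stage retrieval: text filter, then geographic filter, ranked by distance."""
    text_hits = build_text_index(docs).get(term.lower(), set())  # stage 1: text match
    scored = []
    for doc_id in text_hits:
        distance = haversine_km(center, docs[doc_id][1])
        if distance <= radius_km:  # stage 2: geographic filter
            scored.append((distance, doc_id))
    return [doc_id for _distance, doc_id in sorted(scored)]  # nearest first


if __name__ == "__main__":
    # "restaurants" near a point in Beirut, within 50 km
    print(search("restaurants", (33.89, 35.50), 50))
```

Keeping the two stages separate makes each one replaceable, for example by swapping the linear distance check for a spatial index, at the cost of the efficiency that a combined index could offer.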


GIR systems

* MetaCarta created and patented one of the first commercial products to offer GIR capabilities.
* Frankenplace: a general-purpose geographic search engine.
* Web-a-where


Study & Evaluation

The study of GIR systems has a rich history dating back to the 1970s and possibly earlier. See Ray Larson’s book "Geographic information retrieval and spatial browsing" for references to much of the pre-Web literature on GIR. In 2005 the Cross-Language Evaluation Forum added a geographic track, GeoCLEF. GeoCLEF was the first TREC-style evaluation forum for GIR systems and provided participants a chance to compare systems.


Applications

GIR has many applications in geoweb, neogeography, and mobile local search, and has been a focus of many conferences, including the ESRI Users Conferences and O'Reilly’s Where 2.0 conferences.



See also

* Geographic information system
* Geoparsing
* Information retrieval
* MetaCarta
* Semantic similarity
* Search engine (computing)
* Toponymy