GIS file formats

A GIS file format is a standard for encoding geographical information into a computer file, as a specialized type of file format for use in geographic information systems (GIS) and other geospatial applications. Since the 1970s, dozens of formats have been created based on various data models for various purposes. They have been created by government mapping agencies (such as the USGS or National Geospatial-Intelligence Agency), GIS software vendors, standards bodies such as the Open Geospatial Consortium, informal user communities, and even individual developers.


History

The first GIS installations of the 1960s, such as the Canada Geographic Information System, were based on bespoke software and stored data in bespoke file structures designed for the needs of the particular project. As more of these appeared, they could be compared to find best practices and common structures. When general-purpose GIS software was developed in the 1970s and early 1980s, including programs from academic labs such as the Harvard Laboratory for Computer Graphics and Spatial Analysis, government agencies (e.g., the Map Overlay and Statistical System (MOSS) developed by the U.S. Fish & Wildlife Service and Bureau of Land Management), and new GIS software companies such as Esri and Intergraph, each program was built around its own proprietary (and often secret) file format. Since each GIS installation was effectively isolated from all others, interchange between them was not a major consideration.

By the early 1990s, the proliferation of GIS worldwide and an increasing need for sharing data, soon accelerated by the emergence of the World Wide Web and spatial data infrastructures, led to the need for interoperable data and standard formats. An early attempt at standardization was the U.S. Spatial Data Transfer Standard, released in 1994 and designed to encode the wide variety of federal government data. Although this particular format failed to garner widespread support, it led to other standardization efforts, especially those of the Open Geospatial Consortium (OGC), which has developed or adopted several vendor-neutral standards, some of which have been adopted by the International Organization for Standardization (ISO).

Another development in the 1990s was the public release of proprietary file formats by GIS software vendors, enabling them to be used by other software. The most notable example of this was the publication of the Esri Shapefile format, which by the late 1990s had become the most popular ''de facto'' standard for data sharing across the geospatial industry. When proprietary formats were not shared (for example, the Esri ARC/INFO coverage), software developers frequently reverse-engineered them to enable import and export in other software, further facilitating data exchange. One result of this was the emergence of free and open-source software libraries, such as the Geospatial Data Abstraction Library (GDAL), which have greatly facilitated the integration of spatial data in any format into a variety of software.

During the 2000s, the need for specialized spatial files was reduced somewhat by the emergence of spatial databases, which incorporated spatial data into general-purpose relational databases. However, new file formats have continued to appear, especially with the proliferation of web mapping; formats such as the Keyhole Markup Language (KML) and GeoJSON can be more easily integrated into web development languages than traditional GIS files.


Format characteristics

Over a hundred distinct formats have been created for the storage of spatial data, of which 20 to 30 are currently in common usage for different purposes. These can be distinguished in a number of ways:
* ''Open'' formats are developed collectively by a community and are available for anyone to implement and contribute improvements, while ''proprietary'' formats have been developed by a software company for use only in their own software and are generally maintained as a trade secret (although they are often reverse-engineered by others). A third category between these would include formats that are owned exclusively by one company or organization, but are published and available for implementation by anyone, such as the Esri Shapefile.
* Some file formats are ''text files'' that can be read by humans (such as those based on XML or JSON), especially those intended for data exchange, while others are ''binary files'', most commonly those designed for native use in GIS software.
* ''Inherently spatial'' formats were designed specifically for storing geographic data, while others are ''spatial extensions'' to formats designed for a more general use (e.g., GeoTIFF, spatial databases).
* Many data formats incorporate some form of ''data compression'', especially raster files. Generally, lossless compression methods are preferred over lossy methods, because the original data values need to be retrieved.
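
The preference for lossless compression can be illustrated with a small sketch. It is only an illustration under stated assumptions: the elevation values are hypothetical, DEFLATE (via Python's `zlib`) stands in for whatever lossless codec a given raster format uses, and rounding to whole metres stands in for a lossy method.

```python
import struct
import zlib

# Hypothetical elevation values in metres for a tiny raster, listed row by row.
cells = [101.25, 101.5, 102.0, 102.75, 100.0, 100.5, 101.0, 101.25]
fmt = "<%df" % len(cells)  # little-endian 32-bit floats

# Lossless path: DEFLATE-compress the packed bytes, then decompress them.
raw = struct.pack(fmt, *cells)
restored = list(struct.unpack(fmt, zlib.decompress(zlib.compress(raw))))

# Lossy stand-in (an assumption for illustration): round to whole metres.
quantized = [float(round(v)) for v in cells]

# Largest difference between an original value and its lossy version.
max_error = max(abs(a - b) for a, b in zip(cells, quantized))

print(restored == cells)   # the lossless round trip returns the stored values
print(quantized == cells)  # the rounded values differ from the originals
print(max_error)
```

For a photograph a small error per pixel may be invisible, but for measured values such as elevation or temperature it changes the data that later analysis depends on.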


Raster formats

Like any digital image, raster GIS data is based on a regular tessellation of space into a rectangular grid of rows and columns of ''cells'' (also known as pixels), with each cell having a measured value stored. The major difference from a photograph is that the grid is registered to geographic space rather than a field of view. The resolution of the raster data set is its cell width in ground units.

Because a grid is a sample of a continuous space, raster data is most commonly used to represent geographic fields, in which a property varies continuously or discretely over space. Common examples include remote sensing imagery, terrain/elevation, population density, weather and climate, soil properties, and many others. Raster data can be images with each pixel (or cell) containing a color value. The value recorded for each cell may be of any level of measurement, including a discrete qualitative value, such as land use type, or a continuous quantitative value, such as temperature, or a null value if no data is available. While a raster cell stores a single value, it can be extended by using raster bands to represent RGB (red, green, blue) colors, colormaps (a mapping between a thematic code and RGB value), or an extended attribute table with one row for each unique cell value. Raster data can also be used to represent discrete geographic features, but usually only in exigent circumstances.

Raster data is stored in various formats, from standard file-based structures such as TIFF and JPEG to binary large object (BLOB) data stored directly in a relational database management system (RDBMS), similar to vector-based feature classes. Database storage, when properly indexed, typically allows for quicker retrieval of the raster data but can require storage of millions of significantly sized records.
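
The registration of a grid to geographic space can be sketched in a few lines. This is a minimal illustration, not the layout of any particular format: the class name `RasterGrid`, the upper-left origin, the north-up orientation, the square cells, and the `-9999.0` null sentinel are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


class RasterGrid:
    """Hypothetical north-up grid of square cells anchored at its upper-left corner."""

    def __init__(self, origin_x: float, origin_y: float, cell_size: float,
                 values: List[List[float]], nodata: float = -9999.0) -> None:
        self.origin_x = origin_x    # x of the upper-left corner, in ground units
        self.origin_y = origin_y    # y of the upper-left corner, in ground units
        self.cell_size = cell_size  # resolution: cell width in ground units
        self.values = values        # one list of cell values per row
        self.nodata = nodata        # assumed sentinel marking "no data available"

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Ground coordinates of the center of cell (row, col)."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y - (row + 0.5) * self.cell_size  # rows run southward
        return x, y

    def value_at(self, x: float, y: float) -> Optional[float]:
        """Value of the cell containing (x, y); None outside the grid or for null cells."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            return None
        value = self.values[row][col]
        return None if value == self.nodata else value


# A 2 x 3 grid with 30-unit cells and one null cell (all values are made up).
grid = RasterGrid(500000.0, 4100000.0, 30.0,
                  [[12.0, 13.5, -9999.0],
                   [11.0, 12.5, 14.0]])
print(grid.cell_center(0, 0))
print(grid.value_at(500040.0, 4099950.0))
```

Real formats usually store this registration as metadata (an origin, a cell size, and sometimes rotation terms) rather than as per-cell coordinates, which is what keeps raster files compact.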


Raster format examples

* ADRG – National Geospatial-Intelligence Agency (NGA)'s ARC Digitized Raster Graphics
* Binary file – An unformatted file consisting of raster data written in one of several data types, where multiple bands are stored in BSQ (band sequential), BIP (band interleaved by pixel) or BIL (band interleaved by line) order. Georeferencing and other metadata are stored in one or more sidecar files.
* Digital raster graphic (DRG) – digital scan of a paper USGS topographic map
* ECRG – National Geospatial-Intelligence Agency (NGA)'s Enhanced Compressed ARC Raster Graphics (better resolution than CADRG and no color loss)
* ECW – Enhanced Compressed Wavelet (from ERDAS). A compressed wavelet format, often lossy.
* Esri grid – proprietary binary raster format used by Esri since the mid-1980s
* GeoTIFF – TIFF variant enriched with GIS-relevant metadata, especially georeferencing. An open format that has become one of the most common formats for data sharing.
* IMG – ERDAS IMAGINE image file format
* JPEG2000 – Open standard raster format. A compressed format, allows both lossy and lossless compression.
* MrSID – Multi-Resolution Seamless Image Database (by LizardTech). A compressed wavelet format, allows both lossy and lossless compression.
* netCDF-CF – netCDF file format with CF metadata conventions for earth science data. Binary storage in open format with optional compression. Allows for direct web access to subsets/aggregations of maps through the OPeNDAP protocol.
* RPF – Raster Product Format, military file format specified in MIL-STD-2411
** CADRG – Compressed ADRG, developed by NGA, nominal compression of 55:1 over ADRG (type of Raster Product Format)
** CIB – Controlled Image Base, developed by NGA (type of Raster Product Format)
* USGS DEM – The USGS's Digital Elevation Model
** GTOPO30 – Large complete Earth elevation model at 30 arc seconds, delivered in the USGS DEM format
* DTED – National Geospatial-Intelligence Agency (NGA)'s Digital Terrain Elevation Data, the military standard for elevation data
* World file – a sidecar file that georeferences a raster image file (e.g., JPEG, BMP)
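
The three band orderings named in the binary file entry above (BSQ, BIL, BIP) differ only in how a sample's row, column, and band map to a position in the file. The sketch below shows one way to compute that position; the function names `sample_offset` and `pack_samples` are hypothetical, and the example assumes one byte per sample and no file header.

```python
def sample_offset(layout: str, row: int, col: int, band: int,
                  rows: int, cols: int, bands: int,
                  bytes_per_sample: int = 1) -> int:
    """Byte offset of one sample in a headerless raster file."""
    if layout == "BSQ":    # band sequential: all of band 0, then all of band 1, ...
        index = (band * rows + row) * cols + col
    elif layout == "BIL":  # band interleaved by line: each row stored once per band
        index = (row * bands + band) * cols + col
    elif layout == "BIP":  # band interleaved by pixel: all bands of a pixel together
        index = (row * cols + col) * bands + band
    else:
        raise ValueError("unknown layout: " + layout)
    return index * bytes_per_sample


def pack_samples(layout: str, cube, rows: int, cols: int, bands: int) -> bytes:
    """Write cube[band][row][col] (values 0..255) into a buffer in the given layout."""
    buf = bytearray(rows * cols * bands)
    for band in range(bands):
        for row in range(rows):
            for col in range(cols):
                offset = sample_offset(layout, row, col, band, rows, cols, bands)
                buf[offset] = cube[band][row][col]
    return bytes(buf)


# Made-up 3-band, 2-row, 3-column image whose values encode their own position.
ROWS, COLS, BANDS = 2, 3, 3
cube = [[[band * 100 + row * 10 + col for col in range(COLS)]
         for row in range(ROWS)]
        for band in range(BANDS)]

for layout in ("BSQ", "BIL", "BIP"):
    print(layout, list(pack_samples(layout, cube, ROWS, COLS, BANDS)))
```

Because the file itself carries no description of its dimensions, data type, or ordering, a reader needs those values from the sidecar metadata before any offset can be computed.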


Vector formats

A ''vector'' dataset (sometimes called a ''feature'' dataset) stores information about discrete objects, using an encoding of the vector logical data model to represent the location or ''geometry'' of each object, and an encoding of its other properties that is usually based on relational database technology. Typically, a single dataset collects information about a set of closely related or similar objects, such as all of the roads in a city.

The vector data model uses coordinate geometry to represent each shape as one of several geometric primitives, most commonly ''points'' (a single coordinate of zero dimension), ''lines'' (a one-dimensional ordered list of coordinates connected by straight lines), and ''polygons'' (a self-closing boundary line enclosing a two-dimensional region). Many data structures have been developed to encode these primitives as digital data, but most modern vector file formats are based on the Open Geospatial Consortium (OGC) Simple Features specification, often directly incorporating its Well-known text (WKT) or Well-known binary (WKB) encodings.

In addition to the geometry of each object, a vector dataset must also be able to store its ''attributes''. For example, a database that describes lakes may contain each lake's depth, water quality, and pollution level. Since the 1970s, almost all vector file formats have adopted the relational database model, either in principle or directly incorporating RDBMS software. Thus, the entire dataset is stored in a ''table'', with each ''row'' representing a single object that contains ''columns'' for each attribute. Two strategies have been used to integrate the geometry and attributes into a single vector file format structure:
* A ''georelational format'' stores them as two separate files, with the geometry and attributes of each object being linked by file ordering or a primary key. This was most common from the 1970s through the early 1990s, because GIS software developers had to invent their own geometry data structures, but incorporated existing relational database file formats for the attributes. For example, the Esri Shapefile format includes the .dbf file from the DOS dBase software.
* The ''object-based model'' stores them in a single structure, loosely or directly based on the objects in object-oriented programming languages. This is the basis of most modern file formats, including spatial databases that include a geometry column along with the other attributes in a single relational table. Other formats, such as GeoJSON, use different structures for geometry and attributes, but combine them for each object in the same file.
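
The difference between the two strategies above can be sketched by joining a georelational pair of tables into GeoJSON features. The lake names, coordinates, depths, and the helper names `point_wkt` and `to_feature_collection` are all hypothetical; only the GeoJSON member names (`type`, `geometry`, `coordinates`, `properties`, `features`) and the WKT `POINT (x y)` form follow the published formats.

```python
import json
from typing import Dict, List, Tuple

# Georelational layout: geometry and attributes live in separate tables
# that are linked by a shared key (here the integer "id").
geometry_table: Dict[int, Tuple[float, float]] = {
    1: (-111.9, 40.7),
    2: (-112.3, 41.1),
}
attribute_table: List[dict] = [
    {"id": 1, "name": "Lake A", "depth_m": 10.5},
    {"id": 2, "name": "Lake B", "depth_m": 4.0},
]


def point_wkt(x: float, y: float) -> str:
    """Well-known text for a two-dimensional point."""
    return f"POINT ({x} {y})"


def to_feature_collection(geometries: Dict[int, Tuple[float, float]],
                          attributes: List[dict]) -> dict:
    """Object-based layout: each feature carries its own geometry and attributes."""
    features = []
    for row in attributes:
        x, y = geometries[row["id"]]  # the key lookup performs the join
        properties = {k: v for k, v in row.items() if k != "id"}
        features.append({
            "type": "Feature",
            "id": row["id"],
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_feature_collection(geometry_table, attribute_table)
print(json.dumps(collection, indent=2))
print(point_wkt(*geometry_table[1]))
```

In the georelational form a reader must open both tables and trust that the keys (or the record order) still agree; in the object-based form each feature is self-contained, at the cost of repeating structure for every object.
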
Geospatial topology is often an important part of vector data, representing the inherent spatial relationships (especially adjacency) between objects. Topology has been managed in vector file formats in four ways. In a ''topological data structure'', most notably Harvard's POLYVRT and its successor, the ARC/INFO coverage, topological connections between points, lines, and polygons are an inherent part of the encoding of those features. Conversely, non-topological or ''spaghetti data'' (such as the Esri Shapefile and most spatial databases) includes no topology information, with each geometry being completely independent of all others. A ''topology dataset'' (often used in network analysis) augments spaghetti data with a separate file encoding the topological connections. A ''topology rulebase'' is a list of desired topology rules used to enforce spatial integrity in spaghetti data, such as "county polygons must not overlap" and "state polygons must share boundaries with county polygons."

Vector datasets usually represent discrete
geographical features, such as people, buildings, trees, and counties. However, they may also be used to represent geographical fields by storing locations where the spatially continuous field has been sampled. Sample points (e.g., weather stations and sensor networks), contour lines, and triangulated irregular networks (TINs) are used to represent elevation or other values that change continuously over space. TINs record values at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent the terrain surface.
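To illustrate how a TIN face stands in for the terrain surface, the sketch below estimates the elevation at a location inside one triangle by linear (barycentric) interpolation of the three vertex values. The function name and the sample triangle are assumptions made for this example. They are not taken from any particular TIN file format.

```python
def interpolate_on_triangle(p, a, b, c):
    """Linearly interpolate z at p = (x, y) within the triangle a, b, c.

    Each vertex is an (x, y, z) tuple. Raises ValueError if the triangle
    is degenerate or the point lies outside it.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c

    # Twice the signed area of the triangle; zero means collinear vertices.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")

    # Barycentric weights of p with respect to a, b and c.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2

    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")

    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle with elevations 10, 20 and 30 at its vertices.
A, B, C = (0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)
z_mid = interpolate_on_triangle((5.0, 0.0), A, B, C)  # midpoint of edge A-B
```

Because the interpolation is linear, a point halfway along an edge receives the mean of the two edge vertices, and each vertex returns its own recorded value.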


Example vector file formats

Formats commonly in current usage:
* Shapefile – a popular vector data GIS format, developed by Esri
* Geography Markup Language (GML) – XML-based open standard (by OpenGIS) for GIS data exchange
* GeoJSON – a lightweight format based on JSON, used by many open source GIS packages
* GeoMedia – Intergraph's Microsoft Access-based format for spatial vector storage
* Keyhole Markup Language (KML) – XML-based open standard (by OpenGIS) for GIS data exchange
* MapInfo TAB format – MapInfo's vector data format using TAB, DAT, ID and MAP files
* Measure Map Pro format – XML data format to store GIS data
* National Transfer Format (NTF) – mostly used by the UK Ordnance Survey
* SpatiaLite – a spatial extension to SQLite, providing vector geodatabase functionality. It is similar to PostGIS, Oracle Spatial, and SQL Server with spatial extensions
* Simple Features – Open Geospatial Consortium specification for vector data
** Well-known text (WKT) – A text markup language for representing feature geometry, developed by Open Geospatial Consortium
** Well-known binary (WKB) – Binary version of well-known text, used in many spatial databases
* SOSI – a spatial data format used for all public exchange of spatial data in Norway
* AutoCAD DXF – data transfer format for AutoCAD data (by Autodesk)
* Geographic Data Files
(GDF) – An interchange file format for geographic data

Historical formats seldom used today:
* ArcInfo Coverage – topological data structure used in Arc/INFO from 1981 through 2000
* Esri TIN – proprietary binary format for triangulated irregular network data used by Esri
* Digital line graph (DLG) – a USGS format for vector data
* TIGER – Topologically Integrated Geographic Encoding and Referencing
* Vector Product Format (VPF) – National Geospatial-Intelligence Agency (NGA)'s format of vectored data for large geographic databases
* Spatial Data File – Autodesk's high-performance geodatabase format, native to MapGuide
* ISFC – Intergraph's MicroStation-based CAD solution attaching vector elements to a relational Microsoft Access database
* Dual Independent Map Encoding (DIME) – A historic GIS file format, developed in the 1960s
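The relationship between the well-known text and well-known binary encodings listed above can be sketched for the simplest case, a two-dimensional point. The helper names below are hypothetical. The byte layout follows the common WKB point encoding (a one-byte byte-order flag, a four-byte geometry type, then two 8-byte doubles), and other geometry types are not handled.

```python
import struct


def point_to_wkt(x, y):
    """Return the well-known text form of a 2D point."""
    return f"POINT ({x} {y})"


def point_to_wkb(x, y):
    """Return the well-known binary form of a 2D point (little-endian)."""
    # 1 = little-endian byte order, 1 = geometry type Point, then x and y.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data):
    """Decode a WKB point produced in either byte order."""
    endian = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("only Point geometries are handled in this sketch")
    return x, y


wkt = point_to_wkt(30.0, 10.0)
wkb = point_to_wkb(30.0, 10.0)
decoded = wkb_to_point(wkb)
```

The text form is easy to read and edit by hand, while the binary form has a fixed size for a point (21 bytes) and avoids the cost of parsing numbers from text, which is one reason databases tend to favor it.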


Advantages and disadvantages

There are some important advantages and disadvantages to using a raster or vector data model to represent reality:
* Raster datasets record a value for all points in the area covered, which may require more storage space than representing data in a vector format that can store data only where needed.
* Raster data is computationally less expensive to render than vector graphics.
* Combining values and writing custom formulas for combining values from different layers are much easier using raster data.
* There are transparency and aliasing problems when overlaying multiple stacked pieces of raster images.
* Vector data allows for visually smooth and easy implementation of overlay operations, especially in terms of graphics and shape-driven information like maps, routes and custom fonts, which are more difficult with raster data.
* Vector data can be displayed as vector graphics used on traditional maps, whereas raster data will appear as an image that may have a blocky appearance for object boundaries (depending on the resolution of the raster file).
* Vector data can be easier to register, scale, and re-project, which can simplify combining vector layers from different sources.
* Vector data is more compatible with relational database environments, where it can be part of a relational table as a normal column and processed using a multitude of operators.
* Vector file sizes are usually smaller than raster data, which can be tens, hundreds or more times larger than vector data (depending on resolution).
* Vector data is simpler to update and maintain, whereas a raster image will have to be completely reproduced (for example, when a new road is added).
* Vector data allows much more analysis capability, especially for "networks" such as roads, power, rail, telecommunications, etc. (examples: best route, largest port, airfields connected to two-lane highways). Raster data will not have all the characteristics of the features it displays.
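The point above about combining values from different raster layers can be illustrated with a small cell-by-cell calculation. This is only a sketch using plain Python lists. The layer names, values, and formula are hypothetical, and real GIS software would typically operate on array libraries or its own raster engine.

```python
def map_algebra(layer_a, layer_b, formula):
    """Apply formula(a, b) to each pair of aligned cells in two rasters."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must have the same number of rows and columns")
    return [
        [formula(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Two hypothetical 2 x 2 layers covering the same area on the same grid.
slope_deg = [[5, 10], [20, 0]]
rain_mm = [[100, 50], [200, 300]]

# A made-up index combining both layers, computed independently per cell.
risk = map_algebra(slope_deg, rain_mm, lambda s, r: s * r / 100)
```

Because both layers share one grid, the combination needs no geometric intersection step. The equivalent operation on two vector layers would first require overlaying their polygons.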


Integrated file formats

Modern object–relational databases can now store a variety of complex data using the binary large object datatype, including both raster grids and vector geometries. This enables some spatial database systems to store data of both models in the same database.
* Esri File Geodatabase – A proprietary format for storing "feature" (vector) and raster data locally
* Esri Enterprise Geodatabase – A proprietary model for storing a geodatabase structure in a variety of commercial and open-source relational database management systems
* GeoPackage (GPKG) – A standards-based, open format based on the SQLite database format for both vector and raster data, adopted by the Open Geospatial Consortium
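A minimal sketch of the underlying idea, a geometry stored as a binary large object in the same table as ordinary attributes, can be written with Python's built-in `sqlite3` module. The table and column names are hypothetical, and the result is not a conformant GeoPackage, which additionally requires specific metadata tables and its own geometry header.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")  # in-memory database for the example

# Attributes and geometry share one relational table; geom is a BLOB column.
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, geom BLOB)")

# Encode a 2D point as little-endian WKB: byte order, type 1 (Point), x, y.
wkb = struct.pack("<BIdd", 1, 1, 10.75, 59.91)  # hypothetical coordinates
conn.execute("INSERT INTO places (name, geom) VALUES (?, ?)", ("Oslo", wkb))

# Read the row back and decode the geometry from the blob.
name, blob = conn.execute("SELECT name, geom FROM places").fetchone()
_, _, lon, lat = struct.unpack("<BIdd", blob)
conn.close()
```

Spatial extensions build on this pattern by adding functions and indexes that interpret the blob, so that queries can filter on location as well as on attribute values.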


See also

* Datum (geodesy)
* GDAL/OGR, a library for reading and writing many formats
* Feature Manipulation Engine (FME), a commercial program for converting data between a large number of formats

