ESRI Shape

The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as ''name'' or ''temperature''.


Overview

The shapefile format is a digital vector storage format for storing geographic location and associated attribute information. This format lacks the capacity to store topological information. The shapefile format was introduced with ArcView GIS version 2 in the early 1990s. It is now possible to read and write geographical datasets using the shapefile format with a wide variety of software.

The shapefile format stores the geometry as primitive geometric shapes like points, lines, and polygons. These shapes, together with data attributes that are linked to each shape, create the representation of the geographic data.

The term "shapefile" is quite common, but the format consists of a collection of files with a common filename prefix, stored in the same directory. The three ''mandatory'' files have filename extensions .shp, .shx, and .dbf. The actual ''shapefile'' relates specifically to the .shp file, but alone it is incomplete for distribution, as the other supporting files are required. Legacy GIS software may expect that the filename prefix be limited to eight characters to conform to the DOS 8.3 filename convention, though modern software applications accept files with longer names.

Mandatory files:
* .shp: shape format; the feature geometry itself
* .shx: shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
* .dbf: attribute format; columnar attributes for each shape, in dBase IV format

Other files:
* .prj: projection description, using a well-known text representation of coordinate reference systems
* .sbn and .sbx: a spatial index of the features
* .fbn and .fbx: a spatial index of the features that are read-only
* .ain and .aih: an attribute index of the active fields in a table
* .ixs: a geocoding index for read-write datasets
* .mxs: a geocoding index for read-write datasets (ODB format)
* .atx: an attribute index for the .dbf file in the form of ''shapefile''.''columnname''.atx (ArcGIS 8 and later)
* .shp.xml: geospatial metadata in XML format, such as ISO 19115 or other XML schema
* .cpg: used to specify the code page (only for .dbf) for identifying the character encoding to be used
* .qix: an alternative quadtree spatial index used by MapServer and GDAL/OGR software

In each of the .shp, .shx, and .dbf files, the shapes in each file correspond to each other in sequence (i.e., the first record in the .shp file corresponds to the first record in the .shx and .dbf files, etc.). The .shp and .shx files have various fields with different endianness, so an implementer of the file formats must be very careful to respect the endianness of each field and treat it properly.
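Because a dataset is only usable when its mandatory companion files sit next to the main file, a quick presence check can be useful before distributing or opening one. The following is a minimal sketch in Python; the function name `missing_mandatory_files` and the example file name are hypothetical, and the check assumes lowercase extensions on a case-sensitive file system.

```python
from pathlib import Path

# The three mandatory extensions named in the overview above.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_mandatory_files(shp_path):
    """Return the mandatory extensions that have no file beside shp_path.

    Hypothetical helper: it only tests for existence and does not
    validate the contents of any file.
    """
    base = Path(shp_path)
    return [
        ext
        for ext in MANDATORY_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]


if __name__ == "__main__":
    # "wells.shp" is an illustrative name, not a file shipped with anything.
    print(missing_mandatory_files("wells.shp"))
```

An empty list means all three files share the same prefix in the same directory; optional files such as a projection description would need a separate check.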


Shapefile shape format (.shp)

The main file (.shp) contains the geometry data. Geometry of a given feature is stored as a set of vector coordinates. The binary file consists of a single fixed-length header followed by one or more variable-length records. Each of the variable-length records includes a record-header component and a record-contents component. A detailed description of the file format is given in the ''ESRI Shapefile Technical Description''. This format should not be confused with the AutoCAD shape font source format, which shares the .shp extension.

The 2D axis ordering of coordinate data assumes a Cartesian coordinate system, using the order (X Y) or (Easting Northing). This axis order is consistent for geographic coordinate systems, where the order is similarly (longitude latitude). Geometries may also support 3- or 4-dimensional Z and M coordinates, for elevation and measure, respectively. A Z-dimension stores the elevation of each coordinate in 3D space, which can be used for analysis or for visualisation of geometries using 3D computer graphics. The user-defined M dimension can be used for one of many functions, such as storing linear referencing measures or relative time of a feature in 4D space.

The main file header is fixed at 100 bytes in length and contains 17 fields: nine 4-byte (32-bit signed integer or int32) integer fields followed by eight 8-byte (double) signed floating point fields. The file then contains any number of variable-length records. Each record is prefixed with a record header of 8 bytes, and the actual record follows the record header. The variable-length record contents depend on the shape type, which must be either the shape type given in the file header or Null. The possible shape types are enumerated in the technical description.
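The header layout described above can be decoded with Python's `struct` module. This is a sketch rather than a reference implementation: the field order (a file code of 9994, five unused integers, and the file length in 16-bit words stored big-endian, followed by a little-endian version, shape type, and eight bounding-box doubles) reflects the ''ESRI Shapefile Technical Description'' as commonly documented, and the function name and dictionary keys are made up for illustration.

```python
import struct


def parse_shp_header(header: bytes) -> dict:
    """Decode the 100-byte main file header of a .shp file.

    Assumed layout (per the ESRI technical description):
      bytes 0-27   seven big-endian int32: file code, 5 unused, file length
      bytes 28-35  two little-endian int32: version, shape type
      bytes 36-99  eight little-endian doubles: bounding box
    """
    if len(header) < 100:
        raise ValueError("the main file header is 100 bytes long")

    # Big-endian part; the file length is counted in 16-bit words.
    file_code, *_unused, length_words = struct.unpack(">7i", header[:28])
    # Little-endian part.
    version, shape_type = struct.unpack("<2i", header[28:36])
    xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax = struct.unpack(
        "<8d", header[36:100]
    )

    if file_code != 9994:
        raise ValueError(f"unexpected file code {file_code}")

    return {
        "file_length_bytes": 2 * length_words,
        "version": version,
        "shape_type": shape_type,
        "bbox_xy": (xmin, ymin, xmax, ymax),
        "z_range": (zmin, zmax),
        "m_range": (mmin, mmax),
    }
```

Mixing `>` and `<` format prefixes in separate `unpack` calls is the simplest way to honor the two byte orders; a single format string cannot switch endianness midway.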


Shapefile shape index format (.shx)

The index (.shx) contains a positional index of the feature geometry. It has the same 100-byte header as the .shp file, followed by any number of 8-byte fixed-length records, each consisting of two fields: a record offset and a record length.

Using this index, it is possible to seek backwards in the shapefile by first seeking backwards in the shape index (which is possible because it uses fixed-length records), then reading the record offset, and using that offset to seek to the correct position in the .shp file. It is also possible to seek forwards an arbitrary number of records using the same method.

It is possible to generate the complete index file given a lone .shp file. However, since a shapefile is supposed to always contain an index, doing so counts as repairing a corrupt file.
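Regenerating an index from a lone main file amounts to walking the variable-length records once and writing one fixed-length entry per record. The sketch below works on in-memory bytes and makes several assumptions taken from the technical description rather than from this article: record headers hold a record number and a content length as big-endian int32 values, lengths and offsets are counted in 16-bit words, and the index header repeats the main header with its own file length substituted. The function names are hypothetical.

```python
import struct

HEADER_BYTES = 100  # fixed header size shared by .shp and .shx


def build_shx(shp: bytes) -> bytes:
    """Rebuild .shx content from the bytes of a .shp file (sketch)."""
    entries = []
    pos = HEADER_BYTES
    while pos + 8 <= len(shp):
        # Record header: record number, content length in 16-bit words.
        _record_number, content_words = struct.unpack_from(">2i", shp, pos)
        if content_words < 0:
            raise ValueError("negative record length; file looks corrupt")
        entries.append((pos // 2, content_words))  # offset in 16-bit words
        pos += 8 + 2 * content_words

    # Assumption: the index header equals the main header except for the
    # file length field (bytes 24-27), which describes the index itself.
    shx_words = (HEADER_BYTES + 8 * len(entries)) // 2
    header = shp[:24] + struct.pack(">i", shx_words) + shp[28:HEADER_BYTES]
    body = b"".join(struct.pack(">2i", off, length) for off, length in entries)
    return header + body


def record_start(shx: bytes, index: int) -> int:
    """Byte position in the .shp file of record `index` (0-based)."""
    offset_words, _content_words = struct.unpack_from(
        ">2i", shx, HEADER_BYTES + 8 * index
    )
    return 2 * offset_words
```

Because every index entry is 8 bytes, `record_start` reaches any record with a single lookup, which is the random access that the variable-length main file cannot offer on its own.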


Shapefile attribute format (.dbf)

This file stores the attributes for each shape; it uses the dBase IV format. The format is public knowledge, and has been implemented in many dBase clones known as xBase. The open-source shapefile C library, for example, calls its format "xBase" even though it is plain dBase IV. The names and values of attributes are not standardized, and will be different depending on the source of the shapefile.
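Since attribute names and types vary from one dataset to the next, a reader usually starts by listing the field descriptors in the .dbf header. The following sketch assumes the widely documented dBase layout, which is not spelled out in this article: a 32-byte file header holding the record count, header length, and record length as little-endian integers, followed by 32-byte field descriptors terminated by the byte 0x0D. The function name is illustrative, and field names are decoded as ASCII for simplicity.

```python
import struct


def read_dbf_fields(dbf: bytes):
    """Return (record_count, record_length, fields) from .dbf bytes.

    Each field is a tuple (name, type_code, length, decimal_count).
    Sketch only: assumes a plain dBase III/IV style header.
    """
    # Bytes 4-11: uint32 record count, uint16 header length, uint16 record length.
    record_count, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)

    fields = []
    pos = 32  # field descriptors start right after the 32-byte file header
    while pos + 32 <= header_len and dbf[pos] != 0x0D:
        # 11-byte name, 1-byte type, 4 reserved bytes, length, decimal count.
        raw_name, type_code, length, decimals = struct.unpack_from(
            "<11sc4xBB", dbf, pos
        )
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, type_code.decode("ascii"), length, decimals))
        pos += 32
    return record_count, record_len, fields
```

The 11-byte name slot includes a terminating null, which is where the 10-character limit on field names comes from.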


Shapefile spatial index format (.sbn)

This is a binary spatial index file, which is used only by Esri software. The format is not documented by Esri. However, it has been reverse-engineered and documented by the open source community. The 100-byte header is similar to the one in .shp. It is not currently implemented by other vendors. The .sbn file is not strictly necessary, since the .shp file contains all of the information necessary to successfully parse the spatial data.


Limitations


Topology and the shapefile format

The shapefile format does not have the ability to store topological information. The ESRI ArcInfo coverages and personal/file/enterprise geodatabases do have the ability to store feature topology.


Spatial representation

The edges of a polyline or polygon are composed of points. The spacing of the points implicitly determines the scale at which the feature is useful visually. Exceeding that scale results in jagged representation. Additional points would be required to achieve smooth shapes at greater scales. For features better represented by smooth curves, the polygon representation requires much more data storage than, for example, splines, which can capture smoothly varying shapes efficiently. None of the shapefile format types supports splines.


Data storage

The size of both .shp and .dbf component files cannot exceed 2 GB (or 2^31 bytes), which allows around 70 million point features at best. The maximum number of features for other geometry types varies depending on the number of vertices used.

The attribute database format for the .dbf component file is based on an older dBase standard. This database format inherently has a number of limitations:
* While the current dBase standard and GDAL/OGR (the main open source software library for reading and writing shapefile format datasets) support null values, ESRI software represents these values as zeros. This is a very serious issue for analyzing quantitative data, as it may skew representation and statistics if null quantities are represented as zero
* Poor support for Unicode field names or field storage
* Maximum length of field names is 10 characters
* Maximum number of fields is 255
* Supported field types are: floating point (13 character storage), integer (4 or 9 character storage), date (no time storage; 8 character storage), and text (maximum 254 character storage)
* Floating point numbers may contain rounding errors since they are stored as text
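Two of these limits can be made concrete with a little arithmetic. The sketch below first estimates how many plain 2D point records fit under the 2^31-byte ceiling, assuming (from the technical description, not from this article) that each point record takes an 8-byte record header plus a 4-byte shape type and two 8-byte coordinates. It then shows how writing a floating point value into a fixed-width text field discards digits; the width of 13 comes from the list above, while the choice of 6 decimals is only an illustrative assumption.

```python
MAX_FILE_BYTES = 2**31        # size ceiling quoted above
FILE_HEADER_BYTES = 100       # fixed main file header
RECORD_HEADER_BYTES = 8       # record number + content length
POINT_CONTENT_BYTES = 4 + 8 + 8  # shape type + X + Y (assumed 2D point layout)


def max_point_records(max_bytes: int = MAX_FILE_BYTES) -> int:
    """Upper bound on 2D point records that fit in one .shp file."""
    per_record = RECORD_HEADER_BYTES + POINT_CONTENT_BYTES
    return (max_bytes - FILE_HEADER_BYTES) // per_record


def as_dbf_number(value: float, width: int = 13, decimals: int = 6) -> str:
    """Format a float as right-aligned fixed-width text (hypothetical helper)."""
    text = f"{value:{width}.{decimals}f}"
    if len(text) > width:
        raise ValueError("value does not fit in the field width")
    return text


if __name__ == "__main__":
    print(max_point_records())          # roughly 76 million under these assumptions
    stored = as_dbf_number(1 / 3)
    print(repr(stored), float(stored) == 1 / 3)
```

The estimate lands in the same range as the figure quoted above; the exact ceiling in practice also depends on how a given program counts the limit. The round trip through text returns a value that differs from the original double, which is the rounding error the last bullet refers to.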


Mixing shape types

Because the shape type precedes each geometry record, a shapefile is technically capable of storing a mixture of different shape types. However, the specification states, "All the non-Null shapes in a shapefile are required to be of the same shape type." Therefore, this ability to mix shape types must be limited to interspersing null shapes with the single shape type declared in the file's header. A shapefile must not contain both polyline and polygon data; for example, the descriptions for a well (point), a river (polyline), and a lake (polygon) would be stored in three separate datasets.
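The rule quoted above is straightforward to verify, because every record begins with its own shape type. The following sketch scans the bytes of a main file and reports any record type that is neither Null nor the type declared in the header. It assumes, per the technical description, that the declared type sits at byte 32 of the header as a little-endian int32, that each record header is two big-endian int32 values with the length in 16-bit words, and that Null is encoded as 0. The function names are illustrative.

```python
import struct

NULL_SHAPE = 0  # assumed code for the Null shape type


def record_shape_types(shp: bytes):
    """Yield the shape type stored at the start of each record."""
    pos = 100  # skip the fixed-length file header
    while pos + 12 <= len(shp):
        _record_number, content_words = struct.unpack_from(">2i", shp, pos)
        if content_words < 2:
            raise ValueError("record too short to hold a shape type")
        (shape_type,) = struct.unpack_from("<i", shp, pos + 8)
        yield shape_type
        pos += 8 + 2 * content_words


def offending_shape_types(shp: bytes) -> set:
    """Return record shape types that violate the single-type rule."""
    (declared,) = struct.unpack_from("<i", shp, 32)
    return {
        t for t in record_shape_types(shp) if t not in (declared, NULL_SHAPE)
    }
```

An empty set means the file only intersperses Null shapes with its declared type; anything else points to a file that many readers may reject or misinterpret.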


See also

* Geographic information system
* Open Geospatial Consortium
* Open Source Geospatial Foundation (OSGeo)
* List of geographic information systems software
* Comparison of geographic information systems software


External links


* Shapefile file extensions – Esri Webhelp docs for ArcGIS 10.0 (2010)
* shapelib.maptools.org – Free C library for reading/writing shapefiles
* Python Shapefile Library – Open Source (MIT License) Python library for reading/writing shapefiles
* Shapefile Projection Finder – Detect unknown projection of a shapefile automatically
* Java Shapefile and Dbase Libraries – Open Source (Apache License) Java libraries for reading/writing shapefiles and the associated dBase files (the libraries are part of a larger project but could be used independently)

