A tab-separated values (TSV) file is a simple text format for storing data in a tabular structure, e.g., a database table or spreadsheet data, and a way of exchanging information between databases. Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character. The TSV format is thus a variation of the comma-separated values format.
TSV is a simple file format that is widely supported, so it is often used in data exchange to move tabular data between different computer programs that support the format. For example, a TSV file might be used to transfer information from a database program to a spreadsheet.
The IANA standard for TSV achieves simplicity by disallowing tabs within fields.
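Because tabs are disallowed inside fields, writing a record reduces to a check followed by a join. The following is a minimal sketch of that idea, not a reference implementation; the function name `format_record` is hypothetical, and rejecting carriage returns as well as tabs and newlines is an assumption made here for safety.

```python
def format_record(fields):
    """Join field values into one TSV record (without the line terminator).

    Values containing a tab or a line break are rejected, because plain
    TSV has no way to represent them inside a field.
    """
    for value in fields:
        # Assumption: carriage returns are rejected along with tabs and newlines.
        if any(ch in value for ch in "\t\n\r"):
            raise ValueError(f"field contains a tab or line break: {value!r}")
    return "\t".join(fields)


record = format_record(["5.1", "3.5", "I. setosa"])
```

With this check in place a reader can split each line on the tab character without any quoting or escaping logic.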
Example
The head of the Iris flower data set can be stored as a TSV using the following plain text (note that the HTML rendering may convert tabs to spaces):
Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa
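The TSV plain text above corresponds to the following tabular data:
Sepal length  Sepal width  Petal length  Petal width  Species
5.1           3.5          1.4           0.2          I. setosa
4.9           3.0          1.4           0.2          I. setosa
4.7           3.2          1.3           0.2          I. setosa
4.6           3.1          1.5           0.2          I. setosa
5.0           3.6          1.4           0.2          I. setosa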
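Recovering the rows and columns from such text can be sketched in a few lines of Python. The example assumes the first line is a header, that records end with a newline, and that no field contains a tab; the name `parse_tsv` is hypothetical and only the first two data rows are included for brevity.

```python
tsv_text = (
    "Sepal length\tSepal width\tPetal length\tPetal width\tSpecies\n"
    "5.1\t3.5\t1.4\t0.2\tI. setosa\n"
    "4.9\t3.0\t1.4\t0.2\tI. setosa\n"
)


def parse_tsv(text):
    """Return one dict per record, keyed by the header line's column names."""
    # Split on "\n" and drop the empty string left by the final terminator.
    lines = [line for line in text.split("\n") if line]
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


rows = parse_tsv(tsv_text)
print(rows[0]["Species"])  # I. setosa
```

All values come back as strings; converting the measurements to numbers is left to the caller.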
Conventions for lossless conversion to TSV
Since the values in the TSV format cannot contain literal tabs or newline characters, a convention is necessary for lossless conversion of text values with these characters. A common convention is to perform the following escapes:
\n for newline,
\t for tab,
\r for carriage return,
\\ for backslash.
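The four escapes listed above can be sketched as a pair of Python functions. The names `escape_field` and `unescape_field` are hypothetical, and the handling of an unknown escape or a trailing lone backslash (both kept unchanged) is an assumption, since the convention does not define those cases. Decoding scans character by character so that an escaped backslash followed by `t` is not mistaken for a tab.

```python
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value):
    """Replace backslash, tab, newline and carriage return with escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(text):
    """Invert escape_field by scanning one character at a time."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # the character following the backslash
        if nxt is None:
            out.append("\\")  # assumption: keep a trailing lone backslash
        elif nxt in _UNESCAPES:
            out.append(_UNESCAPES[nxt])
        else:
            out.append("\\" + nxt)  # assumption: keep unknown escapes as-is
    return "".join(out)


original = "line one\nline two\twith a tab and a \\ backslash"
assert unescape_field(escape_field(original)) == original
```

Escaping the backslash itself is what makes the conversion lossless: without it, a value that already contains the two characters `\t` could not be told apart from an escaped tab.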
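Another common convention is to use the CSV convention from RFC 4180 and enclose fields containing these special characters in double quotes. This can lead to ambiguities, since a reader cannot tell from the file alone whether a double quote is literal data or field quoting.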
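The ambiguity can be illustrated with Python's standard `csv` module, which reads the same line differently depending on whether quote processing is enabled. This is a small demonstration rather than a recommendation of either reading; the sample line is made up for the example.

```python
import csv

# One physical line: a double-quoted value containing a tab, then a second value.
line = '"x\ty"\tz'

# Reader that applies CSV-style quoting: the quoted tab is part of the field.
quoted = next(csv.reader([line], delimiter="\t"))

# Reader that treats double quotes as ordinary characters: every tab splits.
literal = next(csv.reader([line], delimiter="\t", quoting=csv.QUOTE_NONE))

print(quoted)   # ['x\ty', 'z']
print(literal)  # ['"x', 'y"', 'z']
```

The two readers disagree on both the number of fields and their contents, so producer and consumer need to agree on the convention in advance.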
Another ambiguity is whether records are separated by newlines, as would be typical for lines on UNIX, or a carriage return followed by a newline, as would be typical for Microsoft platforms. Many programs such as LibreOffice expect a carriage return followed by a newline.
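A reader can sidestep the line-ending question by accepting both forms, while a writer has to pick one. The sketch below assumes records never contain literal line breaks; the names `split_records` and `join_records` are hypothetical, and defaulting the writer to a carriage return followed by a newline is a choice made for this example.

```python
def split_records(text):
    """Split TSV text into records, accepting "\n" or "\r\n" terminators."""
    records = text.split("\n")
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after the final terminator
    # Remove the carriage return left behind by "\r\n" terminators.
    return [r[:-1] if r.endswith("\r") else r for r in records]


def join_records(records, newline="\r\n"):
    """Serialize records, terminating each one with the chosen line ending."""
    return "".join(record + newline for record in records)


unix_text = "a\tb\nc\td\n"
windows_text = "a\tb\r\nc\td\r\n"
assert split_records(unix_text) == split_records(windows_text)
```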
See also
* Comma-separated values
* Delimiter collision
References
Bibliography
* IANA, Text Media Types: Definition of tab-separated-values (tsv), Paul Lindner, U of MN Internet Gopher Team, June 1993
* Jukka Korpela, created 2000-09-01, last update 2005-02-12.
External links
* Gnumeric manual