In computing, a comma-separated values (CSV) file is a delimited text file that uses a comma to separate values (many implementations of CSV import/export tools allow other separators to be used). It stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks. CSV implementations may not handle such field data, or they may use quotation marks to surround the field. Quotation does not solve everything: some fields may need embedded quotation marks, so a CSV implementation may include escape characters or escape sequences.
In addition, the term "CSV" also denotes some closely related delimiter-separated formats that use different field delimiters. These include tab-separated values and space-separated values. A delimiter that is not present in the field data (such as tab) keeps the format parsing simple. These alternate delimiter-separated files are often given a .csv extension despite the use of a non-comma field separator. This loose terminology can cause problems in data exchange. Many applications that accept CSV files have options to select the delimiter character and the quotation character.
Data exchange
CSV is a common data exchange format that is widely supported by consumer, business, and scientific applications. Among its most common uses is moving tabular data between programs that natively operate on incompatible (often proprietary or undocumented) formats. This works despite the lack of adherence to RFC 4180 (or any other standard), because so many programs support variations on the CSV format for data import. For example, a user may need to transfer information from a database program that stores data in a proprietary format to a spreadsheet that uses a completely different format. The database program most likely can export its data as "CSV"; the exported CSV file can then be imported by the spreadsheet program.
Specification
RFC 4180 proposes a specification for the CSV format, and this is the definition commonly used. However, in popular usage "CSV" is not a single, well-defined format. As a result, in practice the term "CSV" might refer to any file that:
is plain text using a character set such as ASCII, various Unicode character sets (e.g. UTF-8), EBCDIC, or Shift JIS,
consists of records (typically one record per line),
with the records divided into fields separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),
where every record has the same sequence of fields.
Within these general constraints, many variations are in use.
Therefore, without additional information (such as whether RFC 4180 is
honored), a file claimed simply to be in "CSV" format is not fully
specified. As a result, many applications supporting CSV files allow
users to preview the first few lines of the file and then specify the
delimiter character(s), quoting rules, etc. If a particular CSV file's
variations fall outside what a particular receiving program supports,
it is often feasible to examine and edit the file by hand (i.e., with
a text editor) or write a script or program to produce a conforming
Comma-separated values is a data format that pre-dates personal
computers by more than a decade: the
MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
Each record "should" contain the same number of comma-separated fields.
Any field may be quoted (with double quotes).
Fields containing a line break, double quote, or commas should be quoted. (If they are not, the file will likely be impossible to process correctly.)
A (double) quote character in a field must be represented by two (double) quote characters.
The format can be processed by most programs that claim to read CSV
files. The exceptions are: (a) programs may not support line breaks
within quoted fields, (b) programs may confuse the optional header
with data or interpret the first data line as an optional header, and
(c) double quotes in a field may not be parsed correctly.
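The rules listed above (CR/LF line endings, quoting only where needed, doubled double quotes) can be sketched with Python's standard csv module. This is a minimal illustration rather than a reference implementation; the helper name write_rfc4180 and the sample rows are assumptions made for this example.

```python
import csv
import io


def write_rfc4180(rows):
    """Serialize rows with CR/LF endings, minimal quoting and doubled quotes.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO(newline="")  # do not translate "\r\n" on write
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        doublequote=True,           # an embedded " is written as ""
        quoting=csv.QUOTE_MINIMAL,  # quote a field only when it needs it
        lineterminator="\r\n",      # MS-DOS-style line ending
    )
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["Year", "Make", "Model", "Description"],
    ["1997", "Ford", "E350", 'Super, "luxurious" truck'],
]
output = write_rfc4180(rows)
print(output)
```

Only the last field of the second row is quoted, because it is the only one containing a comma or a double quote.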
Open Knowledge and various partners created a data protocols
working group, which later evolved into the Frictionless Data
initiative. One of the main formats they released was
CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines.
A CSV file does not require a specific character encoding, byte order, or line terminator format (some software does not support all line-end variations).
A record ends at a line terminator. However, line terminators can be embedded as data within fields, so software must recognize quoted line separators (see below) in order to correctly assemble an entire record from perhaps multiple lines.
All records should have the same number of fields, in the same order.
Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes (see RFC 2046, section 4.1). For example, the numeric quantity 65535 may be represented as the 5 ASCII characters "65535" (or perhaps other forms such as "0xFFFF", "000065535.000E+00", etc.), but not as a sequence of 2 bytes intended to be treated as a single binary integer rather than as two characters (e.g. the numbers 11264-11519 have a comma as their high-order byte: ord(',')*256..ord(',')*256+255). If this "plain text" convention is not followed, then the CSV file no longer contains sufficient information to interpret it correctly, is unlikely to survive transmission across differing computer architectures, and will not conform to the text/csv MIME type.
Adjacent fields must be separated by a single comma. However, "CSV" formats vary greatly in this choice of separator character. In particular, in locales where the comma is used as a decimal separator, semicolon, TAB, or other characters are used instead.
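The plain-text convention for numeric fields described above can be made concrete with a short Python sketch. It compares the text form of a number with a 2-byte big-endian binary form, and finds the 16-bit values whose high-order byte happens to be the comma character. The function names and the choice of big-endian packing are assumptions made for this illustration.

```python
import struct


def as_text(value):
    """Text representation, as a CSV field would carry it."""
    return str(value).encode("ascii")


def as_binary(value):
    """2-byte big-endian unsigned integer (not valid CSV field content)."""
    return struct.pack(">H", value)


for n in (65535, 11264):
    print(n, as_text(n), as_binary(n))

# 16-bit values whose high-order byte equals ord(","), i.e. 0x2C
comma_collisions = [n for n in range(65536) if as_binary(n)[0] == ord(",")]
print(comma_collisions[0], comma_collisions[-1], len(comma_collisions))
```

A parser that splits on commas has no way to tell such a binary byte from a real field separator, which is why the text form is required.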
Any field may be quoted (that is, enclosed within double-quote characters). Some fields must be quoted, as specified in following rules.
Fields with embedded commas or double-quote characters must be quoted.
1997,Ford,E350,"Super, luxurious truck"
Each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super, ""luxurious"" truck"
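The two quoting rules above explain why splitting a line on every comma is not enough. The following sketch, using Python's standard csv module, contrasts a naive split with a quote-aware parse of the example record; the variable names are chosen for this illustration only.

```python
import csv
import io

line = '1997,Ford,E350,"Super, ""luxurious"" truck"'

# Naive approach: split on every comma, ignoring quotes
naive = line.split(",")

# Quote-aware approach: the csv module honours quoting and doubled quotes
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields five pieces and leaves the quote characters in place, while the parser returns four fields with the embedded comma and quotes restored.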
Fields with embedded line breaks must be quoted (however, many CSV implementations do not support embedded line breaks).
1997,Ford,E350,"Go get one now
they are going fast"
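As a sketch of what supporting embedded line breaks involves, the Python example below shows that the number of physical lines and the number of records can differ. The second record is an assumption added only to make the count visible.

```python
import csv
import io

text = (
    '1997,Ford,E350,"Go get one now\nthey are going fast"\r\n'
    "2000,Mercury,Cougar,plain\r\n"
)

# Counting physical lines overstates the number of records
physical_lines = text.splitlines()

# newline="" hands line endings to the csv module untranslated
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(physical_lines), len(records))
print(repr(records[0][3]))
```

A reader that processes the file one physical line at a time would see three lines and misparse the first record.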
In some CSV implementations, leading and trailing spaces and tabs are trimmed (ignored). Such trimming is forbidden by RFC 4180, which states "Spaces are considered part of a field and should not be ignored."
1997, Ford, E350
is not the same as
1997,Ford,E350
According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored" and "Implementors should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793) when processing CSV files."
1997, "Ford" ,E350
In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
1997,Ford,E350," Super luxurious truck "
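The difference between preserving and trimming spaces can be sketched with Python's csv module, whose skipinitialspace option drops spaces that immediately follow a delimiter (it does not trim trailing spaces). The sample line combines the examples above and is an assumption made for this illustration.

```python
import csv
import io

line = '1997, Ford, E350," Super luxurious truck "'

# Default behaviour: spaces are part of the field
strict = next(csv.reader(io.StringIO(line)))

# Lenient behaviour: spaces right after a delimiter are skipped,
# but spaces inside a quoted field are still preserved
lenient = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(strict)
print(lenient)
```

In both modes the quoted field keeps its leading and trailing spaces, which is why quoting is the portable way to carry meaningful spaces.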
Los Angeles,34°03′N,118°15′W
New York City,40°42′46″N,74°00′21″W
Paris,48°51′24″N,2°21′03″E
The first record may be a "header", which contains column names in each of the fields (there is no reliable way to tell whether a file does this or not; however, it is uncommon to use characters other than letters, digits, and underscores in such column names).
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
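Because a header cannot be detected reliably, one cautious approach is to make the caller state whether the first record is a header. The sketch below assumes a hypothetical helper named read_table; it is one possible design, not a prescribed one.

```python
import csv
import io


def read_table(text, has_header):
    """Return (header, records); header is None when the caller says there is none.

    Hypothetical helper: the caller, not the parser, decides about the header.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


text = "Year,Make,Model\r\n1997,Ford,E350\r\n2000,Mercury,Cougar\r\n"

header, records = read_table(text, has_header=True)
print(header, len(records))

no_header, all_rows = read_table(text, has_header=False)
print(no_header, len(all_rows))
```

With the wrong setting, the same file yields either a lost data row or a bogus record made of column names, which illustrates why the choice should be explicit.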
Year | Make  | Model                                  | Description                       | Price
1997 | Ford  | E350                                   | ac, abs, moon                     | 3000.00
1999 | Chevy | Venture "Extended Edition"             |                                   | 4900.00
1999 | Chevy | Venture "Extended Edition, Very Large" |                                   | 5000.00
1996 | Jeep  | Grand Cherokee                         | MUST SELL! air, moon roof, loaded | 4799.00
The above table of data may be represented in CSV format as follows:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00
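As a rough check of the example above, the records can be read back with Python's csv.DictReader. This sketch treats the first record as a header, which is an assumption the reader has to make; the variable names are illustrative.

```python
import csv
import io

text = (
    "Year,Make,Model,Description,Price\r\n"
    '1997,Ford,E350,"ac, abs, moon",3000.00\r\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\r\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00\r\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00\r\n'
)

# The first record is used as the list of column names
records = list(csv.DictReader(io.StringIO(text, newline="")))

for record in records:
    print(record["Model"], "|", repr(record["Description"]), "|", record["Price"])
```

Note that the quoted empty field ("") and the unquoted empty field both come back as an empty string, so this parser does not distinguish between them.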
Example of a USA/UK CSV file (where the decimal separator is a period/full stop and the value separator is a comma):
Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38
Example of an analogous European CSV/DSV file (where the decimal separator is a comma and the value separator is a semicolon):
Year;Make;Model;Length
1997;Ford;E350;2,34
2000;Mercury;Cougar;2,38
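A semicolon-separated file like the one above can be read by telling the parser which delimiter to expect. The sketch below uses Python's csv module, converts the decimal commas by simple string replacement (an assumption that holds only when no thousands separators are present), and rewrites the data with a comma delimiter so that the writer quotes the values containing a comma.

```python
import csv
import io

text = "Year;Make;Model;Length\r\n1997;Ford;E350;2,34\r\n2000;Mercury;Cougar;2,38\r\n"

# Read with an explicit semicolon delimiter
rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

# Interpret the decimal comma as a decimal point (skipping the header row)
lengths = [float(row[3].replace(",", ".")) for row in rows[1:]]
print(lengths)

# Rewrite with a comma delimiter; fields containing a comma get quoted
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\r\n").writerows(rows)
converted = buffer.getvalue()
print(converted)
```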
The latter format is not RFC 4180 compliant. Compliance could be
achieved by the use of a comma instead of a semicolon as a separator
and either the international notation for the representation of the
decimal mark or the practice of quoting all numbers that have a
The CSV file format is supported by almost all spreadsheets and
database management systems, including Microsoft Excel, Apple Numbers,
LibreOffice Calc, and
cut (-d to change the delimiter character)
paste (-d to change the delimiter character(s))
join (-t to change the delimiter character)
sort (-t to change the delimiter character)
uniq (-f to skip comparing the first N fields)
emacs (using csv-nav mode)
awk (-F to change the delimiter character)