A flat-file database is a
database
In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
stored in a file called a flat file. Records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. The file is simple. A flat file can be a
plain text
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limit ...
file (e.g.
csv,
txt or
tsv), or a
binary file
A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document fil ...
. Relationships can be inferred from the data in the database, but the database format itself does not make those relationships explicit.
The term has generally implied a small database, but very large databases can also be flat.
Overview
Plain text files usually contain one
record per line. There are different conventions for depicting data. In
comma-separated values
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separat ...
and
delimiter-separated values
Formats that use delimiter-separated values (also DSV)DSV stands for ''Delimiter Separated Values'' store two-dimensional arrays of data by separating the values in each row with specific delimiter characters. Most database and spreadsheet program ...
files,
field
Field may refer to:
Expanses of open ground
* Field (agriculture), an area of land used for agricultural purposes
* Airfield, an aerodrome that lacks the infrastructure of an airport
* Battlefield
* Lawn, an area of mowed grass
* Meadow, a grass ...
s can be separated by
delimiters
A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts ...
such as
comma
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline ...
or
tab characters. In other cases, each field may have a fixed length; short values may be padded with
space character
In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area ...
s. Extra formatting may be needed to avoid
delimiter collision
A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts as ...
.
Using delimiters incurs some
overhead in locating them every time they are processed (unlike fixed-width formatting), which may have
performance
A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function.
Management science
In the work place ...
implications. However, use of character delimiters (especially commas) is also a crude form of
data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
which may assist overall performance by reducing data volumes — especially for
data transmission
Data transmission and data reception or, more broadly, data communication or digital communications is the transfer and reception of data in the form of a digital bitstream or a digitized analog signal transmitted over a point-to-point o ...
purposes. Use of character delimiters which include a length component (
Declarative notation) is comparatively rare but vastly reduces the overhead associated with locating the extent of each field.
Examples of flat files include
/etc/passwd
and
/etc/group
on
Unix-like
A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
operating systems. Another example of a flat file is a name-and-address list with the fields ''Name'', ''Address'', and ''Phone Number''.
A list of names, addresses, and phone numbers written by hand on a sheet of paper is a flat-file database. This can also be done with any
typewriter
A typewriter is a mechanical or electromechanical machine for typing characters. Typically, a typewriter has an array of keys, and each one causes a different single character to be produced on paper by striking an inked ribbon selectivel ...
or
word processor
A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features.
Word processor (electronic device), Early word processors were stand-alone devices ded ...
. A
spreadsheet
A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cel ...
or
text editor
A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be us ...
program may be used to implement a flat-file database, which may then be printed or used
online
In computer technology and telecommunications, online indicates a state of connectivity and offline indicates a disconnected state. In modern terminology, this usually refers to an Internet connection, but (especially when expressed "on line" or ...
for improved search capabilities.
History
Herman Hollerith
Herman Hollerith (February 29, 1860 – November 17, 1929) was a German-American statistician, inventor, and businessman who developed an electromechanical tabulating machine for punched cards to assist in summarizing information and, later, i ...
's work for the
US Census Bureau
The United States Census Bureau (USCB), officially the Bureau of the Census, is a principal agency of the U.S. Federal Statistical System, responsible for producing data about the American people and economy. The Census Bureau is part of the ...
first exercised in the
1890 United States Census, involving data tabulated via hole punches in paper cards,
is sometimes considered the first computerized flat-file database, as it included no cards indexing other cards, or otherwise relating the individual cards to one another, save by their group membership.
In the 1980s, configurable flat-file database
computer application
A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These pr ...
s were popular on the
IBM PC
The IBM Personal Computer (model 5150, commonly known as the IBM PC) is the first microcomputer released in the IBM PC model line and the basis for the IBM PC compatible de facto standard. Released on August 12, 1981, it was created by a team ...
and the
Macintosh
The Mac (known as Macintosh until 1999) is a family of personal computers designed and marketed by Apple Inc., Apple Inc. Macs are known for their ease of use and minimalist designs, and are popular among students, creative professionals, and ...
. These programs were designed to make it easy for individuals to design and use their own databases, and were almost on par with
word processors
A word processor is an electronic device (later a computer software application) for text, composing, editing, formatting, and printing.
The word processor was a stand-alone office machine in the 1960s, combining the keyboard text-entry and prin ...
and
spreadsheet
A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cel ...
s in popularity. Examples of flat-file database software include early versions of
FileMaker and the
shareware
Shareware is a type of proprietary software that is initially shared by the owner for trial use at little or no cost. Often the software has limited functionality or incomplete documentation until the user sends payment to the software developer ...
PC-File
PC-File was a flat file database computer application most often run on DOS. It was one of the first of three widely popular software products sold via the marketing method that became known as shareware. It was originally written by Jim "Button ...
and the popular
dBase.
Flat-file databases are common and ubiquitous because they are easy to write and edit, and suit myriad purposes in an uncomplicated way.
Modern implementations
Linear stores of NoSQL data,
JSON
JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
formatted data, primitive spreadsheets (perhaps comma-separated or tab-delimited), and text files can all be seen as flat-file databases, because they lack integrated indexes, built-in references between data elements, or complex data types. Programs to manage collections of books or appointments and
address book
An address book or a name and address book is a book, or a database used for storing entries called contacts. Each contact entry usually consists of a few standard fields (for example: first name, last name, company name, address, telephone n ...
may use essentially single-purpose flat-file databases, storing and retrieving information from flat files unadorned with indexes or pointing systems.
While a user can write a table of contents into a text file, the text file format itself does not include a concept of a table of contents. While a user may write "friends with Kathy" in the "Notes" section for John's contact information, this is interpreted by the user rather than a built-in feature of the database. When a database system begins to recognize and codify relationships between records, it begins to drift away from being "flat," and when it has a detailed system for describing types and hierarchical relationships, it is now too structured to be considered "flat."
Example database
The following example illustrates typical elements of a flat-file database. The
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
arrangement consists of a series of columns and rows organized into a
tabular format. This specific example uses only one table.
The columns include: ''name'' (a person's name, second column); ''team'' (the name of an athletic team supported by the person, third column); and a numeric ''unique ID'', (used to uniquely identify records, first column).
Here is an example textual representation of the described data:
id name team
1 Amy Blues
2 Bob Reds
3 Chuck Blues
4 Richard Blues
5 Ethel Reds
6 Fred Blues
7 Gilly Blues
8 Hank Reds
9 Hank Blues
This type of data representation is quite standard for a flat-file database, although there are some additional considerations that are not readily apparent from the text:
* Data types: each column in a database table such as the one above is ordinarily restricted to a specific
data type
In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
. Such restrictions are usually established by convention, but not formally indicated unless the data is transferred to a
relational database
A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
system.
* Separated columns: In the above example, individual columns are separated using
whitespace characters. This is also called indentation or "fixed-width" data formatting. Another common convention is to separate columns using one or more
delimiter characters, such as a
tab or comma.
* Relational algebra: Each row or record in the above table meets the standard definition of a
tuple
In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...
under
relational algebra
In database theory, relational algebra is a theory that uses algebraic structures with a well-founded semantics for modeling data, and defining queries on it. The theory was introduced by Edgar F. Codd.
The main application of relational algebr ...
(the above example depicts a series of 3-tuples). Additionally, the first row specifies the
field names that are associated with the values of each row.
* Database management system: Since the formal operations possible with a text file are usually more limited than desired, the text in the above example would ordinarily represent an intermediary state of the data prior to being transferred into a
database management system.
See also
*
/etc/passwd, a commonly used flat file, used to detail users in Unix
*
CSV (standard Comma-Separated Values)
*
Berkeley DB
Berkeley DB (BDB) is an unmaintained embedded database software library for key/value data, historically significant in open source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitr ...
(typical flat-file database)
*
Awk
AWK (''awk'') is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems.
The AWK lang ...
(classical flat-file processor)
*
Recfiles
recfiles is a file format for human-editable, plain text databases.
Databases using this file format can be edited using any text editor. recfiles allow for basic relational database operations, typing, auto-incrementing, as well as a simple join ...
(plain text database file format)
References
{{DEFAULTSORT:Flat File Database
Data management
Computer file formats
Database models
it:Flat file