A very large database (originally written very large data base), or VLDB, is a database that contains so much data that it can require specialized architectural, management, processing and maintenance methodologies.
Definition
The vague adjectives ''very'' and ''large'' allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization, or the time for a full database operation such as a backup. Technology improvements have continually changed what is considered ''very large''.
One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet".
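The "window of opportunity" definition can be illustrated with simple arithmetic. The following is a minimal sketch, not a sizing tool: the function names, the sustained throughput of 500 MB/s and the 6-hour quiet window are all assumptions chosen for illustration, and real backups are also affected by compression, incremental strategies and parallelism.

```python
def backup_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy size_tb terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB); both figures are assumptions.
    """
    size_mb = size_tb * 1_000_000
    seconds = size_mb / throughput_mb_s
    return seconds / 3600


def fits_window(size_tb: float, throughput_mb_s: float, window_hours: float) -> bool:
    """True if a full backup completes within the quiet window."""
    return backup_hours(size_tb, throughput_mb_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical figures: 500 MB/s sustained, 6-hour quiet window.
    for size_tb in (1, 30):
        hours = backup_hours(size_tb, 500)
        verdict = "fits" if fits_window(size_tb, 500, 6) else "does not fit"
        print(f"{size_tb} TB: {hours:.1f} h, {verdict} in a 6 h window")
```

Under these assumed figures the smaller database is backed up well inside the window while the larger one is not, which is the sense in which the second would count as a VLDB by this definition.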
Sizes of a VLDB database
No absolute amount of data can be cited. For example, one cannot say that any database with more than 1 TB of data is a VLDB. The threshold has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data.
That said, VLDB issues may start to appear as 1 TB is approached, and are more than likely to have appeared once 30 TB or so is exceeded.
VLDB challenges
Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources.
Configuration
Careful configuration of a database in the VLDB realm is necessary to alleviate the issues raised by its size.
Administration
The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases.
Availability and maintenance
Maintenance and recovery operations such as database reorganizations and file copies, which are quite practical on a non-VLDB, take very significant amounts of time and resources on a VLDB.
In particular, it is typically infeasible to meet a typical recovery time objective (RTO), the maximum time a database is expected to be unavailable due to an interruption, by methods which involve copying files from disk or other storage archives.
To overcome these issues, techniques such as clustering, cloned/replicated/standby databases, file snapshots, storage snapshots or a backup manager may help achieve the RTO and availability targets. Individual methods may have limitations, caveats, and license and infrastructure requirements, and some may risk data loss and so fail to meet the recovery point objective (RPO).
For many systems only geographically remote solutions may be acceptable.
Backup and recovery
Best practice is for backup and recovery to be architected as part of the overall availability and business continuity solution.
Performance
Given the same infrastructure, performance will typically decrease, that is, response time will increase, as database size increases. Some accesses will simply have more data to process (scan), which takes proportionally longer (linear time), while the indexes used to access data may grow slightly in height, requiring perhaps an extra storage access to reach the data (sub-linear time).
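The contrast between linear and sub-linear growth can be made concrete with a simplified model. The sketch below assumes a hypothetical page holding 100 rows and an index fanout of 100 entries per node; these figures and the function names are illustrative only and do not describe any particular database engine.

```python
def scan_pages(rows: int, rows_per_page: int) -> int:
    """Pages read by a full scan: grows linearly with the row count."""
    return -(-rows // rows_per_page)  # ceiling division


def index_height(rows: int, fanout: int) -> int:
    """Levels needed for a tree index whose nodes hold `fanout` entries.

    Simplified model: the smallest h with fanout**h >= rows.
    """
    height, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        height += 1
    return height


if __name__ == "__main__":
    for rows in (1_000_000, 100_000_000):
        print(rows, scan_pages(rows, 100), index_height(rows, 100))
```

In this model, growing the table from one million to one hundred million rows multiplies the scan cost by 100 but raises the index height only from 3 to 4 levels, that is, one extra storage access per lookup.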
Other effects can include caching becoming less efficient, because proportionally less data can be cached. Also, while some indexes such as the B+ tree sustain growth well automatically, others such as a hash table may need to be rebuilt.
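One way to see why a hash structure may need rebuilding is to count how many keys change bucket when the bucket count grows. This is a minimal sketch assuming a static hash that places keys with a plain modulo; the function names are hypothetical, and real systems may use schemes such as linear or extendible hashing that limit this movement.

```python
from typing import Sequence


def bucket(key: int, n_buckets: int) -> int:
    """Static hashing: the bucket depends directly on the bucket count."""
    return key % n_buckets


def moved_fraction(keys: Sequence[int], old_buckets: int, new_buckets: int) -> float:
    """Fraction of keys whose bucket changes when the table is resized."""
    moved = sum(
        1 for k in keys if bucket(k, old_buckets) != bucket(k, new_buckets)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Doubling from 8 to 16 buckets relocates roughly half of the keys.
    print(moved_fraction(range(1000), 8, 16))
```

Because a large share of entries must be relocated whenever the bucket count changes, resizing under this assumed scheme amounts to rebuilding the structure, whereas a tree index absorbs growth by splitting individual nodes.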
Should an increase in database size cause the number of accessors of the database to increase, more server and network resources may be consumed, and the risk of contention will increase. Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine.
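Sharding spreads rows across separate server instances, so each request must be routed to the shard that owns its key. The following is a minimal sketch of hash-based routing; the function name, the key format and the choice of SHA-256 are assumptions for illustration, and production systems often prefer consistent hashing or range-based schemes so that adding a shard moves less data.

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Map a key to a shard number in [0, n_shards).

    A cryptographic digest is used only because it is stable across
    processes, unlike Python's built-in hash() for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


if __name__ == "__main__":
    # Hypothetical keys; each one always routes to the same shard.
    for key in ("customer-1", "customer-2", "customer-3"):
        print(key, "-> shard", shard_for(key, 4))
```

Each shard then holds only a fraction of the data, which reduces the per-server volume that scans, caches and backups must handle.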
Partitioning
Partitioning may be able to assist the performance of bulk operations on a VLDB, including backup and recovery and bulk movements due to information lifecycle management (ILM). It may also reduce contention and allow optimization of some query processing.
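The benefits of partitioning for queries and bulk movements can be sketched with a toy in-memory table partitioned by month. The class and method names are hypothetical and the structure is a deliberate simplification; it only illustrates that a query or a bulk removal confined to one partition never touches the others.

```python
from collections import defaultdict
from datetime import date


class MonthlyPartitionedTable:
    """Toy table whose rows are grouped into (year, month) partitions."""

    def __init__(self) -> None:
        self.partitions = defaultdict(list)  # (year, month) -> list of rows

    def insert(self, day: date, payload: str) -> None:
        self.partitions[(day.year, day.month)].append((day, payload))

    def query_month(self, year: int, month: int) -> list:
        # Only the matching partition is read; the others are skipped.
        return list(self.partitions.get((year, month), []))

    def drop_month(self, year: int, month: int) -> int:
        # Bulk removal, for example ILM ageing: discard one whole partition
        # and return the number of rows it held.
        return len(self.partitions.pop((year, month), []))


if __name__ == "__main__":
    table = MonthlyPartitionedTable()
    table.insert(date(2024, 1, 5), "a")
    table.insert(date(2024, 1, 20), "b")
    table.insert(date(2024, 2, 1), "c")
    print(len(table.query_month(2024, 1)))  # rows read from one partition
    print(table.drop_month(2024, 1))        # rows removed in one step
```

The same idea applies to backup: partitions holding unchanged historical data can be handled separately from the partitions still receiving writes.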
Storage
To satisfy the needs of a VLDB, the database storage needs to have low access latency and contention, high throughput, and high availability.
Server resources
The increasing size of a VLDB may put pressure on server and network resources, and a bottleneck may appear that requires infrastructure investment to resolve.
Relationship to big data
VLDB is not the same as ''big data''; however, the storage aspect of ''big data'' may involve a VLDB.
That said, some of the storage solutions supporting ''big data'' were designed from the start to support large volumes of data, so database administrators may not encounter the VLDB issues that older versions of traditional RDBMSs might encounter.
See also
* XLDB