Schema migration
   HOME

TheInfoList



OR:

In
software engineering Software engineering is a systematic engineering approach to software development. A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term '' ...
, a schema migration (also database migration, database change management) refers to the management of version-controlled, incremental and reversible changes to relational
database schema The database schema is the structure of a database described in a formal language supported by the database management system (DBMS). The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divi ...
s. A schema migration is performed on a database whenever it is necessary to update or revert that database's schema to some newer or older version. Migrations are performed programmatically by using a ''schema migration tool''. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state. Most schema migration tools aim to minimize the impact of schema changes on any existing data in the database. Despite this,
preservation Preservation may refer to: Heritage and conservation * Preservation (library and archival science), activities aimed at prolonging the life of a record while making as few changes as possible * ''Preservation'' (magazine), published by the Nat ...
of data in general is not guaranteed because schema changes such as the deletion of a database column can destroy data (i.e. all values stored under that column for all rows in that table are deleted). Instead, the tools help to preserve the meaning of the data or to reorganize existing data to meet new requirements. Since meaning of the data often cannot be encoded, the configuration of the tools usually needs manual intervention.


Risks and benefits

Schema migration allows for fixing mistakes and adapting the data as requirements change. They are an essential part of software evolution, especially in agile environments (see below). Applying a schema migration to a production database is always a risk. Development and test databases tend to be smaller and cleaner. The data in them is better understood or, if everything else fails, the amount of data is small enough for a human to process. Production databases are usually huge, old and full of surprises. The surprises can come from many sources: * Corrupt data that was written by old versions of the software and not cleaned properly * Implied dependencies in the data which no one knows about anymore * People directly changing the database without using the designated tools * Bugs in the schema migration tools * Mistakes in assumptions how data should be migrated For these reasons, the migration process needs a high level of discipline, thorough testing and a sound backup strategy. Schema migrations may take a long time to complete and for systems that operate 24/7 it is important to be able to d
database migrations without downtime
Usually it is done with the help of feature flags and
continuous delivery Continuous delivery (CD) is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time and, following a pipeline through a "production-like environment", witho ...
.


Schema migration in agile software development

When developing
software applications Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists ...
backed by a database, developers typically develop the application
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
in tandem with an evolving database schema. The code typically has rigid expectations of what columns, tables and constraints are present in the database schema whenever it needs to interact with one, so only the version of database schema against which the code was developed is considered fully compatible with that version of source code. In
software testing Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to apprecia ...
, while developers may mock the presence of a compatible database system for
unit testing In computer programming, unit testing is a software testing method by which individual units of source code—sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures&md ...
, any level of testing higher than this (e.g.
integration testing Integration testing (sometimes called integration and testing, abbreviated I&T) is the phase in software testing in which individual software modules are combined and tested as a group. Integration testing is conducted to evaluate the compliance ...
or
system testing System testing is testing conducted on a complete integrated system to evaluate the system's compliance with its specified requirements. System testing takes, as its input, all of the integrated components that have passed integration testing. ...
) it is common for developers to test their application against a local or remote test database schematically compatible with the version of source code under test. In advanced applications, the migration itself can be subject to migration testing. With schema migration technology, data models no longer need to be fully designed up-front, and are more capable of being adapted with changing project requirements throughout the software development lifecycle.


Relation to revision control systems

Teams of software developers usually use
version control systems In software engineering, version control (also known as revision control, source control, or source code management) is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections o ...
to manage and collaborate on changes made to versions of source code. Different developers can develop on divergent, relatively older or newer
branches A branch, sometimes called a ramus in botany, is a woody structural member connected to the central trunk of a tree (or sometimes a shrub). Large branches are known as boughs and small branches are known as twigs. The term ''twig'' usually r ...
of the same source code to make changes and additions during development. Supposing that the software under development interacts with a database, every version of the source code can be associated with at least one database schema with which it is compatible. Under good
software testing Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to apprecia ...
practice, schema migrations can be performed on test databases to ensure that their schema is compatible to the source code. To streamline this process, a schema migration tool is usually invoked as a part of an automated
software build In software development, a build is the process of converting source code files into standalone software artifact(s) that can be run on a computer, or the result of doing so. Functions Building software is an end-to-end process that involves ...
as a prerequisite of the automated testing phase. Schema migration tools can be said to solve versioning problems for database schemas just as version control systems solve versioning problems for source code. In practice, many schema migration tools actually rely on a textual representation of schema changes (such as files containing SQL statements) such that the version history of schema changes can effectively be stored alongside program source code within VCS. This approach ensures that the information necessary to recover a compatible database schema for a particular code branch is recoverable from the source tree itself. Another benefit of this approach is the handling of concurrent conflicting schema changes; developers may simply use their usual text-based conflict resolution tools to reconcile differences.


Relation to schema evolution

Schema migration tooling could be seen as a facility to track the history of an evolving schema.


Advantages

Developers no longer need to remove the entire test database in order to create a new test database from scratch (e.g. using schema creation scripts from DDL generation tools). Further, if generation of test data costs a lot of time, developers can avoid regenerating test data for small, non-destructive changes to the schema.


References


Links


Martin Fowler: Evolutionary Database Design
{{Database Databases Software maintenance Agile software development