Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. Additionally, the validation of migrated data for completeness and the decommissioning of legacy data storage are considered part of the entire data migration process.
Data migration is a key consideration for any system implementation, upgrade, or consolidation, and it is typically performed in such a way as to be as automated as possible, freeing up human resources from tedious tasks. Data migration occurs for a variety of reasons, including server or storage equipment replacements, maintenance or upgrades, application migration, website consolidation, disaster recovery, and data center relocation.
The standard phases
By one estimate, "nearly 40 percent of data migration projects were over time, over budget, or failed entirely."
As such, proper planning is critical to an effective data migration. While the specifics of a data migration plan may vary from project to project, sometimes significantly, the computing company IBM suggests there are three main phases to almost any data migration project: planning, migration, and post-migration.
Each of those phases has its own steps. During planning, dependencies and requirements are analyzed, migration scenarios are developed and tested, and a project plan that incorporates the prior information is created. During the migration phase, the plan is enacted. During post-migration, the completeness and thoroughness of the migration are validated and documented, and the project is closed out, including any necessary decommissioning of legacy systems.
For applications of moderate to high complexity, these data migration phases may be repeated several times before the new system is considered to be fully validated and deployed.
Planning: The data, applications, etc. that will be migrated are selected based on business, project, and technical requirements and dependencies. Hardware and bandwidth requirements are analyzed. Feasible migration and back-out scenarios are developed, as well as the associated tests, automation scripts, mappings, and procedures. Data cleansing and transformation requirements are also gauged for data formats to improve data quality and to eliminate redundant or obsolete information. Migration architecture is decided on and developed, any necessary software licenses are obtained, and change management processes are started.
Migration: Hardware and software requirements are validated, and migration procedures are customized as necessary. Some sort of pre-validation testing may also occur to ensure requirements and customized settings function as expected. If all is deemed well, migration begins, including the primary acts of data extraction, where data is read from the old system, and data loading, where data is written to the new system. Additional verification steps ensure the developed migration plan was enacted in full.
Post-migration: After data migration, results are subjected to data verification to determine whether data was accurately translated, is complete, and supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss. Additional documentation and reporting of the migration project is conducted, and once the migration is validated as complete, legacy systems may also be decommissioned. Migration close-out meetings will officially end the migration process.
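The extraction, loading, and verification steps described in the phases above can be sketched in a few lines. This is a minimal illustration rather than a production procedure: it assumes two in-memory SQLite databases stand in for the old and new systems, and the table names `legacy_customers` and `customers`, the sample rows, and the whitespace-trimming cleansing step are all hypothetical.

```python
import sqlite3


def migrate(source, target):
    """Copy rows from a legacy table to a new table, then verify the result."""
    # Extraction: read every row from the old system.
    rows = source.execute(
        "SELECT id, name FROM legacy_customers ORDER BY id"
    ).fetchall()

    # Transformation: a simple cleansing step (trim stray whitespace).
    cleaned = [(row_id, name.strip()) for row_id, name in rows]

    # Loading: write the rows to the new system in one transaction.
    with target:
        target.executemany(
            "INSERT INTO customers (id, name) VALUES (?, ?)", cleaned
        )

    # Verification: compare what was loaded with what was expected.
    loaded = target.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    return {
        "extracted": len(rows),
        "loaded": len(loaded),
        "complete": loaded == cleaned,
    }


# Hypothetical old system with two rows, one of which needs cleansing.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE legacy_customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO legacy_customers VALUES (?, ?)", [(1, " Ada "), (2, "Grace")]
)
source.commit()

# Hypothetical new system with an empty target table.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

report = migrate(source, target)
```

Returning a small report rather than a bare success flag reflects the documentation and reporting step of post-migration: the counts can be recorded before any legacy system is decommissioned.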
Project versus process
There is a difference between data migration and data integration activities. Data migration is a project by means of which data will be moved or copied from one environment to another, and removed or decommissioned in the source. During the migration (which can take place over months or even years), data can flow in multiple directions, and there may be multiple migrations taking place simultaneously. The ETL (extract, transform, load) actions will be necessary, although the means of achieving these may not be those traditionally associated with the ETL acronym.
Data integration, by contrast, is a permanent part of the IT architecture and is responsible for the way data flows between the various applications and data stores; it is a process rather than a project activity. Standard ETL technologies designed to supply data from operational systems to data warehouses would fit within the latter category.
Categories
Data is stored on various media in files or databases, and is generated and consumed by software applications, which in turn support business processes. The need to transfer and convert data can be driven by multiple business requirements, and the approach taken to the migration depends on those requirements. Four major migration categories are proposed on this basis.
Storage migration
A business may choose to rationalize the physical media to take advantage of more efficient storage technologies. This will result in having to move physical blocks of data from one tape or disk to another, often using virtualization techniques. The data format and content itself will not usually be changed in the process, and the migration can normally be achieved with minimal or no impact to the layers above.
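Because a storage migration moves blocks without interpreting them, a checksum taken before and after the copy is one simple way to confirm that format and content were left unchanged. The sketch below is illustrative only: ordinary files in a temporary directory stand in for the old and new media, and the 4096-byte block size and the file names are assumptions.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch


def sha256_of(path):
    """Return the SHA-256 digest of a file, read one block at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(BLOCK_SIZE), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_blocks(source_path, target_path):
    """Copy a file block by block without interpreting its contents."""
    copied = 0
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        for block in iter(lambda: src.read(BLOCK_SIZE), b""):
            dst.write(block)
            copied += 1
    return copied


# Hypothetical "old volume" filled with 10,000 bytes of arbitrary data.
workdir = tempfile.mkdtemp()
old_volume = os.path.join(workdir, "old_volume.bin")
new_volume = os.path.join(workdir, "new_volume.bin")
with open(old_volume, "wb") as handle:
    handle.write(os.urandom(10_000))

blocks_copied = copy_blocks(old_volume, new_volume)

# Matching digests indicate the content was not altered in transit.
unchanged = sha256_of(old_volume) == sha256_of(new_volume)
```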
Database migration
Similarly, it may be necessary to move from one database vendor to another, or to upgrade the version of database software being used. The latter case is less likely to require a physical data migration, but this can happen with major upgrades. In these cases a physical transformation process may be required since the underlying data format can change significantly. This may or may not affect behavior in the applications layer, depending largely on whether the data manipulation language or protocol has changed.
However, some modern applications are written to be almost entirely agnostic to the database technology, so a change from Sybase, MySQL, IBM Db2 or SQL Server to Oracle should only require a testing cycle to be confident that both functional and non-functional performance has not been adversely affected.
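One way to picture such a testing cycle is to run the same queries against the old and new databases and flag any whose results differ. The following is a rough sketch under stated assumptions: both databases are in-memory SQLite instances standing in for two different vendors' products, the `orders` table and the query list are hypothetical, and only functional equivalence is checked (non-functional aspects such as response time would need separate measurement).

```python
import sqlite3


def make_database(rows):
    """Create a stand-in database holding a hypothetical orders table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    connection.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    connection.commit()
    return connection


def compare_queries(old_db, new_db, queries):
    """Run each query on both databases and collect those whose results differ."""
    mismatches = []
    for query in queries:
        old_result = old_db.execute(query).fetchall()
        new_result = new_db.execute(query).fetchall()
        if old_result != new_result:
            mismatches.append(query)
    return mismatches


rows = [(1, 19.5), (2, 5.0), (3, 42.0)]
old_db = make_database(rows)  # stands in for the outgoing database
new_db = make_database(rows)  # stands in for the incoming database

regression_queries = [
    "SELECT COUNT(*) FROM orders",
    "SELECT SUM(total) FROM orders",
    "SELECT id FROM orders WHERE total > 10 ORDER BY id",
]
mismatches = compare_queries(old_db, new_db, regression_queries)
```

An empty `mismatches` list suggests the migrated data answers these queries identically; any query that appears in the list points to a discrepancy worth investigating before cut-over.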
Application migration
Changing application vendor (for instance, to a new CRM or ERP platform) will inevitably involve substantial transformation, as almost every application or suite operates on its own specific data model and also interacts with other applications and systems within the enterprise application integration environment.
Furthermore, to allow the application to be sold to the widest possible market, commercial off-the-shelf packages are generally configured for each customer using metadata.
Application programming interfaces (APIs) may be supplied by vendors to protect the integrity of the data they have to handle. It is also possible to script the web interfaces of vendors to automatically migrate data.
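The transformation between two applications' data models is often driven by a field mapping. The sketch below shows one possible shape for such a mapping; every field name, the sample record, and the conversion rules are hypothetical, and a real migration would typically load the result through the vendor's API rather than build dictionaries by hand.

```python
# Hypothetical mapping from a legacy CRM record layout to a new data model:
# legacy field name -> (target field name, conversion function).
FIELD_MAP = {
    "cust_name": ("full_name", str.strip),
    "cust_email": ("email", str.lower),
    "created": ("created_on", lambda value: value.replace("/", "-")),
}


def transform_record(legacy_record):
    """Translate one legacy record into the target data model."""
    migrated = {}
    unmapped = []
    for field, value in legacy_record.items():
        if field in FIELD_MAP:
            target_field, convert = FIELD_MAP[field]
            migrated[target_field] = convert(value)
        else:
            # Report fields with no mapping instead of silently dropping them.
            unmapped.append(field)
    return migrated, unmapped


legacy = {
    "cust_name": " Ada Lovelace ",
    "cust_email": "ADA@Example.org",
    "created": "2020/01/31",
    "fax": "n/a",
}
migrated, unmapped = transform_record(legacy)
```

Keeping the mapping as data rather than code makes it easier to review with business stakeholders, and the list of unmapped fields gives an early signal of information that the new application's data model cannot hold.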
Business process migration
Business processes operate through a combination of human and application systems actions, often orchestrated by business process management tools. When these change, they can require the movement of data from one store, database or application to another to reflect the changes to the organization and information about customers, products and operations. Examples of such migration drivers are mergers and acquisitions, business optimization, and reorganization to attack new markets or respond to competitive threats.
The first two categories of migration are usually routine operational activities that the IT department takes care of without the involvement of the rest of the business. The last two categories directly affect the operational users of processes and applications, are necessarily complex, and delivering them without significant business downtime can be challenging. A highly adaptive approach, concurrent synchronization, a business-oriented audit capability, and clear visibility of the migration for stakeholders—through a project management office or data governance team—are likely to be key requirements in such migrations.
Migration as a form of digital preservation
Migration, which focuses on the digital object itself, is the act of transferring or rewriting data from an out-of-date medium to a current medium, and has for many years been considered the only viable approach to long-term preservation of digital objects. Reproducing brittle newspapers onto microfilm is an example of such migration.
Disadvantages
* Migration addresses the possible obsolescence of the data carrier, but does not address the fact that certain technologies which run the data may be abandoned altogether, leaving migration useless.
* Time-consuming – migration is a continual process, which must be repeated every time a medium reaches obsolescence, for all data objects stored on that medium.
* Costly – an institution must purchase additional data storage media at each migration.
See also
* Data conversion
* Data curation
* Data preservation
* Data transformation
* Digital preservation
* Extract, transform, load (ETL)
* System migration