Evolutionary database design involves incremental improvements to the database schema so that it can be continuously updated as the customer's requirements change. Because people across the globe often work on the same piece of software at the same time, there is a need for techniques that allow the database to evolve smoothly as the design develops. Such methods rely on automated refactoring and continuous integration so that the database supports agile methodologies for software development. These techniques are applied to systems in the pre-production stage as well as to systems that have already been released. They cover not only changes to the database schema driven by the customer's changing needs, but also the migration of the modified data into the database and the corresponding changes to the database access code, all without changing the data semantics.
History
After using the waterfall model for a long time, the software industry has seen a rise in the adoption of agile methods for software development.
Agile methodologies do not assume that requirements are permanent at any stage of the software life cycle. In contrast to the waterfall design technique, these methods are designed to accommodate sporadic changes. An important part of this approach is iterative development, in which the entire software life cycle is run multiple times during the life of a project. Every iteration goes through the complete software development life cycle, even though iterations are short, ranging from a few weeks to a few months.
Before the adoption of these methodologies, the entire system was designed before any code was written. The same principle was applied to the database schema, which was derived from the software requirements. These requirements were in turn developed in collaboration between the customer, end users, business analysts and others, and they were not expected to change as software development progressed. This approach proved cumbersome because, as time passed, redundancies in the existing database schema, in the form of unused rows or columns, became evident. This redundancy, along with data quality problems, became costly. It was concluded that not interleaving design with construction and testing was highly inefficient.
Techniques
As mentioned in the previous section, evolutionary methods are iterative in nature, and they have become popular over the last two decades. Evolutionary database design aims to construct the database schema over the course of the project instead of building the entire schema at the beginning. This method of database design can capture and deal effectively with the changing requirements of projects.
There are five evolutionary database design techniques that can help developers build their database in an iterative fashion. A brief overview of each is provided below.
Database refactoring
Refactoring is the process of making changes to a program without affecting its functionality. Database refactoring is the technique of implementing small changes to the database schema without affecting the functionality of the database or the information stored in it. The main purpose of database refactoring is to improve the database design so that the database is better aligned with the changing requirements. The user can modify tables, views, stored procedures and triggers. Dependencies between the database and external applications make database refactoring a challenge.
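As a minimal sketch of what one such small change might look like, the following Python example uses the standard-library `sqlite3` module to rename a column while preserving the stored data. The table, column and view names (`customer`, `fname`, `first_name`, `customer_legacy`) are hypothetical, and the compatibility view is only one possible way of shielding external applications that still use the old column name.

```python
import sqlite3

# Hypothetical starting schema with a poorly named column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL);
    INSERT INTO customer (fname) VALUES ('Ada'), ('Grace');
""")


def rename_fname_to_first_name(conn):
    """Rename customer.fname to customer.first_name without losing data.

    The table is rebuilt under the new column name and the rows are copied
    across. A view exposing the old column name keeps applications that have
    not been updated yet working during a transition period.
    """
    conn.executescript("""
        BEGIN;
        CREATE TABLE customer_new (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
        INSERT INTO customer_new (id, first_name) SELECT id, fname FROM customer;
        DROP TABLE customer;
        ALTER TABLE customer_new RENAME TO customer;
        CREATE VIEW customer_legacy AS SELECT id, first_name AS fname FROM customer;
        COMMIT;
    """)


rename_fname_to_first_name(conn)

# New code reads the refactored table; old code can read the view.
new_rows = conn.execute("SELECT first_name FROM customer ORDER BY id").fetchall()
legacy_rows = conn.execute("SELECT fname FROM customer_legacy ORDER BY id").fetchall()
```

Both queries return the same rows, which illustrates the idea that the schema changes while the information and the behavior seen by callers stay the same.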
Evolutionary data modeling
Data modeling is the technique of identifying entities, associating attributes with those entities and deciding on the data structure used to represent the attributes.
In the traditional database scenario, a logical data model is created at the beginning to represent the entities and their associated attributes. In evolutionary data modeling, data modeling is performed iteratively: multiple data models are developed, each representing a different aspect of the database. This kind of data modeling is practiced in agile environments and is one of the main principles of agile development.
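The iterative idea can be illustrated with a small Python sketch in which each iteration refines the previous model rather than replacing an up-front design. The `Entity` class, the `refine` helper and the example entities are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    # Maps attribute name to the data type chosen to represent it.
    attributes: dict = field(default_factory=dict)


def refine(model, entity_name, **attributes):
    """Return a new model with the entity added or extended.

    The model passed in is copied, so earlier iterations stay unchanged.
    """
    new_model = {name: Entity(e.name, dict(e.attributes)) for name, e in model.items()}
    entity = new_model.setdefault(entity_name, Entity(entity_name))
    entity.attributes.update(attributes)
    return new_model


# Each iteration models only what is needed at that point.
iteration_1 = refine({}, "Customer", name="TEXT")
iteration_2 = refine(iteration_1, "Customer", email="TEXT")
iteration_3 = refine(iteration_2, "Order", customer_name="TEXT", total="REAL")
```

Here the `Order` entity does not exist until the third iteration, and the `Customer` entity gains its `email` attribute only when an iteration calls for it.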
Database regression testing
Whenever new functionality is added to a system, it is essential to verify that the update does not corrupt the system or render it unusable. In a database, business logic is implemented in stored procedures, data validation rules and referential integrity constraints, and these have to be tested thoroughly whenever a change is made to the system.
Regression testing is the process of executing all the test cases whenever a new feature is added to the system. Test-first development (TFD) is a form of regression testing followed in evolutionary database design. The steps involved in the TFD approach are:
* Before adding a new function to the system, add a test to the test suite that the system currently fails
* Run the tests, either the entire suite or just a subset, and ensure that the newly added test does indeed fail
* Update the function so that the test passes
* Run the tests again to ensure that they all succeed and that the system is not broken
Configuration management of database artifacts
Configuration management is the detailed recording of the versions and updates that have been applied to a system. It is useful for rolling back updates and changes that have affected the system negatively. To ensure that any updates made during database refactoring can be rolled back, it is important to maintain database artifacts such as data definition language scripts, data model files, reference data and stored procedures in a configuration management system.
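One common way to make schema changes reversible is to keep each change as a versioned pair of upgrade and rollback scripts and to record the applied version in the database itself. The sketch below assumes this arrangement; the `schema_version` table, the `MIGRATIONS` list and the table names are hypothetical, and in practice the scripts would be files kept under version control rather than inline strings.

```python
import sqlite3

# Each entry is (version, upgrade DDL, rollback DDL).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customer"),
    (2, "CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT)",
        "DROP TABLE country"),
    (3, "CREATE INDEX idx_customer_name ON customer (name)",
        "DROP INDEX idx_customer_name"),
]


def current_version(conn):
    """Return the highest applied migration number, or 0 for an empty database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate_to(conn, target):
    """Apply or roll back migrations until the schema is at version `target`."""
    version = current_version(conn)
    # Upgrade path: apply pending migrations in ascending order.
    for number, upgrade, _ in MIGRATIONS:
        if version < number <= target:
            conn.execute(upgrade)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
    # Rollback path: undo applied migrations in descending order.
    for number, _, rollback in reversed(MIGRATIONS):
        if target < number <= version:
            conn.execute(rollback)
            conn.execute("DELETE FROM schema_version WHERE version = ?", (number,))
    conn.commit()
    return current_version(conn)


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
v_latest = migrate_to(conn, 3)
tables_latest = table_names(conn)
v_rolled_back = migrate_to(conn, 1)
tables_rolled_back = table_names(conn)
```

Rolling back to version 1 drops the index and the `country` table in reverse order of application, leaving only what the first script created.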
Developer sandboxes
A sandbox is a fully functional environment in which the system can be built, tested and executed. To make changes to the database schema in an evolutionary manner, it is ideal for every developer to have their own physical sandbox, with a copy of the source code and a copy of the database. In a sandbox environment, the developer can make changes to the database schema and run tests without affecting the work of other developers or other environments. Once a change has been implemented successfully, it is promoted to the pre-production environment, where acceptance testing is performed. After the acceptance tests succeed, the change is deployed into production.
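The sandbox workflow can be approximated on a small scale with the standard-library `sqlite3` module, whose `Connection.backup` method copies one database into another. In this sketch the `shared` connection stands in for a shared environment, and the `product` table and the `price` column are hypothetical; a real setup would typically use separate database instances per developer.

```python
import sqlite3


def make_sandbox(shared):
    """Give a developer a private in-memory copy of the shared database."""
    sandbox = sqlite3.connect(":memory:")
    shared.backup(sandbox)  # copies both schema and data into the sandbox
    return sandbox


def columns(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Stand-in for a shared database used by the whole team.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
shared.execute("INSERT INTO product (name) VALUES ('widget')")
shared.commit()

# The schema change under development.
change = "ALTER TABLE product ADD COLUMN price REAL"

# Try the change in the sandbox first and test it there.
sandbox = make_sandbox(shared)
sandbox.execute(change)
sandbox.execute("UPDATE product SET price = 9.5")
sandbox_ok = sandbox.execute("SELECT price FROM product").fetchall() == [(9.5,)]

# The shared database is unaffected while the developer experiments.
shared_untouched = columns(shared, "product") == ["id", "name"]

# Promote the change only after the sandbox tests succeed.
if sandbox_ok:
    shared.execute(change)
    shared.commit()
```

The same promotion step would be repeated for each later environment, with acceptance tests taking the place of the sandbox check.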
Advantages and disadvantages
Advantages
# High quality of database design: In evolutionary database design, the developer makes small, incremental changes to the database schema, which leads to a well-optimized schema.
# Handling change: In a traditional database approach, a lot of time is spent remodeling and restructuring the database when the requirements change. In the evolutionary approach, the schema of the database is adjusted periodically to keep up with the changing requirements. Hence, evolutionary database design is better suited to handling changing requirements.
# Guaranteed working of system at all times: The evolutionary database design approach follows the test-first development model, in which the complete working of the system is tested before and after an update is implemented. Hence, the system is verified to work after every change.
# Compatible with software development: The IT industry is moving towards agile methods of software development, and evolutionary database design ensures that data development stays in sync with software development.
# Reduced overall effort: In an evolutionary environment, only the functionality that is required at that moment is implemented, and no more.
Disadvantages
# Cultural impediments: Evolutionary database design is a relatively new concept, and many well-qualified data professionals still advocate the traditional approach. Therefore, most databases are still designed in a serial fashion, and evolutionary database design has yet to gain support and traction among experienced data professionals.
# Learning curve: Most developers are more familiar with the traditional approach, and it takes time to learn evolutionary design because it is not intuitive.
# Complexity: When the database has many external dependencies, making changes to the schema becomes more complicated, because the external dependencies must also be updated to cope with the changes made to the database schema. As the number of dependencies increases, the evolutionary database design approach becomes extremely complex.
Comparison with traditional database design
Traditional database design does not support change in the way that evolutionary database design does. "Unfortunately, the traditional data community assumed that evolving database schema is a hard thing to do and as a result never thought through how to do it."
In a way, evolutionary design is better for application developers and traditional design is better for data professionals.
Tools
Given below is a list of tools that support designing and developing a database in an evolutionary manner.
* LiquiBase
* Red Gate Deployment Manager
* Ruby on Rails Active Record Migration
* Flyway
* Autopatch
See also
* Database management system
* Agile software development
* Data model
* Test-driven development
* Regression testing
* Sandbox (software development)
* Configuration management
* Database refactoring
* Continuous design