Evolutionary Database Design

Evolutionary database design involves incremental improvements to the database schema so that it can be continuously updated to reflect the customer's changing requirements. Because people across the globe often work on the same piece of software at the same time, techniques are needed that allow the database to evolve smoothly as the design develops. Such methods use automated refactoring and continuous integration to support agile methodologies for software development. These techniques are applied both to systems in the pre-production stage and to systems that have already been released. They cover not only the schema changes driven by the customer's changing needs, but also the migration of data into the changed database and the corresponding changes to the database access code, all without changing the data semantics.


History

After relying on the waterfall model for a long time, the software industry has seen a rise in the adoption of agile methods for software development. Agile methodologies do not assume that requirements are permanent at any stage of the software life cycle. In contrast to the waterfall technique, they are designed to support sporadic changes. An important part of this approach is iterative development, in which the entire software life cycle is run multiple times during the life of a project. Every iteration goes through the complete software development life cycle, even though iterations are short, ranging from a few weeks to a few months. Before the adoption of these methodologies, the entire system was designed before any code was written. The same principle was applied to the database schema, which was considered to be derived from the software requirements. Those requirements were in turn developed through collaboration between the customer, end users, business analysts and others, and were not expected to change as development progressed. This approach proved cumbersome because, as time passed, redundancies in the existing database schema became evident in the form of unused rows or columns. This redundancy, together with data quality problems, became costly. It was concluded that failing to interleave design with construction and testing was highly inefficient.


Techniques

As mentioned in the previous section, evolutionary methods are iterative in nature, and they have become immensely popular over the last two decades. Evolutionary database design aims to construct the database schema over the course of the project instead of building the entire schema at the beginning. This method of database design can capture and deal effectively with the changing requirements of projects. Five evolutionary database design techniques can help developers build their database in an iterative fashion. A brief overview of each technique is given below.


Database refactoring

Refactoring is the process of changing a program without affecting its functionality. Database refactoring is the technique of making small changes to the database schema without affecting the functionality of the database or the information stored in it. The main purpose of database refactoring is to improve the database design so that it stays in sync with changing requirements. Tables, views, stored procedures and triggers can all be modified in this way. Dependencies between the database and external applications make database refactoring a challenge.
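
As an illustration, a column rename can be carried out as a small refactoring that keeps existing applications working during a transition period. The sketch below uses Python's built-in sqlite3 module. The `customer` table, its `fname` and `first_name` columns and the `sync_first_name` trigger are hypothetical names chosen for the example, and the trigger only covers updates made through the old column.

```python
import sqlite3

def add_renamed_column(conn):
    """Introduce first_name next to the old fname column.

    Applications that still write to fname keep working; a trigger copies
    their updates into the new column until the old one can be dropped.
    """
    conn.executescript("""
        ALTER TABLE customer ADD COLUMN first_name TEXT;
        UPDATE customer SET first_name = fname;
        CREATE TRIGGER sync_first_name AFTER UPDATE OF fname ON customer
        BEGIN
            UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
        END;
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT)")
conn.execute("INSERT INTO customer (fname) VALUES ('Ada')")

add_renamed_column(conn)

# An application that has not been updated yet still writes the old column.
conn.execute("UPDATE customer SET fname = 'Grace' WHERE id = 1")
rows = conn.execute("SELECT fname, first_name FROM customer").fetchall()
```

Keeping both columns for a while is one way to deal with the external dependencies mentioned above: each dependent application can switch to the new column on its own schedule.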


Evolutionary data modeling

Data modeling is the technique of identifying entities, associating attributes with those entities and deciding on the data structures that represent the attributes. In the traditional database scenario, a logical data model is created at the beginning to represent the entities and their associated attributes. In evolutionary data modeling, the modeling is performed iteratively: multiple data models are developed, each representing a different aspect of the database. This kind of data modeling is practiced in agile environments and is one of the main principles of agile development.


Database regression testing

Whenever new functionality is added to a system, it is essential to verify that the update does not corrupt the system or render it unusable. In a database, business logic is implemented in stored procedures, data validation rules and referential integrity constraints, all of which have to be tested thoroughly whenever a change is made to the system. Regression testing is the process of executing all the test cases whenever a new feature is added to the system. Test-first development (TFD) is a form of regression testing followed in evolutionary database design. The steps involved in the TFD approach are:
* Before adding a new function to the system, add a test to the test suite that the system currently fails.
* Run the tests, either the entire set of test cases or just a subset, and ensure that the newly added test does indeed fail.
* Update the function so that the test passes.
* Run the tests again to ensure that they all succeed and that the system is not broken.
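
The four steps above can be sketched for a schema change as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a full test framework. The `customer` table, the new `email` column and the two check functions are assumptions made for the example.

```python
import sqlite3

def column_names(conn, table):
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def check_customer_has_email(conn):
    """New test, written before the schema change exists (step 1)."""
    return "email" in column_names(conn, "customer")

def check_customer_insert(conn):
    """Existing regression test: plain inserts must keep working."""
    conn.execute("INSERT INTO customer (name) VALUES ('Ada')")
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0] >= 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# Step 2: run the tests and confirm that the new one fails.
new_test_failed_first = not check_customer_has_email(conn)

# Step 3: make the change so that the new test passes.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")

# Step 4: run everything again; old and new tests must all succeed.
all_pass_after = check_customer_has_email(conn) and check_customer_insert(conn)
```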


Configuration management of database artifacts

Configuration management is the detailed recording of the versions and updates that have been applied to a system. It is useful for rolling back updates and changes that have affected the system negatively. To ensure that any update made during database refactoring can be rolled back, it is important to keep database artifacts such as data definition language (DDL) scripts, data model files, reference data and stored procedures in a configuration management system.
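
One common way to make schema changes versionable is to store each change as a numbered pair of "up" and "down" DDL scripts and to record the applied version inside the database itself. The sketch below illustrates the idea with Python's built-in sqlite3 module. The `schema_version` table, the `MIGRATIONS` list and the `migrate` function are hypothetical and do not reproduce the behaviour of any particular tool.

```python
import sqlite3

# Each entry: (version, DDL to apply the change, DDL to roll it back).
# In practice these scripts would live in version control.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customer"),
    (2, "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id))",
        "DROP TABLE orders"),
]

def current_version(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0  # an empty table means nothing has been applied

def migrate(conn, target):
    """Move the schema forward or backward to the requested version."""
    version = current_version(conn)
    if target > version:
        for v, up, _ in MIGRATIONS:
            if version < v <= target:
                conn.execute(up)
                conn.execute("INSERT INTO schema_version VALUES (?)", (v,))
    else:
        for v, _, down in reversed(MIGRATIONS):
            if target < v <= version:
                conn.execute(down)
                conn.execute(
                    "DELETE FROM schema_version WHERE version = ?", (v,))
    conn.commit()
    return current_version(conn)

conn = sqlite3.connect(":memory:")
upgraded = migrate(conn, 2)      # apply versions 1 and 2
rolled_back = migrate(conn, 1)   # undo version 2 only
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```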


Developer sandboxes

A sandbox is a fully functional environment in which the system can be built, tested and executed. To change the database schema in an evolutionary manner, every developer should ideally have their own physical sandbox, a copy of the source code and a copy of the database. In a sandbox environment, the developer can change the database schema and run tests without affecting the work of other developers or other environments. Once the change has been implemented successfully, it is promoted to the pre-production environment, where acceptance testing is performed. After the acceptance tests succeed, it is deployed to production.
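
The isolation a sandbox provides can be illustrated on a small scale with Python's built-in sqlite3 module, whose `Connection.backup` method copies one database into another. In this sketch the `baseline` database, the `make_sandbox` helper and the `customer` table are hypothetical, and in-memory databases stand in for what would normally be separate database instances.

```python
import sqlite3

def make_sandbox(baseline):
    """Return a private copy of the baseline database for one developer."""
    sandbox = sqlite3.connect(":memory:")
    baseline.backup(sandbox)  # copies schema and data into the new database
    return sandbox

def column_names(conn, table):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Shared baseline that every sandbox starts from.
baseline = sqlite3.connect(":memory:")
baseline.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
baseline.execute("INSERT INTO customer (name) VALUES ('Ada')")
baseline.commit()

# The developer experiments in the sandbox only.
sandbox = make_sandbox(baseline)
sandbox.execute("ALTER TABLE customer ADD COLUMN email TEXT")

sandbox_columns = column_names(sandbox, "customer")
baseline_columns = column_names(baseline, "customer")
```

The schema change is visible only in the sandbox copy. The baseline, and any other developer's copy, stays untouched until the change is promoted.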


Advantages and disadvantages


Advantages

# High quality of database design: In evolutionary database design, the developer makes small, incremental changes to the database schema, which leads to a highly optimized schema.
# Handling change: In a traditional database approach, a lot of time is spent remodeling and restructuring the database when the requirements change. In the evolutionary technique, the schema of the database is adjusted periodically to keep up with changing requirements, so it is better suited to handling them.
# Guaranteed working of system at all times: The evolutionary database design approach follows the test-first development model, in which the complete working of the system is tested before and after every update. Hence, each update is verified to leave the system in a working state.
# Compatible with software development: The IT industry is moving towards agile methods of software development, and evolutionary database design keeps data development in sync with software development.
# Reduced overall effort: In an evolutionary environment, only the functionality that is required at that moment is implemented and no more.


Disadvantages

# Cultural impediments: Evolutionary database design is a relatively new concept, and many well-qualified data professionals still advocate the traditional approach. As a result, most databases are still designed in a serial fashion, and evolutionary database design has yet to gain support and traction among experienced data professionals.
# Requires a learning curve: Most developers are more familiar with the traditional approach, and it takes time to learn evolutionary design because it is not intuitive.
# Complex: When the database has many external dependencies, changing the schema becomes more complicated, because the external dependencies must also be updated to cope with the changes. As the number of dependencies increases, the evolutionary database design approach becomes extremely complex.


Comparison with traditional database design

Traditional database design does not support change in the way that evolutionary database design does. 'Unfortunately, the traditional data community assumed that evolving database schema is a hard thing to do and as a result never thought through how to do it.' In a way, evolutionary design is better suited to application developers, and traditional design to data professionals.


Tools

The following tools support designing and developing a database in an evolutionary manner:
* Liquibase
* Red Gate Deployment Manager
* Ruby on Rails Active Record Migration
* Flyway
* Autopatch


See also

* Database management system
* Agile software development
* Data model
* Test-driven development
* Regression testing
* Sandbox (software development)
* Configuration management
* Database refactoring
* Continuous design

