Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques.


Overview

Data modeling is a process used to define and analyze the data requirements needed to support the business processes within the scope of corresponding information systems in organizations. The process therefore involves professional data modelers working closely with business stakeholders, as well as potential users of the information system. Three different types of data models are produced while progressing from requirements to the actual database to be used for the information system (Simsion, Graeme C. & Witt, Graham C. (2005). ''Data Modeling Essentials''. 3rd edition. Morgan Kaufmann Publishers).
The data requirements are initially recorded as a conceptual data model, which is essentially a set of technology-independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model into a physical data model that organizes the data into tables and accounts for access, performance, and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them.

Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g., using data modeling:
* to assist business analysts, programmers, testers, manual writers, IT package selectors, engineers, managers, related organizations, and clients to understand and use an agreed-upon semi-formal model that encompasses the concepts of the organization and how they relate to one another
* to manage data as a resource
* to integrate information systems
* to design databases and data warehouses (also known as data repositories)

Data modeling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there is no such thing as the final data model for a business or application. Instead, a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. Whitten et al. (2004) identified two types of data modeling:
* Strategic data modeling: This is part of the creation of an information systems strategy, which defines an overall vision and architecture for information systems. Information technology engineering is a methodology that embraces this approach.
* Data modeling during systems analysis: In systems analysis, logical data models are created as part of the development of new databases.

Data modeling is also used as a technique for detailing business requirements for specific databases. It is sometimes called ''database modeling'' because a data model is eventually implemented in a database (Whitten, Jeffrey L.; Bentley, Lonnie D.; Dittman, Kevin C. (2004). ''Systems Analysis and Design Methods''. 6th edition).


Topics


Data models

Data models provide a framework for data to be used within information systems by providing specific definition and format. If a data model is used consistently across systems, then compatibility of data can be achieved. If the same data structures are used to store and access data, then different applications can share data seamlessly. The results of this are indicated in the diagram. However, systems and interfaces are often expensive to build, operate, and maintain. They may also constrain the business rather than support it. This may occur when the quality of the data models implemented in systems and interfaces is poor (Matthew West and Julian Fowler (1999). ''Developing High Quality Data Models''. The European Process Industries STEP Technical Liaison Executive (EPISTLE)).

Some common problems found in data models are:
* Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces. Business rules therefore need to be implemented in a flexible way that does not result in complicated dependencies; the data model should be flexible enough that changes in the business can be implemented within it relatively quickly and efficiently.
* Entity types are often not identified, or are identified incorrectly. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance. Data definitions should therefore be made as explicit and easy to understand as possible to minimize misinterpretation and duplication.
* Data models for different systems are arbitrarily different. The result is that complex interfaces are required between systems that share data. These interfaces can account for between 25% and 70% of the cost of current systems. Required interfaces should be considered an inherent part of designing a data model, since a data model on its own is not usable without interfaces within different systems.
* Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data have not been standardised.

To obtain optimal value from an implemented data model, it is very important to define standards that will ensure that data models will both meet business needs and be consistent.
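
The first problem in the list above, a business rule fixed in the structure of a model, can be illustrated with a small sketch. The table and column names (`customer_rigid`, `contact_kind`, `contact_method`) and the helper `columns` are hypothetical and chosen only for illustration; SQLite is assumed simply because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rigid design: the rule "a customer has a home phone and a work phone"
# is baked into the column list, so a new kind of contact needs a schema change.
conn.execute(
    "CREATE TABLE customer_rigid ("
    "id INTEGER PRIMARY KEY, name TEXT, home_phone TEXT, work_phone TEXT)"
)

# Flexible design: the kinds of contact are stored as data, not as structure.
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_kind (code TEXT PRIMARY KEY);
    CREATE TABLE contact_method (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        kind TEXT NOT NULL REFERENCES contact_kind(code),
        value TEXT NOT NULL
    );
    INSERT INTO contact_kind (code) VALUES ('home_phone'), ('work_phone');
    INSERT INTO customer (id, name) VALUES (1, 'Example Ltd');
    """
)


def columns(table: str) -> list[str]:
    """Return the column names of a table (hypothetical helper)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


before = columns("contact_method")

# The business starts accepting e-mail contacts: one new row, no ALTER TABLE.
conn.execute("INSERT INTO contact_kind (code) VALUES ('email')")
conn.execute(
    "INSERT INTO contact_method (customer_id, kind, value) VALUES (?, ?, ?)",
    (1, "email", "info@example.com"),
)

after = columns("contact_method")
kinds = [row[0] for row in conn.execute("SELECT code FROM contact_kind ORDER BY code")]
```

In the flexible variant the change in business practice is absorbed by inserting a row, and the structure of `contact_method` is the same before and after. The trade-off is that the rule is now enforced by reference data rather than by the column list.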


Conceptual, logical and physical schemas

In 1975 ANSI described three kinds of data-model ''instance'':
* Conceptual schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the data requirements.
* Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are sometimes implemented as one and the same.
* Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model.
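
The independence of the physical level from the logical level can be sketched with a small experiment. The example below assumes SQLite and a hypothetical `employee` table; it adds an index, which is a storage and access detail, and checks that a query written against the table/column structure returns the same rows before and after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Ada", "R&D"), ("Grace", "R&D"), ("Linus", "Ops")],
)

# The query refers only to logical structures: a table and its columns.
QUERY = "SELECT name FROM employee WHERE dept = ? ORDER BY name"

result_before = conn.execute(QUERY, ("R&D",)).fetchall()

# Physical-level change: add an index. Tables and columns are untouched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

result_after = conn.execute(QUERY, ("R&D",)).fetchall()
```

Only the way the rows are located may change; the query text and its result do not. This is a sketch of the idea, not a statement about how any particular DBMS separates its schema levels.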


Data modeling process

In the context of business process integration (see figure), data modeling complements business process modeling, and ultimately results in database generation.

The process of designing a database involves producing the previously described three types of schemas: conceptual, logical, and physical. The database design documented in these schemas is converted through a Data Definition Language, which can then be used to generate a database. A fully attributed data model contains detailed attributes (descriptions) for every entity within it. The term "database design" can describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term "database design" could also be used to apply to the overall process of designing not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS).

In the process, system interfaces account for 25% to 70% of the development and support costs of current systems. The primary reason for this cost is that these systems do not share a common data model. If data models are developed on a system-by-system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. Most systems within an organization contain the same basic data, redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can minimize rework, requiring only minimal modifications for the purposes of different systems within the organization.
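
The step in which a documented design is converted into Data Definition Language and then used to generate a database can be sketched as follows. The `Column` and `Table` classes and the `to_ddl` function are hypothetical names introduced for this illustration, and SQLite is assumed as the target only because it is bundled with Python; real modeling tools emit much richer DDL.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    sql_type: str
    nullable: bool = True
    primary_key: bool = False


@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)


def to_ddl(table: Table) -> str:
    """Render one CREATE TABLE statement from a logical table description."""
    parts = []
    for col in table.columns:
        piece = f"{col.name} {col.sql_type}"
        if col.primary_key:
            piece += " PRIMARY KEY"
        elif not col.nullable:
            piece += " NOT NULL"
        parts.append(piece)
    return f"CREATE TABLE {table.name} ({', '.join(parts)})"


# A small logical description (hypothetical example data).
product = Table("product", [
    Column("id", "INTEGER", primary_key=True),
    Column("name", "TEXT", nullable=False),
    Column("price_cents", "INTEGER"),
])

ddl = to_ddl(product)

# Generate the database from the DDL and read back the resulting columns.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
created = [row[1] for row in conn.execute("PRAGMA table_info(product)")]
```

Keeping the description of the table separate from the generated statement means the same description could, in principle, be rendered for a different DBMS by swapping only the rendering function.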


Modeling methodologies

Data models represent information areas of interest. While there are many ways to create data models, according to Len Silverston (1997), only two modeling methodologies stand out, top-down and bottom-up (Len Silverston, W.H. Inmon, Kent Graziano (2007). ''The Data Model Resource Book''. Wiley, 1997. Reviewed by Van Scott on tdan.com. Accessed November 1, 2008):
* Bottom-up models or View Integration models are often the result of a reengineering effort. They usually start with existing data structures: forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an enterprise perspective. They may not promote data sharing, especially if they are built without reference to other parts of the organization.
* Top-down logical data models, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template.

Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. Unfortunately, in many environments the distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools do not make a distinction between logical and physical data models.
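
A bottom-up effort that starts from existing data structures can be sketched as reading a model back out of a database that already exists. The example below assumes SQLite, whose `sqlite_master` table and `PRAGMA table_info` / `PRAGMA foreign_key_list` statements expose the catalog; the `dept` and `staff` tables and the `reverse_engineer` function are hypothetical.

```python
import sqlite3

# Stand-in for an existing application database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE staff (
        id INTEGER PRIMARY KEY,
        full_name TEXT,
        dept_id INTEGER REFERENCES dept(id)
    );
    """
)


def reverse_engineer(db: sqlite3.Connection) -> dict:
    """Recover tables, columns and foreign keys from an existing database."""
    model = {}
    tables = db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        cols = [row[1] for row in db.execute(f"PRAGMA table_info({table})")]
        # foreign_key_list rows: (id, seq, referenced_table, from_col, to_col, ...)
        fks = [
            (row[3], row[2], row[4])
            for row in db.execute(f"PRAGMA foreign_key_list({table})")
        ]
        model[table] = {"columns": cols, "foreign_keys": fks}
    return model


model = reverse_engineer(conn)
```

The recovered model is physical and application-specific, as the bottom-up description above suggests: it records what one database happens to contain, not what the enterprise as a whole means by a department or a staff member.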


Entity–relationship diagrams

There are several notations for data modeling. The actual model is frequently called "entity–relationship model", because it depicts data in terms of the entities and relationships described in the data. An entity–relationship model (ERM) is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion.

These models are used in the first stage of information system design, during requirements analysis, to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain universe of discourse, i.e. area of interest.

Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Most notable are:
* Bachman diagrams
* Barker's notation
* Chen's notation
* Data Vault Modeling
* Extended Backus–Naur form
* IDEF1X
* Object-relational mapping
* Object-Role Modeling and Fully Communication Oriented Information Modeling
* Relational Model
* Relational Model/Tasmania
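
Independently of the notation chosen, the core of an entity–relationship model is a set of entities and the relationships between them. The sketch below represents such a model as plain Python objects and derives a table layout from it. All names (`Entity`, `Relationship`, `to_tables`, the `Customer`/`Order` example) are hypothetical, and the mapping handles only one-to-many relationships; it is meant as an illustration of the idea rather than an implementation of any of the notations listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple[str, ...]


@dataclass(frozen=True)
class Relationship:
    name: str
    one: Entity   # the "one" side of a one-to-many relationship
    many: Entity  # the "many" side


def to_tables(entities, relationships):
    """Map each entity to a table; a one-to-many relationship becomes a
    foreign-key column on the table of the 'many' side."""
    tables = {e.name.lower(): ["id", *e.attributes] for e in entities}
    for rel in relationships:
        tables[rel.many.name.lower()].append(f"{rel.one.name.lower()}_id")
    return tables


# Conceptual level: "a Customer places many Orders".
customer = Entity("Customer", ("name",))
order = Entity("Order", ("placed_on",))
places = Relationship("places", one=customer, many=order)

# Derived table layout.
tables = to_tables([customer, order], [places])
```

The conceptual objects say nothing about keys or columns; the foreign key `customer_id` appears only when the model is mapped to tables.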


Generic data modeling

Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. The definition of a generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', being a binary relation between an individual thing and a kind of thing (a class), and a 'part-whole relation', being a binary relation between two things, one with the role of part, the other with the role of whole, regardless of the kind of things that are related. Given an extensible list of classes, this allows any individual thing to be classified and part-whole relations to be specified for any individual object. By standardization of an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and will approach the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model only allows expressions of kinds of facts that are predefined in the model.
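
The idea of standardized, extensible relation types can be sketched with facts stored as (left, relation type, right) triples. The class `GenericModel`, its method names, and the example objects are hypothetical; the two initial relation types correspond to the classification and part-whole relations described above.

```python
from dataclasses import dataclass, field


@dataclass
class GenericModel:
    # Extensible list of standardized relation types.
    relation_types: set[str] = field(
        default_factory=lambda: {"is classified as", "is part of"}
    )
    # Facts are stored uniformly, whatever kinds of things they relate.
    facts: list[tuple[str, str, str]] = field(default_factory=list)

    def add_relation_type(self, name: str) -> None:
        self.relation_types.add(name)

    def assert_fact(self, left: str, relation: str, right: str) -> None:
        if relation not in self.relation_types:
            raise ValueError(f"unknown relation type: {relation!r}")
        self.facts.append((left, relation, right))

    def related(self, relation: str, right: str) -> list[str]:
        """Return every left-hand thing related to `right` by `relation`."""
        return [
            left for (left, rel, target) in self.facts
            if rel == relation and target == right
        ]


model = GenericModel()
model.assert_fact("pump P-101", "is classified as", "pump")
model.assert_fact("impeller I-7", "is part of", "pump P-101")

# A new kind of fact needs a new relation type, not a new table or class.
model.add_relation_type("is connected to")
model.assert_fact("pump P-101", "is connected to", "pipe L-3")

parts = model.related("is part of", "pump P-101")
```

A conventional model would need a dedicated structure for each kind of fact; here the structure stays fixed and only the list of relation types grows.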


Semantic data modeling

The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. That is, unless the semantic data model is implemented in the database on purpose, a choice which may slightly impact performance but generally vastly improves productivity. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.

A semantic data model can be used to serve many purposes, such as:
* planning of data resources
* building of shareable databases
* evaluation of vendor software
* integration of existing databases

The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful abstraction concepts known from the artificial intelligence field. The idea is to provide high-level modeling primitives as an integral part of a data model in order to facilitate the representation of real-world situations ("Semantic data modeling". In: ''Metaclasses and Their Application''. Lecture Notes in Computer Science. Springer Berlin / Heidelberg. Volume 943/1995).


See also

* Architectural pattern (computer science)
* Comparison of data modeling tools
* Data (computing)
* Data dictionary
* Document modeling
* Enterprise data modeling
* Entity Data Model
* Information management
* Informative modeling
* Metadata modeling
* Three schema approach
* Zachman Framework


Further reading

* J.H. ter Bekke (1991). ''Semantic Data Modeling in Relational Environments''
* John Vincent Carlis, Joseph D. Maguire (2001). ''Mastering Data Modeling: A User-driven Approach''.
* Alan Chmura, J. Mark Heumann (2005). ''Logical Data Modeling: What it is and how to Do it''.
* Martin E. Modell (1992). ''Data Analysis, Data Modeling, and Classification''.
* M. Papazoglou, Stefano Spaccapietra, Zahir Tari (2000). ''Advances in Object-oriented Data Modeling''.
* G. Lawrence Sanders (1995). ''Data Modeling''
* Graeme C. Simsion, Graham C. Witt (2005). ''Data Modeling Essentials''
* Matthew West (2011). ''Developing High Quality Data Models''


External links


* Agile/Evolutionary Data Modeling
* Data modeling articles
* Database Modelling in UML
* Notes on by Tony Drewry
* Request For Proposal - Information Management Metamodel (IMM) of the Object Management Group
* Data Modeling is NOT just for DBMS's Part 1, Chris Bradley
* Data Modeling is NOT just for DBMS's Part 2, Chris Bradley