Federated database system

A federated database system (FDBS) is a type of meta-database management system (DBMS) that transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is an alternative to the (sometimes daunting) task of merging several disparate databases. A federated database, or virtual database, is a composite of all constituent databases in a federated database system. Data federation does not physically integrate the data held in the constituent databases.

Through data abstraction, federated database systems can provide a uniform user interface, enabling users and clients to store and retrieve data from multiple noncontiguous databases with a single query, even if the constituent databases are heterogeneous. To this end, a federated database system must be able to decompose the query into subqueries for submission to the relevant constituent DBMSs, after which the system must combine the result sets of the subqueries. Because various database management systems employ different query languages, federated database systems can apply wrappers to the subqueries to translate them into the appropriate query languages.
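
The decompose, translate, and combine steps described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources, the wrapper classes, the `items_cheaper_than` request, and the sample data are hypothetical and do not correspond to any particular FDBS product.

```python
import sqlite3

# Hypothetical component 1: a relational source that is queried with SQL.
def make_sql_source():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [("bolt", 0.25), ("drill", 89.0)])
    return conn

# Hypothetical component 2: a non-SQL source, here a plain list of records
# that uses different attribute names for the same concepts.
DOCUMENT_SOURCE = [
    {"title": "saw", "cost": 19.5},
    {"title": "nail", "cost": 0.05},
]

class SqlWrapper:
    """Translates the federated request into SQL for component 1."""
    def __init__(self, conn):
        self.conn = conn

    def items_cheaper_than(self, limit):
        rows = self.conn.execute(
            "SELECT name, price FROM products WHERE price < ?", (limit,))
        return [(name, price) for name, price in rows]

class DocumentWrapper:
    """Translates the same request into a filter over component 2."""
    def __init__(self, records):
        self.records = records

    def items_cheaper_than(self, limit):
        return [(r["title"], r["cost"])
                for r in self.records if r["cost"] < limit]

def federated_query(wrappers, limit):
    """Send one subquery per component, then merge the result sets."""
    merged = []
    for wrapper in wrappers:
        merged.extend(wrapper.items_cheaper_than(limit))
    # Combine step: present a single result set ordered by price.
    return sorted(merged, key=lambda item: item[1])

wrappers = [SqlWrapper(make_sql_source()), DocumentWrapper(DOCUMENT_SOURCE)]
print(federated_query(wrappers, 1.0))
```

The caller issues one request and never sees which component answered which part; each wrapper hides the query language and naming of its own source.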


Definition

McLeod and Heimbigner were among the first to define a federated database system, in the mid-1980s. An FDBS is one which "define[s] the architecture and interconnect[s] databases that minimize central authority yet support partial sharing and coordination among database systems". This description might not accurately reflect the McLeod/Heimbigner definition of a federated database; rather, it fits what McLeod/Heimbigner called a ''composite'' database. McLeod/Heimbigner's federated database is a collection of autonomous components that make their data available to other members of the federation through the publication of an export schema and access operations; there is no unified, central schema that encompasses the information available from the members of the federation.

Among other surveys, practitioners define a federated database as a collection of cooperating component systems which are autonomous and possibly heterogeneous. The three important components of an FDBS are autonomy, heterogeneity and distribution. Other dimensions that have also been considered are the networking environment (e.g., many DBSs over a LAN or many DBSs over a WAN) and the update-related functions of participating DBSs (e.g., no updates, nonatomic transitions, atomic updates).


FDBS architecture

A DBMS can be classified as either centralized or distributed. A centralized system manages a single database, while a distributed system manages multiple databases. A component DBS in a DBMS may be centralized or distributed. A multiple DBS (MDBS) can be classified into two types, federated and nonfederated, depending on the autonomy of the component DBSs. A nonfederated database system is an integration of component DBMSs that are not autonomous. A federated database system consists of component DBSs that are autonomous yet participate in a federation to allow partial and controlled sharing of their data.

Federated architectures differ based on levels of integration with the component database systems and the extent of services offered by the federation. An FDBS can be categorized as loosely or tightly coupled:

* Loosely coupled systems require component databases to construct their own federated schema. A user will typically access other component database systems by using a multidatabase language, but this removes any level of location transparency, forcing the user to have direct knowledge of the federated schema. A user imports the data they require from other component databases and integrates it with their own to form a federated schema.
* Tightly coupled systems consist of component systems that use independent processes to construct and publicize an integrated federated schema.

Multiple DBSs, of which FDBSs are a specific type, can be characterized along three dimensions: distribution, heterogeneity and autonomy. Another characterization could be based on the dimension of networking, for example single databases or multiple databases in a LAN or WAN.


Distribution

Distribution of data in an FDBS is due to the existence of multiple DBSs before the FDBS is built. Data can be distributed among multiple databases, which could be stored on a single computer or on multiple computers. These computers could be located in geographically different places but interconnected by a network. The benefits of data distribution include increased availability and reliability as well as improved access times.


Heterogeneity

Heterogeneities in databases arise due to factors such as differences in structures, semantics of data, the constraints supported, or query language. Differences in structure occur when two data models provide different primitives, such as object-oriented (OO) models that support specialization and inheritance and relational models that do not. Differences due to constraints occur when two models support two different constraints. For example, the set type in a CODASYL schema may be partially modeled as a referential integrity constraint in a relational schema. CODASYL supports insertion and retention constraints that are not captured by referential integrity alone. The query language supported by one DBMS can also contribute to heterogeneity with other component DBMSs. For example, differences in query languages with the same data models or different versions of query languages could contribute to heterogeneity.

Semantic heterogeneities arise when there is a disagreement about the meaning, interpretation or intended use of data. At the schema and data level, the classification of possible heterogeneities includes:

* Naming conflicts, e.g. databases using different names to represent the same concept.
* Domain conflicts or data representation conflicts, e.g. databases using different values to represent the same concept.
* Precision conflicts, e.g. databases using the same data values from domains of different cardinalities for the same data.
* Metadata conflicts, e.g. the same concepts are represented at schema level and instance level.
* Data conflicts, e.g. missing attributes.
* Schema conflicts, e.g. table versus table conflicts, which include naming conflicts, data conflicts etc.

In creating a federated schema, one has to resolve such heterogeneities before integrating the component DB schemas.
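
As a rough illustration of resolving two of the conflict types listed above, the following Python sketch maps records from two sources onto one federated representation. The source records, attribute names, and unit conventions are invented for the example; a real integration would also have to handle the remaining conflict types.

```python
# Hypothetical records from two component databases describing employees.
# Source A stores "emp_name" and a salary in dollars per year.
# Source B stores "full_name" (a naming conflict) and a salary in thousands
# of dollars per year (a domain / data representation conflict).
SOURCE_A = [{"emp_name": "Ada", "salary": 120000}]
SOURCE_B = [{"full_name": "Grace", "salary_k": 95}]

# One mapping per source:
# federated attribute -> (local attribute, conversion function).
MAPPINGS = {
    "A": {
        "name": ("emp_name", str),
        "salary_usd": ("salary", float),
    },
    "B": {
        "name": ("full_name", str),
        "salary_usd": ("salary_k", lambda k: float(k) * 1000),
    },
}

def to_federated(record, mapping):
    """Rename attributes and convert values into the federated form."""
    return {target: convert(record[local])
            for target, (local, convert) in mapping.items()}

unified = ([to_federated(r, MAPPINGS["A"]) for r in SOURCE_A] +
           [to_federated(r, MAPPINGS["B"]) for r in SOURCE_B])
print(unified)
```

Keeping the renaming and the value conversion in one declarative mapping per source means that adding a source requires a new mapping entry rather than changes to the integration logic.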


Schema matching, schema mapping

Dealing with incompatible data types or query syntax is not the only obstacle to a concrete implementation of an FDBS. In systems that are not planned top-down, a generic problem lies in matching semantically equivalent but differently named parts (tables, attributes) from different schemas (i.e., data models). A pairwise mapping between ''n'' attributes would result in $\frac{n(n-1)}{2}$ mapping rules (given equivalence mappings), a number that quickly gets too large for practical purposes. A common way out is to provide a global schema that comprises the relevant parts of all member schemas and to provide mappings in the form of database views. Two principal approaches depend on the direction of the mapping:

# Global as View (GaV): the global schema is defined in terms of the underlying schemas
# Local as View (LaV): the local schemas are defined in terms of the global schema

Both are examples of data integration, called the schema matching problem.
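
The mapping count and the Global as View approach can be sketched in Python with SQLite. This is only an illustration under stated assumptions: the two attached in-memory databases stand in for autonomous member systems, and the names `branch_a`, `branch_b`, `customers`, `clients`, and `global_customer` are hypothetical. A temporary view is used because SQLite only lets temporary views refer to tables in other attached databases.

```python
import sqlite3

def pairwise_mappings(n):
    """Number of pairwise equivalence mappings between n attributes."""
    return n * (n - 1) // 2

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases play the role of the member schemas.
conn.execute("ATTACH DATABASE ':memory:' AS branch_a")
conn.execute("ATTACH DATABASE ':memory:' AS branch_b")
conn.execute("CREATE TABLE branch_a.customers (cust_name TEXT, city TEXT)")
conn.execute("CREATE TABLE branch_b.clients (client TEXT, town TEXT)")
conn.execute("INSERT INTO branch_a.customers VALUES ('Ada', 'London')")
conn.execute("INSERT INTO branch_b.clients VALUES ('Grace', 'New York')")

# GaV: the global relation is defined as a view over the local schemas.
conn.execute("""
    CREATE TEMP VIEW global_customer AS
    SELECT cust_name AS name, city AS city FROM branch_a.customers
    UNION ALL
    SELECT client AS name, town AS city FROM branch_b.clients
""")

rows = conn.execute(
    "SELECT name, city FROM global_customer ORDER BY name").fetchall()
print(pairwise_mappings(10), rows)
```

With a global schema, each member needs one mapping to the global relation, so the number of mappings grows linearly with the number of members instead of quadratically.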


Autonomy

Fundamental to the difference between an MDBS and an FDBS is the concept of autonomy. It is important to understand the aspects of autonomy for component databases and how they can be addressed when a component DBS participates in an FDBS. Four kinds of autonomy are addressed:

* Design autonomy refers to the ability of a component DBS to choose its own design with respect to data, query language, conceptualization, or functionality of the system implementation. Heterogeneities in an FDBS are primarily due to design autonomy.
* Communication autonomy refers to the ability of a DBMS to decide whether or not to communicate with other DBMSs.
* Execution autonomy allows a component DBMS to control the execution of operations requested locally and externally.
* Association autonomy gives a component DBS the power to disassociate itself from a federation, which means the FDBS can operate independently of any single DBS.

The ANSI/X3/SPARC Study Group outlined a three-level data description architecture, the components of which are the conceptual schema, internal schema and external schema of databases. The three-level architecture is, however, inadequate for describing the architectures of an FDBS. It was therefore extended to support the three dimensions of the FDBS, namely distribution, autonomy and heterogeneity. The five-level schema architecture is explained below.
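
Execution autonomy and association autonomy, as described in the list above, can be sketched in a few lines of Python. The `Component` and `Federation` classes, the `accept_external` flag, and the sample data are hypothetical names chosen for this illustration, not part of any standard FDBS interface.

```python
class Component:
    """Hypothetical autonomous component DBS holding a few values."""
    def __init__(self, name, records, accept_external=True):
        self.name = name
        self.records = records
        # Execution autonomy: the component decides whether to run
        # operations requested from outside.
        self.accept_external = accept_external

    def external_query(self, predicate):
        if not self.accept_external:
            raise PermissionError(f"{self.name} declined the request")
        return [r for r in self.records if predicate(r)]

class Federation:
    """Hypothetical federation that tolerates members leaving or refusing."""
    def __init__(self):
        self.members = {}

    def join(self, component):
        self.members[component.name] = component

    def leave(self, name):
        # Association autonomy: a member may withdraw at any time.
        self.members.pop(name, None)

    def query(self, predicate):
        results, skipped = [], []
        for name, component in self.members.items():
            try:
                results.extend(component.external_query(predicate))
            except PermissionError:
                skipped.append(name)
        return results, skipped

fed = Federation()
fed.join(Component("a", [1, 2]))
fed.join(Component("b", [3]))
fed.join(Component("c", [4], accept_external=False))

before = fed.query(lambda r: r > 1)
fed.leave("a")
after = fed.query(lambda r: r > 1)
print(before, after)
```

The federation keeps answering queries after a member withdraws or refuses a request; it reports which members were skipped rather than failing as a whole.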


Concurrency control

The ''Heterogeneity'' and ''Autonomy'' requirements pose special challenges concerning concurrency control in an FDBS, which is crucial for the correct execution of its concurrent transactions (see also Global concurrency control). Achieving global serializability, the major correctness criterion, under these requirements has been characterized as very difficult and unsolved. Commitment ordering, introduced in 1991, has provided a general solution for this issue (see Global serializability; see also Commitment ordering for the architectural aspects of the solution).


Five level schema architecture for FDBSs

The five-level schema architecture includes the following:

* Local schema is basically the conceptual model of a component database expressed in a native data model.
* Component schema is the subset of the local schema that the owner organisation is willing to share with other users of the FDBS, translated into a common data model.
* Export schema represents a subset of a component schema that is available to a particular federation. It may include access control information regarding its use by a specific federation user. The export schema helps in managing the flow of control of data.
* Federated schema is an integration of multiple export schemas. It includes information on data distribution that is generated when integrating export schemas.
* External schema is extracted from a federated schema, and is defined for the users/applications of a particular federation.

While accurately representing the state of the art in data integration, the five-level schema architecture above does suffer from a major drawback, namely an IT-imposed look and feel. Modern data users demand control over how data is presented; their needs are somewhat in conflict with such bottom-up approaches to data integration.
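
The five levels can be pictured as successive transformations of a schema. The Python sketch below follows the descriptions in the list above; the relation and attribute names, the translation table, and the `project` and `integrate` helpers are assumptions made for illustration only.

```python
def project(schema, keep):
    """Keep only the listed attributes of each relation named in `keep`."""
    return {rel: [a for a in attrs if a in keep[rel]]
            for rel, attrs in schema.items() if rel in keep}

def integrate(exports):
    """Merge export schemas and record which source provides each attribute."""
    federated = {}
    for source, schema in exports.items():
        for rel, attrs in schema.items():
            for attr in attrs:
                federated.setdefault(rel, {}).setdefault(attr, []).append(source)
    return federated

# 1. Local schema: a hypothetical HR database in its native naming.
local_hr = {"EMP": ["emp_no", "emp_name", "salary", "home_address"]}

# 2. Component schema: the shared subset, renamed into a common data model.
#    home_address is not listed, so it is not shared.
cdm_names = {"emp_no": "employee_id", "emp_name": "name", "salary": "salary"}
component_hr = {"Employee": [cdm_names[a]
                             for a in local_hr["EMP"] if a in cdm_names]}

# 3. Export schema: what this particular federation is allowed to see.
export_hr = project(component_hr, {"Employee": {"employee_id", "name"}})
# A second, equally hypothetical member exports pay grades.
export_payroll = {"Employee": ["employee_id", "pay_grade"]}

# 4. Federated schema: integration of the exports, with distribution info.
federated = integrate({"hr": export_hr, "payroll": export_payroll})

# 5. External schema: the part defined for one class of users.
external = {rel: [a for a in attrs if a in {"name", "pay_grade"}]
            for rel, attrs in federated.items()}
print(federated, external)
```

Each level only narrows or reshapes the level beneath it, which is why the architecture is described as bottom-up.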


See also

* Enterprise Information Integration (EII)
* Data Virtualization
* Master data management (MDM)
* Schema Matching
* Universal relation assumption
* Linked Data
* SPARQL



External links


* DB2 and Federated Databases
* Worked example federating Oracle, Informix, DB2, and Excel (http://www.ibm.com/developerworks/db2/library/techarticle/0307lurie/0307lurie.html)
* Freitas, André, Edward Curry, João Gabriel Oliveira, and Sean O’Riain. 2012. “Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends.” IEEE Internet Computing 16 (1): 24–33.
* IBM Gaian Database: A dynamic Distributed Federated Database
* Federated system and methods and mechanisms of implementing and using such a system