Enterprise Information Integration
   HOME

TheInfoList



OR:

Enterprise information integration (EII) is the ability to support an unified view of data and information for an entire organization. In a
data virtualization Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can pr ...
application of EII, a process of information integration, using
data abstraction In software engineering and computer science, abstraction is: * The process of removing or generalizing physical, spatial, or temporal details or attributes in the study of objects or systems to focus attention on details of greater importance ...
to provide a unified interface (known as uniform data access) for viewing all the data within an organization, and a single set of structures and naming conventions (known as uniform information representation) to represent this data; the goal of EII is to get a large set of
heterogeneous Homogeneity and heterogeneity are concepts often used in the sciences and statistics relating to the uniformity of a substance or organism. A material or image that is homogeneous is uniform in composition or character (i.e. color, shape, siz ...
data sources to appear to a user or system as a single, homogeneous data source.


Overview

Data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
within an
enterprise Enterprise (or the archaic spelling Enterprize) may refer to: Business and economics Brands and enterprises * Enterprise GP Holdings, an energy holding company * Enterprise plc, a UK civil engineering and maintenance company * Enterprise ...
can be stored in heterogeneous formats, including
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relati ...
s (which themselves come in a large number of varieties), text files,
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
files,
spreadsheet A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in ...
s and a variety of proprietary storage methods, each with their own
index Index (or its plural form indices) may refer to: Arts, entertainment, and media Fictional entities * Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index'' * The Index, an item on a Halo megastru ...
ing and data access methods. Standardized data access APIs have emerged that offer a specific set of commands to retrieve and modify data from a generic data source. Many applications exist that implement these APIs' commands across various data sources, most notably relational databases. Such APIs include
ODBC In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An ...
,
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. I ...
, XQJ, OLE DB, and more recently ADO.NET. There are also standard formats for representing data within a file that are very important to information integration. The best-known of these is XML, which has emerged as a standard universal representation format. There are also more specific XML "grammars" defined for specific types of data such as
Geography Markup Language The Geography Markup Language (GML) is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographi ...
for expressing geographical features and
Directory Service Markup Language Directory Services Markup Language (DSML) is a representation of directory service information in an XML syntax. The DSML version 1 effort was announced on July 12, 1999 by creator Bowstreet (subsequently acquired by IBM in 2005). Initiative s ...
for holding directory-style information. In addition, non-XML standard formats exist such as
iCalendar The Internet Calendaring and Scheduling Core Object Specification (iCalendar) is a media type which allows users to store and exchange calendaring and scheduling information such as events, to-dos, journal entries, and free/busy information, a ...
for representing calendar information and
vCard vCard, also known as VCF (Virtual Contact File), is a file format standard for electronic business cards. vCards can be attached to e-mail messages, sent via Multimedia Messaging Service (MMS), on the World Wide Web, instant messaging, NFC ...
for
business card Business cards are cards bearing business information about a company or individual. They are shared during formal introductions as a convenience and a memory aid. A business card typically includes the giver's name, company or business af ...
information. Enterprise Information Integration (EII) applies
data integration Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, which include both commercial (such as when two similar companies ...
commercially. Despite the theoretical problems described above, the private sector shows more concern with the problems of data integration as a viable product. EII emphasizes neither on correctness nor tractability, but speed and simplicity. Practitioners cite the following major issues which EII must address for the industry to become mature: ; Combining disparate data sets : Each data source is disparate and as such is not designed to support EII. Therefore, data virtualization as well as
data federation In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
depends upon accidental data commonality to support combining data and information from disparate data sets. Because of this lack of data value commonality across data sources, the return set may be inaccurate, incomplete, and impossible to validate. : One solution is to recast disparate databases to integrate these databases without the need for ETL. The recast databases support commonality constraints where referential integrity may be enforced between databases. The recast databases provide designed data access paths with data value commonality across databases. ; Simplicity of understanding : Answering queries with views arouses interest from a theoretical standpoint, but difficulties in understanding how to incorporate it as an "enterprise solution". ; Simplicity of deployment : Even if recognized as a solution to a problem, EII currently takes time to apply and offers complexities in deployment. Proposed schema-less solutions include "Lean Middleware", but ease-of-use and speed of employment appear inversely proportional to the generality of such systems. ; Handling higher-order information : Analysts experience difficulty—even with a functioning information integration system—in determining whether the sources in the database will satisfy a given application. Answering these kinds of questions about a set of repositories requires semantic information like
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
and/or ontologies.


Applications

EII products enable
loose coupling In computing and systems design, a loosely coupled system is one # in which components are weakly associated (have breakable relationships) with each other, and thus changes in one component least affect existence or performance of another comp ...
between
homogeneous Homogeneity and heterogeneity are concepts often used in the sciences and statistics relating to the uniformity of a substance or organism. A material or image that is homogeneous is uniform in composition or character (i.e. color, shape, siz ...
-data consuming client applications and services and heterogeneous-data stores. Such client applications and services include Desktop Productivity Tools (spreadsheets,
word processor A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features. Early word processors were stand-alone devices dedicated to the function, but current ...
s, presentation software, etc.), development environments and
framework A framework is a generic term commonly referring to an essential supporting structure which other things are built on top of. Framework may refer to: Computing * Application framework, used to implement the structure of an application for an op ...
s (
Java EE Jakarta EE, formerly Java Platform, Enterprise Edition (Java EE) and Java 2 Platform, Enterprise Edition (J2EE), is a set of specifications, extending Java SE with specifications for enterprise features such as distributed computing and web ser ...
, .NET,
Mono Mono may refer to: Common meanings * Infectious mononucleosis, "the kissing disease" * Monaural, monophonic sound reproduction, often shortened to mono * Mono-, a numerical prefix representing anything single Music Performers * Mono (Japanes ...
,
SOAP Soap is a salt of a fatty acid used in a variety of cleansing and lubricating products. In a domestic setting, soaps are surfactants usually used for washing, bathing, and other types of housekeeping. In industrial settings, soaps are us ...
or
REST Rest or REST may refer to: Relief from activity * Sleep ** Bed rest * Kneeling * Lying (position) * Sitting * Squatting position Structural support * Structural support ** Rest (cue sports) ** Armrest ** Headrest ** Footrest Arts and enter ...
ful Web services, etc.),
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical ...
(BI),
business activity monitoring Business activity monitoring (BAM) is software that aids the monitoring of business activities which are implemented in computer systems. The term was originally coined by analysts at Gartner, Inc. and refers to the aggregation, analysis, and p ...
(BAM) software,
enterprise resource planning Enterprise resource planning (ERP) is the integrated management of main business processes, often in real time and mediated by software and technology. ERP is usually referred to as a category of business management software—typically a sui ...
(ERP),
Customer relationship management Customer relationship management (CRM) is a process in which a business or other organization administers its interactions with customers, typically using data analysis to study large amounts of information. CRM systems compile data from a r ...
(CRM),
business process management Business process management (BPM) is the discipline in which people use various methods to discover, model, analyze, measure, improve, optimize, and automate business processes. Any combination of methods used to manage a company's business p ...
(BPM and/or BPEL) Software, and web content management (CMS).


Data access technologies

*
Service Data Objects Service Data Objects is a technology that allows heterogeneous data to be accessed in a uniform way. The SDO specification was originally developed in 2004 as a joint collaboration between Oracle ( BEA) and IBM and approved by the Java Community ...
(SDO) for Java, C++ and .Net clients and any type of data source *
XQuery XQuery (XML Query) is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats (JSON, b ...
and XQuery API for Java


See also

*
Business Intelligence 2.0 Business Intelligence 2.0 (BI 2.0) is a development of the existing business intelligence model that began in the mid-2000s, where data can be obtained from many sources. The process allows for the querying of real-time corporate data by employees, ...
(BI 2.0) *
Data warehouse In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. DWs are central repositories of integra ...
*
Disparate system In information technology, a disparate system or a disparate data system is a computer data processing system that was designed to operate as a fundamentally distinct data processing system without exchanging data or interacting with other compute ...
*
Enterprise integration Enterprise integration is a technical field of enterprise architecture, which is focused on the study of topics such as system interconnection, electronic data interchange, product data exchange and distributed computing environments. It is a c ...
*
Federated database system A federated database system (FDBS) is a type of meta- database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer ...
*
Resource Description Framework The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of ...
*
Semantic heterogeneity Semantic heterogeneity is when database schema or datasets for the same domain are developed by independent parties, resulting in differences in meaning and interpretation of data values. Beyond structured data, the problem of semantic heterogen ...
*
Semantic integration Semantic integration is the process of interrelating information from diverse sources, for example calendars and to do lists, email archives, presence information (physical, psychological, and social), documents of all sorts, contacts (including ...
* Semantic Web *
Web 2.0 Web 2.0 (also known as participative (or participatory) web and social web) refers to websites that emphasize user-generated content, ease of use, participatory culture and interoperability (i.e., compatibility with other products, systems, and ...
* Web services {{div col end


References

Data management