Probabilistic Databases
   HOME

TheInfoList



OR:

Most real databases contain data whose correctness is uncertain. In order to work with such data, there is a need to quantify the integrity of the data. This is achieved by using probabilistic databases. A probabilistic database is an
uncertain database In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all ''true ...
in which the possible worlds have associated
probabilities Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...
. Probabilistic
database management system In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...
s are currently an active area of research. "While there are currently no commercial probabilistic database systems, several research prototypes exist..." Probabilistic databases distinguish between the logical data model and the physical representation of the data much like
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
s do in the
ANSI-SPARC Architecture The ANSI-SPARC Architecture (American National Standards Institute, Standards Planning And Requirements Committee), is an abstract design standard for a database management system (DBMS), first proposed in 1975.ANSI/X3/SPARC Study Group on Data Ba ...
. In probabilistic databases this is even more crucial since such databases have to represent very large numbers of possible worlds, often exponential in the size of one world (a classical
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
), succinctly.


Terminology

In a probabilistic database, each tuple is associated with a probability between 0 and 1, with 0 representing that the data is certainly incorrect, and 1 representing that it is certainly correct.


Possible Worlds

A probabilistic database could exist in multiple states. For example, if there is uncertainty about the existence of a tuple in the database, then the database could be in two different states with respect to that tuple—the first state contains the tuple, while the second one does not. Similarly, if an attribute can take one of the values ''x'', ''y'' or ''z'', then the database can be in three different states with respect to that attribute. Each of these ''states'' is called a possible world. Consider the following database: (Here ' denotes that the attribute can take any of the values ''b3'', ''b3′'' or ''b3′′'') *Assuming that there is uncertainty about the first tuple, certainty about the second tuple, and uncertainty about the value of attribute B in the third tuple. Then the actual state of the database may or may not contain the first tuple (depending on whether it is correct or not). Similarly, the value of the attribute B may be ''b3'', ''b3′'' or ''b3′′''. Consequently, the possible worlds corresponding to the database are as follows:


Types of Uncertainties

There are essentially two kinds of uncertainties that could exist in a probabilistic database, as described in the table below: By assigning values to random variables associated with the data items, different possible worlds can be represented.


History

The first published use of the term "probabilistic database" was probably in the 1987 VLDB conference paper "The theory of probabilistic databases", by Cavallo and Pittarelli.Roger Cavallo, Michael Pittarelli: The Theory of Probabilistic Databases. In VLDB'87, Proceedings of 13th International Conference on Very Large Data Bases, September 1-4, 1987, Brighton: 71-81 (1987) The title (of the 11 page paper) was intended as a bit of a joke, since David Maier's 600 page monograph, The Theory of Relational Databases, would have been familiar at that time to most of the conference participants and readers of the conference proceedings.


References


External links

* The MayBMS project at
Cornell University Cornell University is a private statutory land-grant research university based in Ithaca, New York. It is a member of the Ivy League. Founded in 1865 by Ezra Cornell and Andrew Dickson White, Cornell was founded with the intention to teach an ...

sourceforge.net project site
* Th

project at the
University of Washington The University of Washington (UW, simply Washington, or informally U-Dub) is a public research university in Seattle, Washington. Founded in 1861, Washington is one of the oldest universities on the West Coast; it was established in Seattle a ...
* Th
Orion
project at
Purdue University Purdue University is a public land-grant research university in West Lafayette, Indiana, and the flagship campus of the Purdue University system. The university was founded in 1869 after Lafayette businessman John Purdue donated land and money ...
* Th
Trio
project at
Stanford University Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California. The campus occupies , among the largest in the United States, and enrolls over 17,000 students. Stanford is consider ...
* Th
BayesStore
project at the
University of California, Berkeley The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California) is a public land-grant research university in Berkeley, California. Established in 1868 as the University of California, it is the state's first land-grant u ...
* Th
PrDB
project at the
University of Maryland, College Park The University of Maryland, College Park (University of Maryland, UMD, or simply Maryland) is a public land-grant research university in College Park, Maryland. Founded in 1856, UMD is the flagship institution of the University System of Mary ...
* Th
Mimir
project at the University at Buffalo {{databases Database management systems Types of databases Database theory Fuzzy logic