Schema-agnostic Databases

Schema-agnostic databases or vocabulary-independent databases aim to abstract users from the representation of the data, supporting the automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database of automatically mapping a query expressed in the user's terminology and structure to the dataset vocabulary. The increase in the size and semantic heterogeneity of database schemas brings new requirements for users querying and searching structured data. At this scale it can become infeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more pronounced as the scale and complexity of the data grow.
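The core operation, mapping a user term onto the closest term in the dataset vocabulary, can be illustrated with a minimal sketch. It assumes a purely lexical similarity measure (Python's difflib.SequenceMatcher) as a stand-in for semantic matching; the vocabulary, the normalize and best_match helpers, and the 0.8 threshold are all hypothetical choices for illustration. Real schema-agnostic systems rely on richer semantic or distributional measures.

```python
import difflib
import re
from typing import List, Optional


def normalize(label: str) -> str:
    """Split camelCase/underscored vocabulary labels into lowercase words."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label).replace("_", " ")
    return " ".join(spaced.lower().split())


def best_match(user_term: str, vocabulary: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the vocabulary term most similar to user_term, or None if below threshold."""
    query = normalize(user_term)
    scored = [
        (difflib.SequenceMatcher(None, query, normalize(term)).ratio(), term)
        for term in vocabulary
    ]
    score, term = max(scored)
    return term if score >= threshold else None


# Hypothetical dataset vocabulary, for illustration only.
VOCABULARY = ["birthPlace", "spouse", "numberOfPages", "author", "child"]

print(best_match("birth place", VOCABULARY))      # birthPlace
print(best_match("Number of pages", VOCABULARY))  # numberOfPages
print(best_match("daughter", VOCABULARY))         # None: no lexical route to "child"
```

The last call shows the limit of lexical matching: "daughter" and "child" are related in meaning but not in spelling, which is exactly the kind of semantic gap a schema-agnostic system has to bridge.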


Description

The evolution of data environments towards the consumption of data from multiple data sources, and the growth in the schema size, complexity, dynamicity and decentralisation (SCoDD) of schemas (A. Freitas, "Schema-agnostic queries over large-schema databases: a distributional semantics approach", PhD thesis, 2015), increase the complexity of contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications demand more complete data, produced by independent data sources under different semantic assumptions and contexts of use; this is the typical scenario for Semantic Web data applications. The evolution of databases towards heterogeneous data environments strongly impacts the usability, semiotics and semantic assumptions behind existing data access methods such as structured queries, keyword-based search and visual query systems. With schema-less databases containing potentially millions of dynamically changing attributes, it becomes infeasible for some users to become aware of the schema or vocabulary needed to query the database. At this scale, the effort of understanding the schema in order to build a structured query can become prohibitive.


Schema-agnostic queries

Schema-agnostic queries can be defined as query approaches over structured databases that allow users to satisfy complex information needs without understanding the representation (schema) of the database. Similarly, Tran et al. define them as "search approaches, which do not require users to know the schema underlying the data". Approaches such as keyword-based search over databases allow users to query databases without employing structured queries. However, as discussed by Tran et al.: "From these points, users however have to do further navigation and exploration to address complex information needs. Unlike keyword search used on the Web, which focuses on simple needs, the keyword search elaborated here is used to obtain more complex results. Instead of a single set of resources, the goal is to compute complex sets of resources and their relations."

The development of natural language interfaces (NLI) over databases has aimed towards the goal of schema-agnostic queries. Complementarily, some keyword-search approaches have targeted keyword queries that express more complex information needs. Other approaches have explored the construction of structured queries over databases in which schema constraints can be relaxed. All these approaches (natural language, keyword-based search and structured queries) have targeted different degrees of sophistication in supporting a flexible semantic matching between queries and data, ranging from the complete absence of semantic concerns to more principled semantic models. While schema-agnosticism has been an implicit requirement across semantic search and natural language query systems over structured data, it has not been sufficiently individuated as a concept or as a necessary requirement for contemporary database management systems. Recent works have started to define and model the semantic aspects involved in schema-agnostic queries (A. Freitas, J. C. Pereira Da Silva, E. Curry, "On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study", Natural Language Interfaces for the Web of Data (NLIWoD) Workshop, 13th International Semantic Web Conference (ISWC), Riva del Garda, 2014; S. Bischof, M. Kroetzsch, A. Polleres, S. Rudolph, "Schema-Agnostic Query Rewriting in SPARQL 1.1", in Proceedings of the 13th International Semantic Web Conference, Springer, 2014).


Schema-agnostic structured queries

These are schema-agnostic queries that follow the syntax of a structured query standard (for example SQL or SPARQL). The syntax and semantics of the operators are preserved, while the terminology used may differ from the dataset vocabulary.
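Because the operators and query structure are preserved, the rewriting step can be thought of as term substitution over an otherwise unchanged query. A minimal sketch, assuming the user query is already parsed into triple patterns and that a user-to-dataset lexicon is available; the lexicon entries (BillClinton, hasDaughter, marriedTo and their DBpedia-style targets) and the helper names map_term and rewrite_query are hypothetical, chosen for illustration. In a real system the lexicon would be produced by an automatic semantic matcher.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Hypothetical user-to-dataset lexicon, assumed for illustration only.
LEXICON: Dict[str, str] = {
    "BillClinton": ":Bill_Clinton",
    "hasDaughter": "dbo:child",
    "marriedTo": "dbo:spouse",
}


def map_term(term: str, lexicon: Dict[str, str]) -> str:
    """Keep variables unchanged; replace known user terms with dataset terms."""
    if term.startswith("?"):
        return term
    return lexicon.get(term, term)


def rewrite_query(
    select_vars: Sequence[str],
    patterns: List[Triple],
    lexicon: Dict[str, str],
    filters: Sequence[str] = (),
) -> str:
    """Rebuild a SELECT query with the same structure but the dataset vocabulary."""
    lines = []
    for s, p, o in patterns:
        lines.append(f"  {map_term(s, lexicon)} {map_term(p, lexicon)} {map_term(o, lexicon)} .")
    # FILTER expressions are operators, so they pass through untouched.
    lines.extend(f"  FILTER ({f})" for f in filters)
    return "SELECT " + " ".join(select_vars) + " {\n" + "\n".join(lines) + "\n}"


user_query = [("BillClinton", "hasDaughter", "?x"), ("?x", "marriedTo", "?y")]
print(rewrite_query(["?y"], user_query, LEXICON))
```

This prints a query whose body is `:Bill_Clinton dbo:child ?x .` followed by `?x dbo:spouse ?y .`. Variables, the SELECT clause and any FILTER survive unchanged; only vocabulary terms move. The substitution can also be approximate: mapping hasDaughter to a generic child relation drops the gender restriction, which a faithful mapping would need to recover with an extra pattern.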


Example 1

SELECT ?y

which maps to the following SPARQL query in the dataset vocabulary:

PREFIX :
PREFIX dbpedia2:
PREFIX dbpedia:
PREFIX skos:
PREFIX dbo:
SELECT ?y


Example 2

SELECT ?x

which maps to the following SPARQL query in the dataset vocabulary:

PREFIX rdf:
PREFIX :
PREFIX dbpedia2:
PREFIX dbpedia:
SELECT ?x


Schema-agnostic keyword queries

These are schema-agnostic queries expressed as keyword queries. In this case, the syntax and semantics of the operators differ from those of a structured query language.


Example

"Bill Clinton daughter married to"
"Books by William Goldman with more than 300 pages"

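Since keyword queries carry no explicit operators, a first interpretation step is recognising which phrases denote dataset terms and which express constraints. A minimal sketch, assuming a hypothetical phrase lexicon and greedy longest-match segmentation (one simple technique among many), plus a single regular expression for the comparative "more than N"; the lexicon entries and the helpers match_keywords and numeric_constraints are illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical phrase lexicon (assumed for illustration): keyword phrase -> dataset term.
LEXICON: Dict[str, str] = {
    "books": "dbo:Book",
    "william goldman": ":William_Goldman",
    "pages": "dbo:numberOfPages",
}


def match_keywords(query: str, lexicon: Dict[str, str], max_n: int = 3) -> List[Tuple[str, str]]:
    """Greedily segment the query, preferring the longest phrase found in the lexicon."""
    tokens = re.findall(r"\w+", query.lower())
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                matches.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon phrase starts at this token; skip it
    return matches


def numeric_constraints(query: str) -> List[Tuple[str, int]]:
    """Turn 'more than N' into a ('>', N) comparison (one hypothetical pattern)."""
    return [(">", int(n)) for n in re.findall(r"more than (\d+)", query.lower())]


q = "Books by William Goldman with more than 300 pages"
print(match_keywords(q, LEXICON))
# [('books', 'dbo:Book'), ('william goldman', ':William_Goldman'), ('pages', 'dbo:numberOfPages')]
print(numeric_constraints(q))  # [('>', 300)]
```

The sketch stops before the harder part: deciding how the recognised terms relate to each other (for instance, that the "> 300" constraint applies to the page count of the books) so that they can be assembled into a structured query.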

Semantic complexity

As of 2016, the concept of schema-agnostic queries has been developed primarily in academia. Most schema-agnostic query systems have been investigated in the context of natural language interfaces over databases or over the Semantic Web. These works explore the application of semantic parsing techniques over large, heterogeneous and schema-less databases. More recently, the concept of schema-agnostic query systems and databases has been individuated more explicitly in the literature. Freitas et al. (A. Freitas, J. E. Sales, S. Handschuh, E. Curry, "How hard is the Query? Measuring the Semantic Complexity of Schema-Agnostic Queries", in Proceedings of the 11th International Conference on Computational Semantics (IWCS), London, 2015) provide a probabilistic model of the semantic complexity of mapping schema-agnostic queries.

