
The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. According to Andreas Hein, the semantic gap can be defined as "the difference in meaning between constructs formed within different representation systems". In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation. More precisely, the gap means the difference between the ambiguous formulation of contextual knowledge in a powerful language (e.g. natural language) and its sound, reproducible and computational representation in a formal language (e.g. a programming language).

The semantics of an object depends on the context in which it is regarded. For practical application, this means that any formal representation of real-world tasks requires the translation of the contextual expert knowledge of an application (high-level) into the elementary and reproducible operations of a computing machine (low-level). Since natural language allows the expression of tasks that are impossible to compute in a formal language, there is no means to automate this translation in a general way. Moreover, the examination of languages within the Chomsky hierarchy indicates that there is no formal, and consequently no automated, way of translating from one language into another above a certain level of expressive power.


Theoretical background

The yet unproven but commonly accepted Church–Turing thesis states that a Turing machine and all equivalent formal systems, such as the lambda calculus, can perform and represent every formal operation that a human carrying out a computation can apply. However, the selection of adequate operations for a correct computation is itself not formally deducible; moreover, it depends on the computability of the underlying problem. Tasks such as the halting problem may be formulated comprehensively in natural language, yet no computational representation solves them in general: Alan Turing proved in 1936 that no program can decide, for every program and input, whether that program halts, and Rice's theorem extends this result to every non-trivial semantic property of programs. Gödel's incompleteness theorems, the general expression of the limitations of rule-based deduction, indicate that the semantic gap can never be fully closed. These are general statements about the limits of computation at the highest level of abstraction, where the semantic gap manifests itself. There are, however, many subsets of problems that may be translated automatically, especially at the higher-numbered (less expressive) levels of the Chomsky hierarchy, such as the regular and context-free languages.
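The undecidability of the halting problem rests on a diagonal argument, which can be sketched in a few lines. The sketch below assumes a hypothetical `halts` function that claims to predict whether a program halts; `make_contrarian` builds a program that does the opposite of whatever that prediction says, so every candidate decider is wrong on at least one program. The candidate deciders are illustrative assumptions, and the sketch also assumes they are deterministic (they give the same answer when asked twice).

```python
from typing import Callable

Program = Callable[[], None]
# Hypothetical decider: claims to answer "does this program halt?"
Decider = Callable[[Program], bool]


def make_contrarian(halts: Decider) -> Program:
    """Diagonal construction: a program that does the opposite of the prediction."""
    def contrarian() -> None:
        if halts(contrarian):
            while True:  # predicted to halt, so loop forever
                pass
        # predicted to loop forever, so return (halt) immediately
    return contrarian


def refute(halts: Decider) -> tuple[bool, bool]:
    """Return (predicted, actual) halting behaviour of the contrarian program."""
    contrarian = make_contrarian(halts)
    predicted = halts(contrarian)
    if not predicted:
        # Safe to run: in this branch the program returns at once
        # (assumes the decider answers the same way on the second call).
        contrarian()
        actual = True
    else:
        # By construction it would enter `while True`; we do not execute it.
        actual = False
    return predicted, actual


if __name__ == "__main__":
    candidates = {
        "always halts": lambda p: True,
        "never halts": lambda p: False,
        "name-length heuristic": lambda p: len(p.__name__) % 2 == 0,
    }
    for name, decider in candidates.items():
        predicted, actual = refute(decider)
        print(f"{name}: predicted={predicted}, actual={actual}")
```

Whatever rule a candidate uses, the prediction and the actual behaviour disagree, which is the core of Turing's proof.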


Formal languages

Real-world tasks are formalized by programming languages, which are executed on computers based on the von Neumann architecture. Since programming languages are only convenient representations of the Turing machine, any program on a von Neumann computer has (idealizing away its finite memory) the same properties and limitations as the Turing machine or an equivalent representation. Consequently, every Turing-complete programming language, whether CPU-level machine code, assembly language, or a high-level programming language, has the same expressive power as the underlying Turing machine. There is no semantic gap between them, since a program is translated from the high-level language to machine code by another program, e.g. a compiler, which itself runs on a Turing machine without any user interaction. The semantic gap actually opens between the selection of the rules and the representation of the task.
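The claim that translation between programming languages needs no human judgment can be illustrated with a toy compiler. The sketch below translates arithmetic expressions into a hypothetical stack-machine instruction set (`PUSH`, `ADD`, `SUB`, `MUL`, all invented for illustration) using only a mechanical post-order traversal of Python's own `ast` parse tree, then runs the result on a minimal interpreter. It is an illustration under those assumptions, not a model of any real compiler.

```python
import ast
import operator

# Hypothetical instruction set: ("PUSH", n) plus three binary operators.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
SEMANTICS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def compile_expr(source: str) -> list[tuple]:
    """Translate an integer arithmetic expression into stack-machine code."""
    def emit(node: ast.AST) -> list[tuple]:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Post-order: code for both operands first, then the operator.
            return emit(node.left) + emit(node.right) + [(OPCODES[type(node.op)],)]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return emit(ast.parse(source, mode="eval").body)


def run(code: list[tuple]) -> int:
    """Execute stack-machine code and return the value left on the stack."""
    stack: list[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(SEMANTICS[instr[0]](left, right))
    return stack.pop()
```

For example, `run(compile_expr("(2 + 3) * 4 - 5"))` evaluates to 15, the same value Python computes for the source expression. Every step is fixed by rules, so no semantic gap arises; the gap lies in deciding which expression represents the user's task in the first place.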


Practical consequences

Selecting rules for the formal representation of a real-world application corresponds to writing a program. Writing programs is independent of the actual programming language and basically requires translating the domain-specific knowledge of the user into the formal rules operating a Turing machine. It is this transfer from contextual knowledge into formal representation that cannot be automated, given the theoretical limitations of computation. Consequently, any mapping from real-world applications into computer applications requires a certain amount of technical background knowledge from the user, and this is where the semantic gap manifests itself. It is a fundamental task of software engineering to close the gap between application-specific knowledge and technically feasible formalization. For this purpose, domain-specific (high-level) knowledge must be transferred into an algorithm and its parameters (low-level). This requires a dialogue between user and developer. The aim is always software that allows users to represent their knowledge as parameters of an algorithm without knowing the details of the implementation, and to interpret the outcome of the algorithm without the aid of the developer. For this purpose, user interfaces play the key role in software design, while developers are supported by frameworks that help organize the integration of contextual information.


Examples


Document retrieval

A simple example can be formulated as a series of increasingly difficult natural language queries to locate a target document that may or may not exist locally on a known computer system. Example queries:

* 1) Locate any file in the known directory "/usr/local/funny".
* 2) Locate any file where the word "funny" appears in the filename.
* 3) Locate any text file where the word "funny" or the substring "humor" appears in the text.
* 4) Locate any MP3 file where either "funny", "comic" or "humor" appears in the metadata.
* 5) Locate any file of any type related to humor.
* 6) Locate any image that is likely to make my grandmother laugh.

The progressive difficulty of these queries is represented by the increasing degree of abstraction from the types and semantics defined by the system architecture (directories and files on a known computer) to the types and semantics that occupy the realm of ordinary human discourse (subjects such as "humor" and entities such as "my grandmother"). Moreover, this disparity of realms is further complicated by leaky abstractions, as is common in the case of query 4), where the target document may exist but may not encapsulate the metadata in a manner expected by the user or by the designer of the query processing system.
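Queries 1) to 4) map directly onto file system and file content operations, which can be sketched as below; queries 5) and 6) cannot, because "related to humor" and "likely to make my grandmother laugh" have no formal definition to compute. The function names are hypothetical, the `root` parameter stands in for the hard-coded directory, "text file" is assumed to mean valid UTF-8, and "metadata" is assumed to mean only the fixed 128-byte ID3v1 tag at the end of an MP3 file. That last assumption is itself a leaky abstraction: files whose tags use another format (such as ID3v2) are silently missed.

```python
import re
from pathlib import Path


def query1(root: Path) -> list[Path]:
    """1) Any file in the known directory (`root` stands in for /usr/local/funny)."""
    return [p for p in root.iterdir() if p.is_file()]


def query2(root: Path) -> list[Path]:
    """2) Any file whose name contains "funny"."""
    return [p for p in root.rglob("*") if p.is_file() and "funny" in p.name]


def query3(root: Path) -> list[Path]:
    """3) Any text file containing the word "funny" or the substring "humor"."""
    hits = []
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        try:
            text = p.read_text(encoding="utf-8")  # assumption: text file = valid UTF-8
        except (UnicodeDecodeError, OSError):
            continue
        if re.search(r"\bfunny\b", text) or "humor" in text:
            hits.append(p)
    return hits


def id3v1_fields(p: Path) -> list[str]:
    """Title, artist, album and comment from an ID3v1 tag; [] if there is none."""
    data = p.read_bytes()
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return []
    tag = data[-128:]
    spans = [(3, 33), (33, 63), (63, 93), (97, 127)]  # fixed ID3v1 byte offsets
    # Cut each field at its first NUL byte, then trim trailing spaces.
    return [tag[a:b].split(b"\x00", 1)[0].rstrip(b" ").decode("latin-1")
            for a, b in spans]


def query4(root: Path) -> list[Path]:
    """4) Any MP3 whose (ID3v1) metadata mentions funny, comic or humor."""
    words = ("funny", "comic", "humor")
    return [p for p in root.rglob("*.mp3")
            if any(w in field.lower() for field in id3v1_fields(p) for w in words)]

# Queries 5) and 6) have no counterpart here: there is no computable
# predicate for "related to humor" without first formalizing the concept.
```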


Image analysis

Image analysis is a typical domain in which a high degree of abstraction from low-level methods is required, and where the semantic gap immediately affects the user. If image content is to be identified in order to understand the meaning of an image, the only available independent information is the low-level pixel data. Textual annotations always depend on the knowledge, capability of expression and specific language of the annotator, and are therefore unreliable. To recognize the displayed scenes from the raw data of an image, the algorithms for selecting and manipulating pixels must be combined and parameterized in an adequate manner and finally linked with the natural description. Even a simple linguistic description of shape or color, such as "round" or "yellow", requires entirely different mathematical formalization methods, which are neither intuitive, nor unique, nor sound.
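The non-uniqueness of formalizing even "yellow" can be shown with two plausible pixel classifiers. Both functions below, and all of their thresholds, are hypothetical choices made for illustration: one tests a hue range in HSV space (using the standard-library `colorsys` module), the other tests raw RGB channel levels. Each looks reasonable, yet they disagree on some pixels.

```python
import colorsys


def is_yellow_hsv(rgb: tuple[int, int, int],
                  hue_range: tuple[float, float] = (40.0, 70.0),
                  min_sat: float = 0.4, min_val: float = 0.4) -> bool:
    """Formalization A (hypothetical thresholds): hue in a 'yellow' band."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360
    return hue_range[0] <= hue_deg <= hue_range[1] and s >= min_sat and v >= min_val


def is_yellow_rgb(rgb: tuple[int, int, int], level: int = 150, margin: int = 60) -> bool:
    """Formalization B (hypothetical thresholds): red and green high, blue clearly lower."""
    r, g, b = rgb
    return r >= level and g >= level and min(r, g) - b >= margin


for pixel in [(255, 255, 0), (255, 165, 0), (128, 128, 0), (255, 255, 200)]:
    print(pixel, is_yellow_hsv(pixel), is_yellow_rgb(pixel))
```

Both accept pure yellow (255, 255, 0) and reject pale yellow (255, 255, 200), but they split on orange (255, 165, 0), whose hue of about 39 degrees falls just outside band A, and on olive (128, 128, 0), which is too dark for B. Neither answer is wrong in a formal sense; the word "yellow" simply does not fix the thresholds.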


Layered systems

In many layered systems, conflicts arise when concepts at a high level of abstraction need to be translated into lower, more concrete artifacts. This mismatch is often called the semantic gap.


Databases

Advocates of OODBMSs (object-oriented database management systems) sometimes claim that these databases help to reduce the semantic gap between the application domain (the miniworld) and the stored data, compared with traditional RDBMSs. However, relational proponents posit the exact opposite, because by definition object databases fix the data being recorded into a single binding abstraction.
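Both positions can be seen in a small object-to-relational mapping. The sketch below, using Python's standard `sqlite3` module, assumes a hypothetical `Order` object that owns a list of items: storing it relationally means splitting it across two tables and reassembling it with queries, which is the mismatch OODBMS advocates point to. The same decomposition, however, lets the items be queried independently of any single object view, which is the relational side's point. Table, class and function names are illustrative only.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical application-domain object: an order owning a list of items."""
    order_id: int
    customer: str
    items: list[str] = field(default_factory=list)


def make_schema() -> sqlite3.Connection:
    # The nested list cannot live in one relational row, so it gets its own table.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL
        );
        CREATE TABLE order_items (
            order_id INTEGER NOT NULL REFERENCES orders(order_id),
            position INTEGER NOT NULL,
            item     TEXT NOT NULL,
            PRIMARY KEY (order_id, position)
        );
    """)
    return conn


def save(conn: sqlite3.Connection, order: Order) -> None:
    """Decompose the object into rows of both tables."""
    conn.execute("INSERT INTO orders (order_id, customer) VALUES (?, ?)",
                 (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO order_items (order_id, position, item) VALUES (?, ?, ?)",
        [(order.order_id, pos, item) for pos, item in enumerate(order.items)])


def load(conn: sqlite3.Connection, order_id: int) -> Order:
    """Reassemble the object from its rows (the step an object database hides)."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)).fetchone()
    items = [item for (item,) in conn.execute(
        "SELECT item FROM order_items WHERE order_id = ? ORDER BY position",
        (order_id,))]
    return Order(order_id, customer, items)


def orders_containing(conn: sqlite3.Connection, item: str) -> list[int]:
    """A cross-cutting query that no single Order object was designed for."""
    return [oid for (oid,) in conn.execute(
        "SELECT DISTINCT order_id FROM order_items WHERE item = ? ORDER BY order_id",
        (item,))]
```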


See also

* Leaky abstraction
* Text simplification
* Semantic differential

