Semantic file systems are file systems used for information persistence that structure data according to its semantics and intent, rather than its location as with current file systems. This allows data to be addressed by content (associative access). Traditional hierarchical file systems tend to impose a burden, for example when a sub-directory layout contradicts a user's perception of where files should be stored. A tag-based interface alleviates this hierarchy problem and lets users query for data in an intuitive fashion.
Semantic file systems raise technical design challenges: indexes of words, tags or other elementary signs have to be created, kept up to date and cached for performance in order to offer the desired random, multi-variate access to files, in addition to the underlying, mostly traditional block-based file system.
A semantic file system can be envisioned as part of a semantic desktop.
History
The notion of a semantic file system was proposed in 1991 by researchers at MIT and École des Mines de Paris.
They proposed an integrated system whose main query interface looked like a traditional file system interface: a virtual directory system interpreted a path as a conjunctive query. Their implementation automatically extracted the relevant metadata via what they called file-type-specific transducers.
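The idea of reading a path as a conjunctive query can be illustrated with a minimal sketch. The `attribute:value` path syntax, the `FILES` table and both function names below are assumptions made for illustration; they are not the syntax or API of the 1991 implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata index: file name -> attribute/value pairs.
FILES: Dict[str, Dict[str, str]] = {
    "report.txt": {"author": "smith", "type": "text"},
    "main.c": {"author": "smith", "type": "c"},
    "notes.txt": {"author": "jones", "type": "text"},
}


def parse_virtual_path(path: str) -> List[Tuple[str, str]]:
    """Split '/author:smith/type:text' into [(attribute, value), ...]."""
    terms = []
    for component in path.strip("/").split("/"):
        if not component:
            continue  # ignore empty components such as the root "/"
        attribute, sep, value = component.partition(":")
        if not sep:
            raise ValueError(f"expected attribute:value, got {component!r}")
        terms.append((attribute, value))
    return terms


def list_virtual_directory(path: str) -> List[str]:
    """Return the files that satisfy every term of the path (logical AND)."""
    terms = parse_virtual_path(path)
    return sorted(
        name
        for name, attrs in FILES.items()
        if all(attrs.get(attribute) == value for attribute, value in terms)
    )
```

Each additional path component narrows the result, in the same way that descending into a sub-directory narrows a listing in a hierarchical file system.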
Starting around 2004, a new wave of implementations centered on manual tagging of files and folders.
In 2008, researchers proposed to integrate semantic file systems with Semantic Web technologies.
Types of metadata
Tags
Tags can be used instead of folders to circumvent the limits of a hierarchical model.
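As a rough sketch of how tags can replace folders, the following in-memory index lets one file carry several tags and answers queries by set intersection. The `TagIndex` class and its method names are hypothetical, and a real system would also have to persist and cache this index.

```python
from collections import defaultdict
from typing import Dict, Set


class TagIndex:
    """Hypothetical in-memory index mapping tags to files and back."""

    def __init__(self) -> None:
        self._files_by_tag: Dict[str, Set[str]] = defaultdict(set)
        self._tags_by_file: Dict[str, Set[str]] = defaultdict(set)

    def tag(self, file_id: str, *tags: str) -> None:
        """Attach one or more tags to a file, updating both directions."""
        for t in tags:
            self._files_by_tag[t].add(file_id)
            self._tags_by_file[file_id].add(t)

    def untag(self, file_id: str, tag: str) -> None:
        """Detach a single tag from a file."""
        self._files_by_tag[tag].discard(file_id)
        self._tags_by_file[file_id].discard(tag)

    def remove_file(self, file_id: str) -> None:
        """Drop a deleted file from every tag it carried."""
        for t in self._tags_by_file.pop(file_id, set()):
            self._files_by_tag[t].discard(file_id)

    def find(self, *tags: str) -> Set[str]:
        """Return the files that carry all of the given tags."""
        if not tags:
            return set()
        candidates = [self._files_by_tag.get(t, set()) for t in tags]
        return set.intersection(*candidates)
```

Unlike a folder, a tag does not fix a single location: the same file is reachable through any combination of the tags it carries.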
File type-specific
Gifford et al. suggested the idea of file-type-specific metadata, automatically extracted by a file-type-specific transducer.
For instance, for a source code file, metadata could include the names of the procedures that the program exports or imports, procedure types, and the files included by the program. For a document, it could include the date, author, title and structure (sections and subsections). For an e-mail, it could include the sender, recipient and subject.
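The transducer idea can be sketched with two small extractors chosen by file extension. This is an illustration only: the function names, the `TRANSDUCERS` table, the choice of Python source files and the `.eml` extension are assumptions, and only standard-library parsers are used.

```python
import ast
import os
from email import message_from_string
from typing import Callable, Dict, List

Metadata = Dict[str, List[str]]


def source_code_transducer(text: str) -> Metadata:
    """Extract defined procedures and imported modules from Python source."""
    procedures, imports = [], []
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, ast.FunctionDef):
            procedures.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    return {"procedures": sorted(procedures), "imports": sorted(imports)}


def email_transducer(text: str) -> Metadata:
    """Extract sender, recipient and subject from an RFC 822 style message."""
    msg = message_from_string(text)
    fields = (("sender", "From"), ("recipient", "To"), ("subject", "Subject"))
    return {name: [msg[header]] for name, header in fields if msg[header] is not None}


# Hypothetical registry: file extension -> transducer.
TRANSDUCERS: Dict[str, Callable[[str], Metadata]] = {
    ".py": source_code_transducer,
    ".eml": email_transducer,
}


def extract_metadata(filename: str, text: str) -> Metadata:
    """Run the transducer registered for the file's extension, if any."""
    transducer = TRANSDUCERS.get(os.path.splitext(filename)[1])
    return transducer(text) if transducer else {}
```

Supporting a new file type then amounts to registering one more function, without touching the indexing or query code.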
Lineage
In scientific workflows, the provenance of a data file is important. A scientist might want to select a results file by filtering on the input dataset from which it was derived.
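Filtering by input dataset can be sketched as a traversal of provenance records. The `Provenance` record and the `derived_from` function are hypothetical names, and the sketch assumes that each output file records the inputs it was computed from.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Provenance:
    """Hypothetical lineage record: one output and the inputs it came from."""
    output: str
    inputs: FrozenSet[str]


def derived_from(records: List[Provenance], dataset: str) -> Set[str]:
    """Return every file derived, directly or transitively, from dataset."""
    # Build a map from each input to the outputs computed from it.
    children: Dict[str, Set[str]] = {}
    for record in records:
        for source in record.inputs:
            children.setdefault(source, set()).add(record.output)

    found: Set[str] = set()
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for child in children.get(current, ()):
            if child not in found:  # also guards against cyclic records
                found.add(child)
                frontier.append(child)
    return found
```

A query such as "all results computed from this raw dataset" then becomes a call to `derived_from` with that dataset's name.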
Architecture
Vasudevan and Pazandak introduce the distinction between integrated and augmented approaches:
* In integrated approaches, semantics are a feature of the file system.
** Tightly coupled systems are implemented within a file system.
** Loosely coupled systems are implemented on top of a classical file system, but hide its interface.
* In augmented approaches, semantics are an abstraction on top of a classical file system. Access to the classical file system interface is maintained, so the user can choose which interface to use.
They suggest that an open systems architecture is well adapted to semantic file system implementations.
Compatibility with hierarchical file systems
Even integrated semantic file systems may choose to expose an interface for compatibility with existing local or distributed file system protocols. For instance, Gifford et al.’s 1991 implementation was fully compatible with NFS.
Metadata storage
Extended file attributes provided by the file system can be a way to store the metadata.
A relational database is another very frequent way to store the metadata.
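A minimal sketch of relational metadata storage, using SQLite from the Python standard library, is given here. The two-table schema and the function names are assumptions for illustration, not the layout of any particular implementation.

```python
import sqlite3
from typing import Iterable, List


def open_metadata_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical file/tag schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE file (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE file_tag (
            file_id INTEGER NOT NULL REFERENCES file(id),
            tag TEXT NOT NULL,
            PRIMARY KEY (file_id, tag)
        );
        CREATE INDEX file_tag_by_tag ON file_tag(tag);
        """
    )
    return conn


def add_file(conn: sqlite3.Connection, path: str, tags: Iterable[str]) -> None:
    """Register a file and attach its tags."""
    cursor = conn.execute("INSERT INTO file (path) VALUES (?)", (path,))
    conn.executemany(
        "INSERT INTO file_tag (file_id, tag) VALUES (?, ?)",
        [(cursor.lastrowid, tag) for tag in tags],
    )


def files_with_all_tags(conn: sqlite3.Connection, tags: Iterable[str]) -> List[str]:
    """Return the paths of the files that carry every one of the given tags."""
    wanted = list(set(tags))
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"SELECT f.path FROM file f JOIN file_tag t ON t.file_id = f.id "
        f"WHERE t.tag IN ({placeholders}) "
        f"GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = ? ORDER BY f.path",
        (*wanted, len(wanted)),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the metadata in a database separates it from the files themselves, so it must be kept in sync when files are moved or deleted outside the semantic interface.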
See also
* Content-addressable storage
* Logic File System
* Semantic desktop
* Semantic Web
External links
Research & Specifications
* The Sile Model. A Semantic File System Infrastructure for the Desktop
* Semantic FS @ MIT Programming Systems Research Group
* Launchpad Blueprints: A tag-based filesystem for Ubuntu
* external list of related work on semantic file systems @ semanticweb.org
* "Designing better file organization around tags, not hierarchies", detailed writeup by Nayuki
* Non-Directory Filesystem