
Semantic file systems are file systems used for information persistence that structure data according to its semantics and intent, rather than its location as in conventional file systems. This allows data to be addressed by content (associative access). Traditional hierarchical file systems tend to impose a burden, for example when a sub-directory layout contradicts a user's perception of where files should be stored. A tag-based interface alleviates this hierarchy problem and lets users query for data in an intuitive fashion. Semantic file systems raise technical design challenges: indexes of words, tags or other elementary signs have to be created, constantly updated, maintained and cached for performance in order to offer the desired random, multi-variate access to files, on top of the underlying, mostly traditional block-based file system. A semantic file system can be envisioned as part of a semantic desktop.


History

The notion of a semantic file system was proposed in 1991 by researchers at MIT and École des Mines de Paris. They proposed an integrated system whose main query interface looked like a traditional file system interface, via a virtual directory system that interpreted a path as a conjunctive query. Their implementation automatically extracted the relevant metadata via what they called file type-specific transducers. Starting around 2004, a new wave of implementations centered on manual tagging of files and folders. In 2008, researchers proposed integrating semantic file systems with Semantic Web technologies.


Types of metadata


Tags

Tags can be used instead of folders to circumvent the limits of a hierarchical model.
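
As a rough illustration of how tags can stand in for folders, the sketch below keeps an in-memory inverted index from tags to file names and treats each component of a virtual path as one term of a conjunctive (AND) query. The class name `TagIndex`, its methods and the sample file names are hypothetical and are not taken from any particular implementation.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory inverted index: tag -> set of file names (hypothetical design)."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, filename, *tags):
        """Attach one or more tags to a file."""
        for t in tags:
            self._files_by_tag[t].add(filename)

    def query(self, virtual_path):
        """Treat each path component as a tag and intersect the matching file sets."""
        tags = [part for part in virtual_path.split("/") if part]
        if not tags:
            return set()
        result = set(self._files_by_tag.get(tags[0], set()))
        for t in tags[1:]:
            result &= self._files_by_tag.get(t, set())
        return result


index = TagIndex()
index.tag("budget.ods", "finance", "2023")
index.tag("invoice.pdf", "finance", "2024")
index.tag("holiday.jpg", "photos", "2023")

# Component order does not matter, unlike in a directory hierarchy.
print(index.query("/finance/2023"))
print(index.query("/2023/finance"))
```

Because the query is a set intersection, `/finance/2023` and `/2023/finance` select the same files, which is the property a fixed folder hierarchy cannot offer.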


File type-specific

Gifford et al. suggested the idea of file type-specific metadata automatically extracted by a file type-specific transducer. For instance, for a source code text file, metadata could include the names of the procedures that the program exports or imports, procedure types, and the files included by the program. For a document, it could include its date, author, title and structure (sections and subsections); for an e-mail, its sender, recipient and subject.
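
A transducer of this kind can be sketched as a function that maps a file's content to attribute-value pairs, selected by file type. The example below is a minimal sketch for the e-mail case only, using Python's standard `email` module. The names `email_transducer`, `TRANSDUCERS` and `extract_metadata`, and the choice of dispatching on the `.eml` suffix, are assumptions made for illustration and do not reproduce the original 1991 implementation.

```python
from email import message_from_string


def email_transducer(raw_message):
    """Extract sender, recipient and subject from an RFC 822 style message."""
    msg = message_from_string(raw_message)
    return {
        "sender": msg.get("From"),
        "recipient": msg.get("To"),
        "subject": msg.get("Subject"),
    }


# Hypothetical registry: file suffix -> transducer function.
TRANSDUCERS = {".eml": email_transducer}


def extract_metadata(filename, content):
    """Run the transducer registered for the file's suffix, if any."""
    for suffix, transducer in TRANSDUCERS.items():
        if filename.endswith(suffix):
            return transducer(content)
    return {}  # unknown file type: no metadata extracted


sample = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Lunch\n"
    "\n"
    "See you at noon.\n"
)
metadata = extract_metadata("note.eml", sample)
print(metadata)
```

Transducers for source code or documents would plug into the same registry and return their own attributes, such as exported procedure names or section titles.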


Lineage

In scientific workflows, the provenance of a data file is important. A scientist might want to select a results file by filtering on the input dataset from which it was derived.
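
One way to picture lineage metadata is as a mapping from each derived file to the files it was directly computed from. The sketch below, with hypothetical file names and helper functions, selects every file whose direct or indirect inputs include a given dataset. Following indirect inputs is an assumption of this sketch; a real system might record only direct inputs.

```python
def ancestors(derived_from, filename):
    """Return every file that `filename` was directly or indirectly derived from."""
    seen = set()
    stack = list(derived_from.get(filename, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(derived_from.get(current, ()))
    return seen


def results_from(derived_from, dataset):
    """List derived files whose lineage includes `dataset`, sorted by name."""
    return sorted(f for f in derived_from if dataset in ancestors(derived_from, f))


# Hypothetical provenance records: derived file -> direct inputs.
derived_from = {
    "cleaned_a.csv": ["survey_a.csv"],
    "cleaned_b.csv": ["survey_b.csv"],
    "plot_a.png": ["cleaned_a.csv"],
    "summary.txt": ["cleaned_a.csv", "cleaned_b.csv"],
}

print(results_from(derived_from, "survey_a.csv"))
```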


Architecture

Vasudevan and Pazandak introduce the distinction between integrated and augmented approaches:
* In integrated approaches, semantics are a feature of the file system.
** Tightly coupled systems are implemented within a file system.
** Loosely coupled systems are implemented on top of a classical file system, but hide its interface.
* In augmented approaches, semantics are an abstraction on top of a classical file system. Access to the classical file system interface is maintained, so the user can choose which interface to use.
They suggest Open systems architecture as being well adapted to semantic file system implementations.


Compatibility with hierarchical file systems

Even integrated semantic file systems may choose to expose an interface for compatibility with existing local or distributed file system protocols. For instance, Gifford et al.’s 1991 implementation was fully compatible with NFS.


Metadata storage

Extended file attributes provided by the underlying file system are one way to store the metadata. A relational database is another common choice.
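
To make the relational option concrete, the following minimal sketch stores file tags in an in-memory SQLite database through Python's standard `sqlite3` module and answers a conjunctive tag query with `GROUP BY` and `HAVING`. The table name `file_tags`, its columns, the sample paths and the helper `files_with_all_tags` are assumptions chosen for illustration, not a schema used by any specific semantic file system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_tags ("
    " path TEXT NOT NULL,"
    " tag TEXT NOT NULL,"
    " PRIMARY KEY (path, tag))"
)
conn.executemany(
    "INSERT INTO file_tags (path, tag) VALUES (?, ?)",
    [
        ("/data/budget.ods", "finance"),
        ("/data/budget.ods", "2023"),
        ("/data/invoice.pdf", "finance"),
        ("/data/holiday.jpg", "2023"),
    ],
)


def files_with_all_tags(conn, tags):
    """Return paths that carry every tag in `tags` (a conjunctive query)."""
    tags = sorted(set(tags))  # duplicates would break the COUNT comparison
    if not tags:
        return []
    placeholders = ", ".join("?" for _ in tags)
    sql = (
        f"SELECT path FROM file_tags WHERE tag IN ({placeholders}) "
        "GROUP BY path HAVING COUNT(DISTINCT tag) = ? ORDER BY path"
    )
    rows = conn.execute(sql, [*tags, len(tags)]).fetchall()
    return [row[0] for row in rows]


print(files_with_all_tags(conn, ["finance", "2023"]))
```

Keeping the metadata in a separate database works on any file system, at the cost of having to keep the database consistent when files are moved or deleted; extended attributes travel with the file but depend on support from the file system and the tools that copy files.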


Research implementations


Implementations


See also

* Content-addressable storage
* Logic File System
* Semantic desktop
* Semantic Web


References


External links

Research & Specifications
The Sile Model. A Semantic File System Infrastructure for the Desktop

Semantic FS @ MIT Programming Systems Research Group

Launchpad Blueprints: A tag-based filesystem for Ubuntu



external list of related work on semantic file systems @ semanticweb.org

"Designing better file organization around tags, not hierarchies" detailed writeup by Nayuki

Non-Directory Filesystem