HOME

TheInfoList



OR:

An HTree is a specialized
tree data structure In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be conn ...
for directory indexing, similar to a
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for n ...
. They are constant depth of either one or two levels, have a high fanout factor, use a hash of the
filename A filename or file name is a name used to uniquely identify a computer file in a directory structure. Different file systems impose different restrictions on filename lengths. A filename may (depending on the file system) include: * name &ndas ...
, and do not require balancing. The HTree algorithm is distinguished from standard B-tree methods by its treatment of
hash collision In computer science, a hash collision or hash clash is when two pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits. Al ...
s, which may overflow across multiple leaf and index blocks. HTree
index Index (or its plural form indices) may refer to: Arts, entertainment, and media Fictional entities * Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index'' * The Index, an item on a Halo megastru ...
es are used in the
ext3 ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extend ...
and
ext4 ext4 (fourth extended filesystem) is a journaling file system for Linux, developed as the successor to ext3. ext4 was initially a series of backward-compatible extensions to ext3, many of them originally developed by Cluster File Systems for ...
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
filesystem In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
s, and were incorporated into the
Linux kernel The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU ope ...
around 2.5.40. HTree indexing improved the scalability of
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
ext2 based filesystems from a practical limit of a few thousand files, into the range of tens of millions of files per directory.


History

The HTree index data structure and algorithm were developed by Daniel Phillips in 2000 and implemented for the ext2 filesystem in February 2001. A port to the ext3 filesystem by Christopher Li and Andrew Morton in 2002 during the 2.5
kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...
series added
journal A journal, from the Old French ''journal'' (meaning "daily"), may refer to: *Bullet journal, a method of personal organization *Diary, a record of what happened over the course of a day or other period *Daybook, also known as a general journal, a ...
based crash consistency. With minor improvements, HTree continues to be used in ext4 in the Linux 3.x.x kernel series.


Use

*
ext2 The ext2 or second extended file system is a file system for the Linux kernel. It was initially designed by French software developer Rémy Card as a replacement for the extended file system (ext). Having been designed according to the same pr ...
HTree indexes were originally developed for ext2 but the patch never made it to the official branch. The dir_index feature can be enabled when creating an ext2 filesystem, but the ext2 code won't act on it. *
ext3 ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extend ...
HTree indexes are available in ext3 when the dir_index feature is enabled. *
ext4 ext4 (fourth extended filesystem) is a journaling file system for Linux, developed as the successor to ext3. ext4 was initially a series of backward-compatible extensions to ext3, many of them originally developed by Cluster File Systems for ...
HTree indexes are turned on by default in ext4. This feature is implemented in Linux kernel 2.6.23. HTree indexes is also used for file
extents In computing, an extent is a contiguous area of storage reserved for a file in a file system, represented as a range of block numbers, or tracks on count key data devices. A file can consist of zero or more extents; one file fragment requires on ...
when a file needs more than the 4 extents stored in the
inode The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attribute ...
. The large_dir feature of ext4 is implemented in Linux kernel 4.13.


PHTree

PHTree (Physically stable HTree) is a derivation intended as a successor. It fixes all the known issues with HTree except for write multiplication. It is used in the
Tux3 Tux3 is an open-source versioning filesystem created by Daniel Phillips. He introduced the filesystem as a public replacement for his Tux2 filesystem which had encountered licensing issues due to the filing of several patents. Phillips had previo ...
filesystem.


References


External links


A Directory Index for Ext2 (which describes the HTree data structure)HTreeHPDD Wiki - Parallel Directory High Level Design
{{CS-Trees Disk file systems B-tree Linux