Block Suballocation
   HOME

TheInfoList



OR:

Block suballocation is a feature of some computer
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
s which allows large blocks or allocation units to be used while making efficient use of empty space at the end of large files, space which would otherwise be lost for other use to
internal fragmentation In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific ...
. In file systems that don't support fragments, this feature is also called tail merging or tail packing because it is commonly done by packing the "tail", or last partial block, of multiple files into a single block.


Rationale

File systems have traditionally divided the disk into equally sized blocks to simplify their design and limit the worst-case fragmentation. Block sizes are typically multiples of 512 bytes due to the size of hard
disk sector In computer disk storage, a sector is a subdivision of a track on a magnetic disk or optical disc. Each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs) and 2048 bytes for CD-ROMs and D ...
s. When files are allocated by some traditional file systems, only whole blocks can be allocated to individual files. But as file sizes are often not multiples of the file system block size, this design inherently results in the last blocks of files (called tails) occupying only a part of the block, resulting in what is called
internal fragmentation In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific ...
(not to be confused with external fragmentation). This waste of space can be significant if the file system stores many small files and can become critical when attempting to use higher block sizes to improve performance. FFS and other derived UNIX file systems support fragments which greatly mitigate this effect.


Suballocation schemes

Block suballocation addresses this problem by dividing up a tail block in some way to allow it to store fragments from other files. Some block suballocation schemes can perform allocation at the byte level; most, however, simply divide up the block into smaller ones (the divisor usually being some power of 2). For example, if a 38
KiB The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
file is to be stored in a
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
using 32 KiB blocks, the file would normally span two blocks, or 64 KiB, for storage; the remaining 26 KiB of the second block becomes unused slack space. With an 8 KiB block suballocation, however, the file would occupy just 6 KiB of the second block, leave 2 KiB (of the 8 KiB suballocation block) slack and free the other 24 KiB of the block for other files.


Tail packing

Some file systems have since been designed to take advantage of this unused space, and can pack the tails of several files in a single shared tail block. While this may, at first, seem like it would significantly increase file system fragmentation, the negative effect can be mitigated with
readahead Readahead is a system call of the Linux kernel that loads a file's contents into the page cache. This prefetches the file so that when it is subsequently accessed, its contents are read from the main memory (RAM) rather than from a hard disk drive ...
features on modern
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
s – when dealing with short files, several tails may be close enough to each another to be read together, and thus a
disk seek Higher performance in hard disk drives comes from devices which have better performance characteristics. These performance characteristics can be grouped into two categories: access time and data transfer time (or rate). Access time The ''access ...
is not introduced. Such file systems often employ
heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, ...
s in order to determine whether tail packing is worthwhile in a given situation, and
defragmentation In the maintenance of file systems, defragmentation is a process that reduces the degree of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contigu ...
software may use a more evolved heuristic.


Efficiency

In some scenarios where the majority of files are shorter than half the block size, such as in a folder of small
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
files or small bitmap images, tail packing can increase storage efficiency even more than twofold, compared to file systems without tail packing. This not only translates into conservation of disk space, but may also introduce performance increases, as due to higher
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
, less data has to be read, also translating into higher
page cache In computing, a page cache, sometimes also called disk cache, is a transparent cache for the pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page cache ...
efficiency. However, these advantages can be negated by the increased complexity of
implementation Implementation is the realization of an application, or execution of a plan, idea, model, design, specification, standard, algorithm, or policy. Industry-specific definitions Computer science In computer science, an implementation is a real ...
. , the most widely used read-write file systems with support for block suballocation are
Btrfs Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or simply by spelling it out) is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (not to be confused ...
and FreeBSD UFS2 (where it is called " block level fragmentation"). Once popular,
ReiserFS ReiserFS is a general-purpose, journaling file system initially designed and implemented by a team at Namesys led by Hans Reiser and licensed under GPLv2. Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file sys ...
and
Reiser4 Reiser4 is a computer file system, successor to the ReiserFS file system, developed from scratch by Namesys and sponsored by DARPA as well as Linspire. Reiser4 was named after its former lead developer Hans Reiser. , the Reiser4 patch set is ...
are no longer common. Several read-only file systems do not use blocks at all and are thus implicitly using space as efficiently as suballocating file systems; such file systems double as
archive formats In computing, an archive file is a computer file that is composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compre ...
.


See also

*
File system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
*
Internal fragmentation In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific ...
*
Locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
*
Comparison of file systems The following tables compare general and technical information for a number of file systems. General information Limits Metadata Features File capabilities Block capabilities Note that in addition to the below table, blo ...


References

* {{refend Computer file systems