File system fragmentation

In computing, file system fragmentation, sometimes called file system aging, is the tendency of a file system to lay out the contents of files non-contiguously to allow in-place modification of their contents. It is a special case of data fragmentation. File system fragmentation negatively impacts seek time on spinning storage media, which is known to hinder throughput. Fragmentation can be remedied by re-organizing files and free space back into contiguous areas, a process called defragmentation.

Solid-state drives (SSDs) do not physically seek, so their non-sequential data access is hundreds of times faster than that of moving drives, making fragmentation a non-issue for read performance. It is recommended not to defragment solid-state storage, because doing so can prematurely wear the drive through unnecessary write–erase operations.


Causes

When a file system is first initialized on a partition, it contains only a few small internal structures and is otherwise one contiguous block of empty space. This means that the file system is able to place newly created files anywhere on the partition. For some time after creation, files can be laid out near-optimally. When the operating system and applications are installed or archives are unpacked, separate files are written sequentially, so related files end up positioned close to each other.

As existing files are deleted or truncated, new regions of free space are created. When existing files are appended to, it is often impossible to resume the write exactly where the file used to end, as another file may already be allocated there; thus, a new fragment has to be allocated. As time goes on, and the same factors are continuously present, free space as well as frequently appended files tend to fragment more. Shorter regions of free space also mean that the file system is no longer able to allocate new files contiguously, and has to break them into fragments. This is especially true when the file system becomes full and large contiguous regions of free space are unavailable.


Example

The following example is a simplification of an otherwise complicated subject. Consider the following scenario: a new disk has had five files, named A, B, C, D and E, saved continuously and sequentially in that order. Each file is using 10 ''blocks'' of space. (Here, the block size is unimportant.) The remainder of the disk space is one contiguous region of free space, so additional files can be created and saved after the file E.

If the file B is deleted, a second region of ten blocks of free space is created, and the disk becomes fragmented. The empty space is simply left there, marked as free and available for later use, then used again as needed. The file system ''could'' defragment the disk immediately after a deletion, but doing so would incur a severe performance penalty at unpredictable times.

Now, a new file called F, which requires seven blocks of space, can be placed into the first seven blocks of the newly freed space formerly holding the file B, and the three blocks following it will remain available. If another new file called G, which needs only three blocks, is added, it could then occupy the space after F and before C. If subsequently F needs to be expanded, since the space immediately following it is occupied, there are three options for the file system:

1. Add a new block somewhere else and indicate that F has a second extent.
2. Move files in the way of the expansion elsewhere, to allow F to remain contiguous.
3. Move the file F so it can be one contiguous file of the new, larger size.

The second option is probably impractical for performance reasons, as is the third when the file is very large. The third option is impossible when there is no single contiguous free space large enough to hold the new file. Thus the usual practice is simply to create an ''extent'' somewhere else and chain the new extent onto the old one.

Material added to the end of the file F would be part of the same extent. But if there is so much material that no room is available after the last extent, then ''another'' extent would have to be created, and so on. Eventually the file system has free segments in many places, and some files may be spread over many extents. Access time for those files (or for all files) may become excessively long.
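To make the walk-through concrete, here is a minimal sketch in Python of a first-fit block allocator replaying the scenario above. The ''Disk'' class, the 64-block disk size and the helper names are hypothetical illustrations of the technique, not any real file system's on-disk format.

    # Hypothetical first-fit allocator replaying the A-E / F / G example.
    DISK_BLOCKS = 64  # assumed size; the example needs just over 50 blocks

    class Disk:
        def __init__(self, size):
            self.blocks = [None] * size  # None marks a free block
            self.extents = {}            # file name -> list of (start, length)

        def free_runs(self):
            # Yield (start, length) for every contiguous run of free blocks.
            start = None
            for i, owner in enumerate(self.blocks + ["end"]):
                if owner is None and start is None:
                    start = i
                elif owner is not None and start is not None:
                    yield (start, i - start)
                    start = None

        def allocate(self, name, count):
            # First fit: take blocks from each free run in turn, so a file
            # may end up split across several extents.
            needed = count
            for start, length in list(self.free_runs()):
                take = min(length, needed)
                for i in range(start, start + take):
                    self.blocks[i] = name
                self.extents.setdefault(name, []).append((start, take))
                needed -= take
                if needed == 0:
                    return
            raise RuntimeError("can't extend")  # DFS-style failure (see below)

        def delete(self, name):
            for start, length in self.extents.pop(name, []):
                for i in range(start, start + length):
                    self.blocks[i] = None

    disk = Disk(DISK_BLOCKS)
    for name in "ABCDE":
        disk.allocate(name, 10)  # five files stored contiguously
    disk.delete("B")             # a ten-block hole appears at blocks 10-19
    disk.allocate("F", 7)        # fills the first seven blocks of the hole
    disk.allocate("G", 3)        # fills the remaining three blocks
    disk.allocate("F", 5)        # append: no room after F, second extent
    print(disk.extents["F"])     # [(10, 7), (50, 5)] -- F is now fragmented

The final line shows option 1 from the list above in action: the appended data lands in a second extent chained onto the first.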


Necessity

Some early file systems were unable to fragment files. One such example was the Acorn DFS file system used on the BBC Micro. Due to its inability to fragment files, the error message ''Can't extend'' would at times appear, and the user would often be unable to save a file even if the disk had adequate space for it in total.

DFS used a very simple disk structure, and files on disk were located only by their length and starting sector. This meant that all files had to exist as a continuous block of sectors, and fragmentation was not possible. Using the example above, the attempt to expand the file F in the final step would have failed on such a system with the ''Can't extend'' error message. Regardless of how much free space might remain on the disk in total, it was not available to extend the data file.

Standards of error handling at the time were primitive, and in any case programs squeezed into the limited memory of the BBC Micro could rarely afford to waste space attempting to handle errors gracefully. Instead, the user would find themselves dumped back at the command prompt with the ''Can't extend'' message, and all the data which had yet to be appended to the file would be lost. Nor could the problem be solved by simply checking the free space on the disk beforehand: while free space might exist on the disk, the size of the largest contiguous block of free space was not immediately apparent without analyzing the numbers presented by the disk catalog, and so would escape the user's notice.

In addition, almost all DFS users had previously used cassette file storage, which does not suffer from this error. The upgrade to a floppy disk system was expensive, and it came as a shock that the upgrade might, without warning, cause data loss.


Types

File system fragmentation may occur on several levels:

* Fragmentation within individual files
* Free space fragmentation
* The decrease of locality of reference between separate, but related, files
* Fragmentation within the data structures or special files reserved for the file system itself


File fragmentation

Individual file fragmentation occurs when a single file has been broken into multiple pieces (called extents on extent-based file systems). While disk file systems attempt to keep individual files contiguous, this is not often possible without significant performance penalties. File system check and defragmentation tools typically only account for file fragmentation in their "fragmentation percentage" statistic.
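As a concrete illustration, the following hedged sketch counts a file's extents on Linux by calling the ''filefrag'' tool from e2fsprogs and parsing its summary line; the summary format assumed here ("<path>: N extents found") matches common filefrag versions but is not a stable API.

    import re
    import subprocess

    def extent_count(path):
        # filefrag prints a summary such as "/var/log/syslog: 5 extents found".
        out = subprocess.run(["filefrag", path], capture_output=True,
                             text=True, check=True).stdout
        match = re.search(r":\s*(\d+) extents? found", out)
        if match is None:
            raise ValueError("unexpected filefrag output: " + out)
        return int(match.group(1))

    # In the "fragmentation percentage" sense used by such tools, a file
    # with more than one extent counts as fragmented.
    # print(extent_count("/var/log/syslog"))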


Free space fragmentation

Free (unallocated) space fragmentation occurs when there are several unused areas of the file system where new files or metadata can be written to. Unwanted free space fragmentation is generally caused by deletion or truncation of files, but file systems may also intentionally insert fragments ("bubbles") of free space in order to facilitate extending nearby files (see ''Preventing fragmentation'' below).
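There is no single standard measure of free space fragmentation, but one simple illustrative metric compares the largest contiguous free extent with the total free space. The function and the sample extent lists below are made up for illustration.

    def free_space_fragmentation(free_extents):
        # free_extents: list of (start_block, length_in_blocks) free runs.
        # Returns 0.0 when all free space is one contiguous region and
        # approaches 1.0 as the free space splinters into small pieces.
        total = sum(length for _, length in free_extents)
        if total == 0:
            return 0.0
        largest = max(length for _, length in free_extents)
        return 1.0 - largest / total

    print(free_space_fragmentation([(100, 50)]))                  # 0.0
    print(free_space_fragmentation([(10, 5), (40, 5), (90, 5)]))  # ~0.67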


File scattering

File scattering, also called related-file fragmentation or application-level (file) fragmentation, refers to the lack of locality of reference (within the storing medium) between related files (see file sequence for more detail). Unlike the previous two types of fragmentation, file scattering is a much more vague concept, as it heavily depends on the access pattern of specific applications. This also makes objectively measuring or estimating it very difficult. However, arguably, it is the most critical type of fragmentation, as studies have found that the most frequently accessed files tend to be small compared to available disk throughput per second.

To avoid related-file fragmentation and improve locality of reference (in this case called ''file contiguity''), assumptions or active observations about the operation of applications have to be made. A very frequent assumption is that it is worthwhile to keep smaller files within a single directory together, and to lay them out in the natural file system order. While this is often a reasonable assumption, it does not always hold. For example, an application might read several different files, perhaps in different directories, in exactly the same order they were written. Thus, a file system that simply orders all writes successively might work faster for the given application.


Data structure fragmentation

The catalogs or indices used by a file system itself can also become fragmented over time, as the entries they contain are created, changed, or deleted. This is more of a concern when the volume contains a multitude of very small files than when a volume is filled with fewer larger files. Depending on the particular file system design, the files or regions containing that data may also become fragmented (as described above for 'regular' files), regardless of any fragmentation of the actual data records maintained within those files or regions. For some file systems (such as NTFS and HFS/HFS Plus), the collation/sorting/compaction needed to optimize this data cannot easily occur while the file system is in use.


Negative consequences

File system fragmentation is more problematic with consumer-grade hard disk drives, on which file systems are usually placed, because of the increasing disparity between their sequential access speed and their rotational latency (and, to a lesser extent, seek time). Thus, fragmentation is an important problem in file system research and design. The containment of fragmentation not only depends on the on-disk format of the file system, but also heavily on its implementation.

File system fragmentation has less performance impact upon solid-state drives, as there is no mechanical seek time involved. However, the file system needs to store additional metadata for each non-contiguous part of the file. Each piece of metadata itself occupies space and requires processing power and processor time. If the maximum fragmentation limit is reached, write requests fail.

In simple file system benchmarks, the fragmentation factor is often omitted, as realistic aging and fragmentation are difficult to model. Rather, for simplicity of comparison, file system benchmarks are often run on empty file systems. Thus, the results may vary heavily from real-life access patterns.


Mitigation

Several techniques have been developed to fight fragmentation. They can usually be classified into two categories: ''preemptive'' and ''retroactive''. Due to the difficulty of predicting access patterns, these techniques are most often heuristic in nature and may degrade performance under unexpected workloads.


Preventing fragmentation

Preemptive techniques attempt to keep fragmentation to a minimum at the time data is being written to the disk. The simplest is appending data to an existing fragment in place where possible, instead of allocating new blocks to a new fragment. Many of today's file systems attempt to preallocate longer chunks, or chunks from different free space fragments, called extents, to files that are actively appended to. This largely avoids file fragmentation when several files are concurrently being appended to, thus avoiding their becoming excessively intertwined.

If the final size of a file subject to modification is known, storage for the entire file may be preallocated. For example, the Microsoft Windows swap file (page file) can be resized dynamically under normal operation, and can therefore become highly fragmented. This can be prevented by specifying a page file with the same minimum and maximum sizes, effectively preallocating the entire file. BitTorrent and other peer-to-peer file sharing applications limit fragmentation by preallocating the full space needed for a file when initiating downloads.
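On POSIX systems, an application that knows a file's final size can ask for this kind of preallocation explicitly. The sketch below uses Python's os.posix_fallocate (available on most Unix systems since Python 3.3); the file name and size are arbitrary examples.

    import os

    EXPECTED_SIZE = 700 * 1024 * 1024  # final size known before the download

    # Reserve the whole range up front so later out-of-order writes can land
    # in blocks the file system was able to allocate (mostly) contiguously.
    fd = os.open("download.part", os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, EXPECTED_SIZE)
    finally:
        os.close(fd)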
A relatively recent technique is delayed allocation in XFS, HFS+ and ZFS; the same technique is also called allocate-on-flush in Reiser4 and ext4. When the file system is being written to, file system blocks are reserved, but the locations of specific files are not laid down yet. Later, when the file system is forced to flush changes as a result of memory pressure or a transaction commit, the allocator will have much better knowledge of the files' characteristics. Most file systems with this approach try to flush files in a single directory contiguously. Assuming that multiple reads from a single directory are common, locality of reference is improved. Reiser4 also orders the layout of files according to the directory hash table, so that when files are being accessed in the natural file system order (as dictated by readdir), they are always read sequentially.
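The following is a toy user-space sketch of the allocate-on-flush idea, not any real kernel's implementation: writes only grow an in-memory buffer, and block allocation happens once, at flush time, when the final size is known, so the allocator can request one contiguous extent instead of growing the file piecemeal. (Real implementations also account the reservation against free space immediately, so the volume cannot overcommit.)

    class DelayedFile:
        # allocator is anything with an allocate(name, block_count) method,
        # e.g. the hypothetical Disk class from the example above.
        def __init__(self, allocator):
            self.buffer = bytearray()
            self.allocator = allocator

        def write(self, data):
            self.buffer += data  # no blocks chosen yet; data stays in memory

        def flush(self, name, block_size=4096):
            # One allocation for the whole file, made with full knowledge
            # of its final size.
            blocks = -(-len(self.buffer) // block_size)  # ceiling division
            self.allocator.allocate(name, blocks)
            self.buffer.clear()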


Defragmentation

Retroactive techniques attempt to reduce fragmentation, or the negative effects of fragmentation, after it has occurred. Many file systems provide defragmentation tools, which attempt to reorder fragments of files, and sometimes also decrease their scattering (i.e. improve their contiguity, or locality of reference) by keeping either smaller files in directories, or directory trees, or even file sequences close to each other on the disk.
The HFS Plus file system transparently defragments files that are less than 20 MiB in size and are broken into 8 or more fragments, when the file is being opened. The now-obsolete Commodore Amiga Smart File System (SFS) defragmented itself while the file system was in use. The defragmentation process is almost completely stateless (apart from the location it is working on), so that it can be stopped and started instantly. During defragmentation, data integrity is ensured for both metadata and normal data.
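The HFS+ behaviour amounts to a small trigger condition evaluated when a file is opened. The following is a sketch of that rule with hypothetical helper names; the real check lives in the macOS kernel and tests further conditions, such as the volume being journaled.

    SMALL_FILE_LIMIT = 20 * 1024 * 1024  # 20 MiB
    FRAGMENT_THRESHOLD = 8

    def should_autodefrag(size_bytes, extent_count):
        # Relocate-on-open only for small, badly fragmented files, where a
        # rewrite is cheap and the win is largest.
        return size_bytes < SMALL_FILE_LIMIT and extent_count >= FRAGMENT_THRESHOLD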


See also

* List of defragmentation software
* FAT file fragmentation
* Disk compression

