HOME

TheInfoList



OR:

ext4 (fourth extended filesystem) is a
journaling file system A journaling file system is a file system that keeps track of changes not yet committed to the file system's main part by recording the goal of such changes in a data structure known as a " journal", which is usually a circular log. In the even ...
for
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whi ...
, developed as the successor to
ext3 ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on ext ...
. ext4 was initially a series of
backward-compatible Backward compatibility (sometimes known as backwards compatibility) is a property of an operating system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system, especially in ...
extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to extend storage limits and add other performance improvements. However, other
Linux kernel The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU ...
developers opposed accepting extensions to ext3 for stability reasons, and proposed to
fork In cutlery or kitchenware, a fork (from la, furca 'pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tine (structural), tines with which one ...
the source code of ext3, rename it as ext4, and perform all the development there, without affecting existing ext3 users. This proposal was accepted, and on 28 June 2006,
Theodore Ts'o Theodore (Ted) Yue Tak Ts'o (曹子德) (born 1968) is an American software engineer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems. He is the Secondary developer and maintainer of e2f ...
, the ext3 maintainer, announced the new plan of development for ext4. A preliminary development version of ext4 was included in version 2.6.19 of the Linux kernel. On 11 October 2008, the patches that mark ext4 as stable code were merged in the Linux 2.6.28 source code repositories, denoting the end of the development phase and recommending ext4 adoption. Kernel 2.6.28, containing the ext4 filesystem, was finally released on 25 December 2008. On 15 January 2010,
Google Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...
announced that it would upgrade its storage infrastructure from ext2 to ext4. On 14 December 2010, Google also announced it would use ext4, instead of YAFFS, on Android 2.3.


Adoption

ext4 is the default file system for many Linux distributions including
Debian Debian (), also known as Debian GNU/Linux, is a Linux distribution composed of free and open-source software, developed by the community-supported Debian Project, which was established by Ian Murdock on August 16, 1993. The first version of De ...
and
Ubuntu Ubuntu ( ) is a Linux distribution based on Debian and composed mostly of free and open-source software. Ubuntu is officially released in three editions: '' Desktop'', ''Server'', and ''Core'' for Internet of things devices and robots. All ...
.


Features

; Large file system : The ext4 filesystem can support volumes with sizes in theory up to 64 zebibyte (ZiB) and single files with sizes up to 16 tebibytes (TiB) with the standard 4
KiB The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
block size, and volumes with sizes up to 1 yobibyte (YiB) with 64
KiB The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
clusters, though a limitation in the extent format makes 1 exbibyte (ZiB) the practical limit. The maximum file, directory, and filesystem size limits grow at least proportionately with the filesystem block size up to the maximum 64 KiB block size available on ARM and
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple– IBM– ...
/
Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA ...
CPUs. ; Extents : Extents replace the traditional block mapping scheme used by ext2 and ext3. An extent is a range of contiguous physical blocks, improving large-file performance and reducing fragmentation. A single extent in ext4 can map up to 128 
MiB The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
of contiguous space with a 4 KiB block size. There can be four extents stored directly in the
inode The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attribut ...
. When there are more than four extents to a file, the rest of the extents are indexed in a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
. ; Backward compatibility : ext4 is
backward-compatible Backward compatibility (sometimes known as backwards compatibility) is a property of an operating system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system, especially in ...
with
ext3 ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on ext ...
and ext2, making it possible to
mount Mount is often used as part of the name of specific mountains, e.g. Mount Everest. Mount or Mounts may also refer to: Places * Mount, Cornwall, a village in Warleggan parish, England * Mount, Perranzabuloe, a hamlet in Perranzabuloe parish, ...
ext3 and ext2 as ext4. This will slightly improve performance, because certain new features of the ext4 implementation can also be used with ext3 and ext2, such as the new block allocation algorithm, without affecting the on-disk format. : ext3 is partially forward-compatible with ext4. Practically, ext4 will not mount as an ext3 filesystem out of the box, unless certain new features are disabled when creating it, such as ^extent, ^flex_bg, ^huge_file, ^uninit_bg, ^dir_nlink, and ^extra_isize. ; Persistent pre-allocation : ext4 can pre-allocate on-disk space for a file. To do this on most file systems, zeroes would be written to the file when created. In ext4 (and some other file systems such as
XFS XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as ...
) fallocate(), a new system call in the Linux kernel, can be used. The allocated space would be guaranteed and likely contiguous. This situation has applications for media streaming and databases. ; Delayed allocation : ext4 uses a performance technique called allocate-on-flush, also known as ''delayed allocation''. That is, ext4 delays block allocation until data is flushed to disk; in contrast, some file systems allocate blocks immediately, even when the data goes into a write cache. Delayed allocation improves performance and reduces fragmentation by effectively allocating larger amounts of data at a time. ; Unlimited number of subdirectories : ext4 does not limit the number of subdirectories in a single directory, except by the inherent size limit of the directory itself. (In ext3 a directory can have at most 32,000 subdirectories.) To allow for larger directories and continued performance, ext4 in Linux 2.6.23 and later turns on HTree indices (a specialized version of a
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for ...
) by default, which allows directories up to approximately 10–12 million entries to be stored in the 2-level HTree index and 2 GB directory size limit for 4 KiB block size, depending on the filename length. In Linux 4.12 and later the largedir feature enabled a 3-level HTree and directory sizes over 2 GB, allowing approximately 6 billion entries in a single directory. ; Journal checksums : ext4 uses
checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data ...
s in the journal to improve reliability, since the journal is one of the most used files of the disk. This feature has a side benefit: it can safely avoid a disk I/O wait during journaling, improving performance slightly. Journal checksumming was inspired by a research article from the
University of Wisconsin A university () is an institution of higher (or tertiary) education and research which awards academic degrees in several academic disciplines. Universities typically offer both undergraduate and postgraduate programs. In the United Stat ...
, titled ''IRON File Systems'' (specifically, section 6, called "transaction checksums"), with modifications within the implementation of compound transactions performed by the IRON file system (originally proposed by Sam Naghshineh in the RedHat summit). ; Metadata checksumming : Since Linux kernel 3.5 released in 2012 ; Faster file-system checking : In ext4 unallocated block groups and sections of the inode table are marked as such. This enables e2fsck to skip them entirely and greatly reduces the time it takes to check the file system. Linux 2.6.24 implements this feature. ; Multiblock allocator : When ext3 appends to a file, it calls the block allocator, once for each block. Consequently, if there are multiple concurrent writers, files can easily become fragmented on disk. However, ext4 uses delayed allocation, which allows it to buffer data and allocate groups of blocks. Consequently, the multiblock allocator can make better choices about allocating files contiguously on disk. The multiblock allocator can also be used when files are opened in O_DIRECT mode. This feature does not affect the disk format. ; Improved timestamps : As computers become faster in general, and as Linux becomes used more for
mission-critical A mission critical factor of a system is any factor (component, equipment, personnel, process, procedure, software, etc.) that is essential to business operation or to an organization. Failure or disruption of mission critical factors will resu ...
applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides
timestamp A timestamp is a sequence of characters or encoded information identifying when a certain event occurred, usually giving date and time of day, sometimes accurate to a small fraction of a second. Timestamps do not have to be based on some absolut ...
s measured in
nanoseconds A nanosecond (ns) is a unit of time in the International System of Units (SI) equal to one billionth of a second, that is, of a second, or 10 seconds. The term combines the SI prefix ''nano-'' indicating a 1 billionth submultiple of an SI unit ( ...
. In addition, 2 bits of the expanded timestamp field are added to the most significant bits of the seconds field of the timestamps to defer the year 2038 problem for an additional 408 years. : ext4 also adds support for time-of-creation timestamps. But, as
Theodore Ts'o Theodore (Ted) Yue Tak Ts'o (曹子德) (born 1968) is an American software engineer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems. He is the Secondary developer and maintainer of e2f ...
points out, while it is easy to add an extra creation-date field in the
inode The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attribut ...
(thus technically enabling support for these timestamps in ext4), it is more difficult to modify or add the necessary
system calls In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services (for example, a ...
, like stat() (which would probably require a new version) and the various
libraries A library is a collection of Document, materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or electronic media, digital access (soft copies) materials, and may be a ...
that depend on them (like
glibc The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages). It was started in the 1980s ...
). These changes will require coordination of many projects. Therefore, the creation date stored by ext4 is currently only available to user programs on Linux via the statx() API. ; Project quotas : Support for project quotas was added in Linux kernel 4.4 on 8 Jan 2016. This feature allows assigning disk quota limits to a particular project ID. The project ID of a file is a 32-bit number stored on each file and is inherited by all files and subdirectories created beneath a parent directory with an assigned project ID. This allows assigning quota limits to a particular subdirectory tree independent of file access permissions on the file, such as user and project quotas that are dependent on the UID and GID. While this is similar to a directory quota, the main difference is that the same project ID can be assigned to multiple top-level directories and is not strictly hierarchical. ; Transparent encryption : Support for transparent encryption was added in Linux kernel 4.1 on June 2015. ; Lazy initialization : The lazyinit feature allows cleaning of inode tables in background, speeding initialization when creating a new ext4 file system. It is available since 2010 in Linux kernel version 2.6.37. ; Write barriers : ext4 enables write barriers by default. It ensures that file system metadata is correctly written and ordered on disk, even when write caches lose power. This goes with a performance cost especially for applications that use fsync heavily or create and delete many small files. For disks with a battery-backed write cache, disabling barriers (option 'barrier=0') may safely improve performance.


Limitations

In 2008, the principal developer of the ext3 and ext4 file systems,
Theodore Ts'o Theodore (Ted) Yue Tak Ts'o (曹子德) (born 1968) is an American software engineer mainly known for his contributions to the Linux kernel, in particular his contributions to file systems. He is the Secondary developer and maintainer of e2f ...
, stated that although ext4 has improved features, it is not a major advance, it uses old technology, and is a stop-gap. Ts'o believes that
Btrfs Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or simply by spelling it out) is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (not to be confused ...
is the better direction because "it offers improvements in scalability, reliability, and ease of management". Btrfs also has "a number of the same design ideas that reiser3/ 4 had". However, ext4 has continued to gain new features such as file encryption and metadata checksums. The ext4 file system does not honor the "secure deletion" file attribute, which is supposed to cause overwriting of files upon deletion. A patch to implement secure deletion was proposed in 2011, but did not solve the problem of sensitive data ending up in the file-system journal.


Delayed allocation and potential data loss

Because delayed allocation changes the behavior that programmers have been relying on with ext3, the feature poses some additional risk of data loss in cases where the system crashes or loses power before all of the data has been written to disk. Due to this, ext4 in kernel versions 2.6.30 and later automatically handles these cases as ext3 does. The typical scenario in which this might occur is a program replacing the contents of a file without forcing a write to the disk with
fsync sync is a standard system call in the Unix operating system, which commits all data in the kernel filesystem to non-volatile storage buffers, i.e., data which has been scheduled for writing via low-level I/O system calls. Higher-level I/O lay ...
. There are two common ways of replacing the contents of a file on Unix systems: * fd=
open Open or OPEN may refer to: Music * Open (band), Australian pop/rock band * The Open (band), English indie rock band * Open (Blues Image album), ''Open'' (Blues Image album), 1969 * Open (Gotthard album), ''Open'' (Gotthard album), 1999 * Open (C ...
("file", O_TRUNC); write(fd, data); close(fd);
: In this case, an existing file is truncated at the time of open (due to O_TRUNC flag), then new data is written out. Since the write can take some time, there is an opportunity of losing contents even with ext3, but usually very small. However, because ext4 can delay writing file data for a long time, this opportunity is much greater. : There are several problems that can arise: :# If the write does not succeed (which may be due to error conditions in the writing program, or due to external conditions such as a full disk), then both the original version ''and'' the new version of the file will be lost, and the file may be corrupted because only a part of it has been written. :# If other processes access the file while it is being written, they see a corrupted version. :# If other processes have the file open and do not expect its contents to change, those processes may crash. One notable example is a shared library file which is mapped into running programs. :Because of these issues, often the following idiom is preferred over the one above: * fd=open("file.new"); write(fd, data); close(fd);
rename Rename may refer to: * Rename (computing), rename of a file on a computer * RENAME (command), command to rename a file in various operating systems * Rename (relational algebra), unary operation in relational algebra * Company renaming, rename ...
("file.new", "file");
: A new temporary file ("file.new") is created, which initially contains the new contents. Then the new file is renamed over the old one. Replacing files by the rename() call is guaranteed to be atomic by
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming in ...
standards – i.e. either the old file remains, or it's overwritten with the new one. Because the ext3 default "ordered" journaling mode guarantees file data is written out on disk before metadata, this technique guarantees that either the old or the new file contents will persist on disk. ext4's delayed allocation breaks this expectation, because the file write can be delayed for a long time, and the rename is usually carried out before new file ''contents'' reach the disk. Using fsync() more often to reduce the risk for ext4 could lead to performance penalties on ext3 filesystems mounted with the data=ordered flag (the default on most Linux distributions). Given that both file systems will be in use for some time, this complicates matters for end-user application developers. In response, ext4 in Linux kernels 2.6.30 and newer detect the occurrence of these common cases and force the files to be allocated immediately. For a small cost in performance, this provides semantics similar to ext3 ordered mode and increases the chance that either version of the file will survive the crash. This new behavior is enabled by default, but can be disabled with the "noauto_da_alloc" mount option. The new patches have become part of the mainline kernel 2.6.30, but various distributions chose to backport them to 2.6.28 or 2.6.29. These patches don't completely prevent potential data loss or help at all with new files. The only way to be safe is to write and use software that does fsync() when it needs to. Performance problems can be minimized by limiting crucial disk writes that need fsync() to occur less frequently.


Implementation

Linux kernel Virtual File System is a subsystem or layer inside of the Linux kernel. It is the result of the very serious attempt to integrate multiple file systems into an orderly single structure. The key idea, which dates back to the pioneering work done by Sun Microsystems employees in 1986, is to abstract out that part of the file system that is common to all file systems and put that code in a separate layer that calls the underlying concrete file systems to actually manage the data. All system calls related to files (or pseudo files) are directed to the Linux kernel Virtual File System for initial processing. These calls, coming from user processes, are the standard POSIX calls, such as
open Open or OPEN may refer to: Music * Open (band), Australian pop/rock band * The Open (band), English indie rock band * Open (Blues Image album), ''Open'' (Blues Image album), 1969 * Open (Gotthard album), ''Open'' (Gotthard album), 1999 * Open (C ...
, read, write, lseek, etc.


Compatibility with Windows and Macintosh

Currently, ext4 has full support on non-Linux operating systems. Windows can access ext4 since Windows 10 Insider Preview Build 20211. It is possible thanks to Windows Subsystem for Linux (WSL) which was introduced with Windows 10 Anniversary Update (version 1607) on 2 August 2016. WSL is available only in 64-bit versions of Windows 10 from version 1607. It is also available in Windows Server 2019. Big changes to the WSL architecture came with the release of WSL 2 on 12 June 2019. WSL 2 requires Windows 10 version 1903 or higher, with build 18362 or higher, for x64 systems, and version 2004 or higher, with build 19041 or higher, for ARM64 systems. Paragon offers its commercial product Linux File Systems for Windows which allows read/write capabilities for ext2/3/4 on Windows 7 SP1/8/8.1/10 and Windows Server 2008 R2 SP1/2012/2016. macOS has full ext2/3/4 read–write capability through the extFS for Mac by Paragon Software, which is a commercial product. Free software such as ext4fuse has read-only support with limited functionality.


See also

*
Btrfs Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or simply by spelling it out) is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (not to be confused ...
* Comparison of file systems * Extended file attributes *
e2fsprogs e2fsprogs (sometimes called the e2fs programs) is a set of utilities for maintaining the ext2, ext3 and ext4 file systems. Since those file systems are often the default for Linux distributions, it is commonly considered to be essential software. ...
* Ext2Fsd * JFS * List of file systems * Reiser4 *
XFS XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as ...
* ZFS


References


External links


ext4 documentation in Linux kernel source

Theodore Ts'o's discussion on ext4
29 June 2006
"ext4 online defragmentation"
(materials from Ottawa Linux Symposium 2007)
"The new ext4 filesystem: current status and future plans"
(materials from Ottawa Linux Symposium 2007)
Kernel Log: Ext4 completes development phase as interim step to btrfs
17 October 2008
"Ext4 block and inode allocator improvements"
(materials from Ottawa Linux Symposium 2008)
"Ext4: The Next Generation of Ext2/3 Filesystem"

Ext4 (and Ext2/Ext3) Wiki

Ext4
wiki at kernelnewbies.org * Native Windows port of Ext4 and other FS i
CROSSMETA

Ext2read
A windows application to read/copy ext2/ext3/ext4 files with extent and LVM2 support.
Ext2Fsd
Open source ext2/ext3/ext4 read/write file system driver for Windows. ext4 is supported from version 0.50 onwards
Ext4fuse
Open source read-only ext4 driver for
FUSE Fuse or FUSE may refer to: Devices * Fuse (electrical), a device used in electrical systems to protect against excessive current ** Fuse (automotive), a class of fuses for vehicles * Fuse (hydraulic), a device used in hydraulic systems to protect ...
. (Supports
Mac OS X macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and lap ...
10.5 and later, usin
MacFuse
{{Filesystem 2008 software Disk file systems File systems supported by the Linux kernel