Large-file support (LFS) is the term frequently applied to the ability to create files larger than either 2 or 4 GiB on 32-bit filesystems.
Details
Traditionally, many operating systems and their underlying file system implementations used 32-bit integers to represent file sizes and positions. Consequently, no file could be larger than 2^32 − 1 bytes (4 GiB − 1). In many implementations, the problem was exacerbated by treating the sizes as signed numbers, which further lowered the limit to 2^31 − 1 bytes (2 GiB − 1). Files that were too large for 32-bit operating systems to handle came to be known as "large files".
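The two limits follow directly from the width and signedness of the integer type. A minimal C sketch of the arithmetic, in which the helper names max_unsigned and max_signed are illustrative and not part of any standard API:

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned integer that is `bits` bits wide (1..64). */
uint64_t max_unsigned(unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* Largest value of a signed (two's complement) integer that is `bits`
   bits wide (2..64): one bit is spent on the sign. */
uint64_t max_signed(unsigned bits) {
    assert(bits >= 2 && bits <= 64);
    return max_unsigned(bits - 1);
}
```

Here max_unsigned(32) is 4294967295 (4 GiB − 1) and max_signed(32) is 2147483647 (2 GiB − 1), the two file size limits described above.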
While the limit was quite acceptable at a time when hard disks were smaller, the general increase in storage capacity combined with increased server and desktop file usage, especially for database and multimedia files, led to intense pressure for OS vendors to overcome the limitation.
In 1996, multiple vendors responded by forming an industry initiative known as the Large File Summit (an obvious backronym of "LFS") to support large files on POSIX; at the time, Windows NT already supported large files on NTFS. The summit was tasked with defining a standardized way to switch to 64-bit numbers to represent file sizes.
This switch caused deployment issues and required design modifications, the consequences of which can still be seen:
* The change to 64-bit file sizes frequently required incompatible changes to file system layout, which meant that large-file support sometimes necessitated a file system change. For example, the FAT32 file system does not support files larger than 4 GiB − 1 (with older applications, only 2 GiB − 1); the variant FAT32+ does support larger files (up to 256 GiB − 1), but so far is only supported in some versions of DR-DOS, so users of Microsoft Windows have to use NTFS or exFAT instead.
* To support binary compatibility with old applications, operating system interfaces had to retain their use of 32-bit file sizes, and new interfaces had to be designed specifically for large-file support.
* To support writing portable code that makes use of LFS where possible, C standard library authors devised mechanisms that, depending on preprocessor constants, transparently redefined the functions to the 64-bit large-file aware ones.
* Many old interfaces, especially C-based ones, explicitly specified argument types in a way that did not allow a straightforward or transparent transition to 64-bit types. For example, the C functions fseek and ftell operate on file positions of type long int, which is typically 32 bits wide on 32-bit platforms and cannot be made larger without sacrificing backward compatibility. (This was resolved by introducing the new functions fseeko and ftello in POSIX. On Windows machines, under Visual C++, the functions _fseeki64 and _ftelli64 are used.)
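As an illustration of the last two points, the following sketch assumes a POSIX system whose C library honors the _FILE_OFFSET_BITS feature-test macro (glibc is one such library); the helper name file_size_bytes is hypothetical. Defining the macro as 64 before including any system header makes off_t a 64-bit type, so that fseeko and ftello can address positions beyond 2 GiB − 1 even in a 32-bit program:

```c
/* Both macros must be defined before any system header is included.
   _FILE_OFFSET_BITS selects a 64-bit off_t where the platform supports it;
   _POSIX_C_SOURCE makes fseeko/ftello visible under strict -std= modes. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Returns the size of the file at `path` in bytes, or -1 on error.
   (file_size_bytes is an illustrative helper, not a library function.) */
off_t file_size_bytes(const char *path) {
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);      /* returns off_t rather than long */

    fclose(fp);
    return size;
}
```

The same source compiled without the _FILE_OFFSET_BITS definition on a 32-bit platform would typically get a 32-bit off_t, and ftello would fail for files beyond the 2 GiB − 1 limit.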
Adoption
Use of the large-file API in 32-bit programs remained incomplete for a long time. An analysis in 2002 showed that many base libraries of operating systems were still shipped without large-file support, thereby limiting the applications that used them. The much-used zlib library did not support 64-bit large files on 32-bit platforms until 2006.
The problem disappeared slowly as PCs and workstations moved completely to 64-bit computing. Microsoft Windows Server 2008 was the last server version to be shipped in 32-bit. Red Hat Enterprise Linux 7 was published in 2014 only as a 64-bit operating system. Ubuntu Linux stopped delivering a 32-bit variant in 2019. Nvidia stopped developing 32-bit drivers in 2018 and stopped delivering updates after January 2019. Apple stopped developing 32-bit macOS versions in 2018, delivering macOS Mojave only as a 64-bit operating system. The end of life for Windows 10 on the desktop has been set to 2025, which is related to the last upgrades from old systems such as Windows 7 and Windows 8 in January 2020, as some of those systems ran on old computers built on the i386 architecture. Windows 11, however, has shipped only as a 64-bit operating system since its first version in 2021.
A similar development can be seen in the mobile area. Google required applications in its app store to support 64-bit versions by August 2019, which allows 32-bit support for Android to be discontinued later. The shift towards 64-bit started in 2014, when all new processors were designed for a 64-bit architecture and Android 5 ("Lollipop") was published, providing a fitting 64-bit variant of the operating system. Apple had made the shift the year before, starting to produce the 64-bit Apple A7 in 2013. Google started to deliver the development environment for Linux only in 64-bit in 2015. In May 2019, the share of Android versions below 5 had fallen to ten percent. As app developers concentrate on a single compilation variant, many manufacturers, for example Niantic, started to require Android 5 as the minimum version by mid-2019. Subsequently, the 32-bit versions were hard to get.
Except for embedded systems with their special programs, program code no longer needs to account for varying large-file support after 2020.
Related problems
The year 2038 problem is well known as another case where a 32-bit "long" on 32-bit platforms leads to problems. Just like the large-file limitation, it will become obsolete when systems move to 64-bit only. In the meantime, a 64-bit timestamp was introduced. In the Win32 API it is visible in functions that have a "64" suffix alongside the earlier "32" suffix. When large-file support was added to the Win32 API, it led to functions with an additional "i64" suffix, which sometimes makes for four combinations (findfirst32, findfirst64, findfirst32i64, findfirst64i32). By comparison, the UNIX98 API introduces functions with a "64" suffix when "_LARGEFILE64_SOURCE" is used.
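The 2038 limit can be reproduced with a few lines of C. The sketch below assumes a host whose time_t can hold the value 2^31 − 1, which is the case on common platforms; last_32bit_second_utc is an illustrative name, not a library function:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Broken-down UTC time of the last second that fits into a signed
   32-bit count of seconds since 1970-01-01 00:00:00 UTC. */
struct tm last_32bit_second_utc(void) {
    time_t t = (time_t)INT32_MAX;   /* 2^31 - 1 seconds after the epoch */
    struct tm *utc = gmtime(&t);    /* gmtime uses static storage; fine for a sketch */
    assert(utc != NULL);
    return *utc;                    /* copy the result out of the static buffer */
}
```

The returned structure describes 19 January 2038, 03:14:07 UTC; one second later, a signed 32-bit timestamp overflows.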
Related to the large-file API, there is a limitation on block numbers for mass storage media. With a common size of 512 bytes per data block, the barrier resulting from 32-bit numbers was reached later. When hard disk drives reached a size of 2 terabytes (around 2010), the master boot record had to be replaced by the GUID Partition Table, which uses 64-bit LBA (logical block address) numbers. On Unix-like operating systems it also required enlarging the inode numbers, which are used in some functions (stat64, setrlimit64). The Linux kernel introduced this in 2001, leading to version 2.4, which was picked up by glibc in that year. As large-file support and large-disk support were introduced at the same time, the GNU C Library exports 64-bit inode structures on 32-bit architectures whenever the Unix LFS API is activated in program code.
When the kernel moved to 64-bit inodes, the ext3 file system driver used them internally by 2001. However, the inode format on the storage media itself was stuck at 32-bit numbers. As mass storage devices moved to the Advanced Format of 4 kilobytes per block, the actual limit of that file system format is 8 or 16 terabytes. Handling larger disk partitions requires the use of a different file system such as XFS, which was designed with 64-bit inodes from the start, allowing for exabyte files and partitions.
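The capacity limits quoted above follow from multiplying the number of addressable blocks by the block size. The sketch below shows the arithmetic; the function name addressable_bytes is illustrative, and reading the "8 or 16 terabyte" figure as the difference between 31 and 32 usable bits in a block number is an assumption rather than something the text states:

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes that can be addressed with block numbers of
   `index_bits` bits when every block holds `block_size` bytes. */
uint64_t addressable_bytes(unsigned index_bits, uint64_t block_size) {
    assert(index_bits < 64);
    uint64_t blocks = (uint64_t)1 << index_bits;

    /* Refuse inputs whose product would not fit into 64 bits. */
    assert(block_size != 0 && blocks <= UINT64_MAX / block_size);
    return blocks * block_size;
}
```

With 512-byte blocks, addressable_bytes(32, 512) yields 2 TiB, which matches the master boot record limit. With 4096-byte blocks, addressable_bytes(31, 4096) and addressable_bytes(32, 4096) yield 8 TiB and 16 TiB respectively.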
The first 16 terabyte magnetic disk drives were delivered by mid-2019. Solid-state drives with 32 TiB for data centers were available as early as 2016, with some manufacturers forecasting 100 TiB SSDs by 2020.
See also
* 2 GB limit
* RF64 – 64-bit support for BWF WAV audio files
* Comparison of large-file support in text editors
* FAT32+
* File size
* File spanning
* Long filename support (LFN)
* Year 2038 problem
External links
* Jaeger, Andreas (2005-02-15). "Large File Support in Linux". SuSE GmbH. http://www.suse.de/~aj/linux_lfs.html (accessed 2006-09-10).