
In computing, as well as in non-computing contexts, a file sequence is a well-ordered, finite collection of files, usually related to each other in some way. In computing, file sequences should ideally obey some kind of locality of reference principle: the files belonging to the same sequence should be close to one another, and the closer two files are in the ordering, the closer they should be in that sense. ''Explicit'' file sequences are sequences whose filenames all end with a numeric or alphanumeric tag (placed before the file extension). This locality of reference usually pertains to the data, to the metadata (e.g. filenames or last-access dates), or to physical proximity on the storage medium where the files reside. In the last sense it is better to speak of file contiguity (see below).


Identification

GUI programs usually show the contents of a folder ordered according to some criterion, mostly related to the files' metadata, such as the filename. The default criterion is usually the alphanumeric ordering of filenames, although some programs do that in "smarter" ways than others: for example, file2.ext should ideally be placed before file10.ext, as GNOME Files and Thunar do, whereas in a strict character-by-character comparison it comes after (more on that later). Other criteria exist, such as ordering files by file type (or by extension) and, within the same type, by filename or last-access date, and so on. For this reason, when a file sequence has a stronger locality of reference, particularly one related to the files' actual contents, it is better to highlight this fact by letting the well-ordering induce an alphanumeric ordering of the filenames too. That is the case of ''explicit'' file sequences.
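The difference between a plain alphanumeric ordering and a "smarter" one can be illustrated with a minimal Python sketch. The helper name natural_key is hypothetical, and the sketch only shows the general idea of comparing digit runs as numbers; it does not claim to reproduce what any particular file manager does.

```python
import re

def natural_key(name: str):
    """Split a filename into text and integer chunks so digit runs compare numerically."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "file10.ext" -> ["file", "10", ".ext"].
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]

names = ["file10.ext", "file1.ext", "file100.ext", "file2.ext"]

plain = sorted(names)                     # character-by-character ordering
natural = sorted(names, key=natural_key)  # "smart" ordering

print(plain)    # ['file1.ext', 'file10.ext', 'file100.ext', 'file2.ext']
print(natural)  # ['file1.ext', 'file2.ext', 'file10.ext', 'file100.ext']
```

In the plain ordering, file2.ext lands after file100.ext because the comparison stops at the first differing character ('2' versus '1').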


Explicit file sequences

Explicit file sequences share the same filename (including the file extension, which confirms the locality of reference of their contents) except for the final part of the name before the extension, which is a sequence of numeric, alphanumeric or purely alphabetical characters that forces a specific ordering; such sequences should also ideally be located within the same directory. In this sense, any files sharing the same filename (and possibly extension) and differing only by the sequence number at the end of the filename automatically belong to the same file sequence, at least when they are located in the same folder.

Many naming conventions also require that number-indexed file sequences (in any number base) whose indices span at most a fixed number of digits use "leading zeros" in their filenames, so that:
* all the files in the sequence have exactly the same number of characters in their complete filenames;
* non-smart alphanumeric orderings, like those of some operating systems' GUIs, do not erroneously permute them within the sequence.

To better explain the latter point, consider that, in a strict character-by-character comparison, file2.ext (the 2nd file in the sequence) comes ''after'' file100.ext, which is actually the hundredth. Renaming the second file to file002.ext, with two leading zeros, and padding every other index to three digits in the same way solves the problem universally. Examples of explicit file sequences include file00000.ext, file00001.ext, file00002.ext, ..., file02979.ext (indices zero-padded to five digits), and a hexadecimal ordering of 256 files: tag_00.ext, tag_01.ext, ..., tag_09.ext, tag_0A.ext, ..., tag_0F.ext, tag_10.ext, ..., tag_1F.ext, ..., tag_FF.ext (indices zero-padded to two digits).

Software and programming conventions usually represent a file sequence as a single virtual file object, whose name is written in C-like formatted-string notation to show where the sequence number is located in the filename and how it is formatted. For the two examples above, that would be file%05d.ext and tag_%02X.ext, respectively; for the former, the same convention without leading zeros would be file%5d.ext (which, in C, pads the number with spaces rather than zeros). Note, however, that such notation is usually not valid at the operating-system and command-line interface levels, because the '%' character has no pattern-matching meaning there (it is not a wildcard or regular-expression operator), nor is it a universally ''legal'' filename character: the notation just stands as a placeholder for the virtual file-like object representing the whole explicit file sequence.

Notable software packages that treat explicit file sequences as single filesystem objects, which is rather typical in the audio/video post-production industry (see below), are found among products by Autodesk, Quantel, daVinci and DVS, as well as Adobe After Effects.
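As an illustration of the formatted-string notation, the following Python sketch expands a pattern into filenames and, conversely, groups existing filenames under the pattern they follow. Python's % operator accepts the same %05d and %02X (uppercase hexadecimal) conversions as C's printf. The function names expand and group_sequences are hypothetical, and the grouping assumes a decimal counter placed right before the extension; it is a sketch under those assumptions, not the method used by any of the packages named above.

```python
import re
from collections import defaultdict

def expand(pattern: str, first: int, last: int) -> list[str]:
    """Expand a printf-style pattern such as 'file%05d.ext' over an inclusive range."""
    return [pattern % n for n in range(first, last + 1)]

# Assumption: the counter is a run of decimal digits ending the stem,
# optionally followed by a single extension.
_TAIL = re.compile(r"^(?P<stem>.*?)(?P<num>\d+)(?P<ext>\.[^.]*)?$")

def group_sequences(names):
    """Group filenames by printf-style pattern; return {pattern: sorted indices}."""
    groups = defaultdict(list)
    for name in names:
        m = _TAIL.match(name)
        if m is None:
            continue  # no trailing counter: not part of an explicit sequence
        width = len(m["num"])
        stem = m["stem"].replace("%", "%%")        # keep literal '%' safe
        ext = (m["ext"] or "").replace("%", "%%")
        groups[f"{stem}%0{width}d{ext}"].append(int(m["num"]))
    return {pattern: sorted(idx) for pattern, idx in groups.items()}

print(expand("file%05d.ext", 0, 2))
# ['file00000.ext', 'file00001.ext', 'file00002.ext']
print(expand("tag_%02X.ext", 14, 16))
# ['tag_0E.ext', 'tag_0F.ext', 'tag_10.ext']
print(group_sequences(["file00001.ext", "file00000.ext", "notes.txt", "file00002.ext"]))
# {'file%05d.ext': [0, 1, 2]}
```

Because the pattern is derived from the digit count, this sketch only recognizes zero-padded sequences as a single group; file7.ext and file10.ext would end up under two different patterns.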


File scattering

A file sequence located within a mass storage device is said to be ''contiguous'' if:
* every file in the sequence is unfragmented, i.e. each file is stored in one contiguous and ordered piece of storage space (ideally in one extent, or in multiple but contiguous extents);
* consecutive files in the sequence occupy contiguous portions of storage space (contiguous extents, consistent with the file ordering).

File contiguity is a more practical requirement for file sequences than just their locality of reference, because it relates to the storage medium hosting the whole sequence rather than to the sequence itself (or its metadata). At the same time, it is a "high-level" feature, because it does not depend on the physical and technical details of the mass storage itself: in particular, file contiguity is realized in different ways according to the storage device's architecture and actual filesystem structure. At "low level", each file in a contiguous sequence must be placed in contiguous blocks, notwithstanding any reserved areas or special metadata required by the filesystem (like inodes or inter-sector headers) that actually interleave them.

File contiguity is, in most practical applications, "invisible" at the operating-system and user levels, since all the files in a sequence are always available to applications in the same way, regardless of their physical location on the storage device (operating systems hide the filesystem internals from higher-level services). File contiguity does, however, affect I/O performance when the sequence is to be read or written in the shortest time possible. In some contexts (like optical disk burning; see also below), data in a file sequence must be accessed in the same order as the file sequence itself; in other contexts, "random" access to the sequence may be required. In both cases, most professional filesystems provide faster access strategies for contiguous files than for non-contiguous ones. Data pre-allocation is crucial for write access, whereas burst read speeds are achievable only for contiguous data.

When a file sequence is not contiguous, it is said to be ''scattered'', since its files are stored in sparse locations on the storage device. File scattering is the process of allocating (or re-allocating) a file sequence so that it is (or becomes) non-contiguous. It is often associated with file fragmentation too, where each file is itself stored in several non-contiguous blocks; mechanisms contributing to the former are usually a common cause of the latter too. The act of reducing file scattering, by allocating (in the first place) or moving (for already-stored data) the files of the same sequence close together on the storage medium, is called file ''de''scattering. A few defragmentation strategies and dedicated software tools are able to both defragment single files and descatter file sequences.
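The two contiguity conditions can be sketched in Python as a check over per-file extent lists. Everything here is an assumption made for illustration: the Extent type, the block addresses and the function names are hypothetical, and the check demands strict adjacency, whereas a real filesystem may interleave reserved areas or metadata between otherwise contiguous files.

```python
from typing import NamedTuple

class Extent(NamedTuple):
    start: int   # first block of the extent (hypothetical block address)
    length: int  # number of blocks in the extent

def is_unfragmented(extents):
    """True if the extents form one unbroken, ordered run of blocks."""
    return all(a.start + a.length == b.start
               for a, b in zip(extents, extents[1:]))

def is_contiguous(sequence):
    """sequence: list of per-file extent lists, given in file-sequence order."""
    # Condition 1: every file is non-empty and unfragmented.
    if not all(exts and is_unfragmented(exts) for exts in sequence):
        return False
    # Condition 2: each file begins exactly where the previous one ends.
    for prev, cur in zip(sequence, sequence[1:]):
        last = prev[-1]
        if last.start + last.length != cur[0].start:
            return False
    return True

contiguous = [[Extent(0, 4)], [Extent(4, 2), Extent(6, 2)], [Extent(8, 4)]]
scattered = [[Extent(0, 4)], [Extent(100, 4)]]
fragmented = [[Extent(0, 2), Extent(10, 2)]]

print(is_contiguous(contiguous))  # True
print(is_contiguous(scattered))   # False
print(is_contiguous(fragmented))  # False
```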


Multimedia file sequences

There are many contexts in which explicit file sequences are particularly important: incremental backups, periodic logs and multimedia files captured or created with a chronological locality of reference. In the last case, explicit file numbering is extremely important in order to give both software and end users a way to discern the order of the contents stored therein. For example, digital cameras and similar devices save all picture files in the same folder (until it reaches its maximum file-number capacity, or until a new event such as the passing of midnight or a device switch takes place) with a final sequence number: it would be very impractical to choose a filename for each shot at the moment of shooting, so the camera firmware/software picks one that is uniquely identifiable by its sequence number. With the aid of other metadata (and usually of specialized PC software), users can later discern the multimedia contents and reorganize them if needed.


The Digital Intermediate example

A typical example where explicit file sequences, as well as their contiguity, become crucial is the digital intermediate (DI) workflow of the motion picture and video industries. In such contexts, video data need to maintain the highest quality and be ready for visualization (usually in real time, if not faster). Usually video data are acquired from either a digital video camera or a motion picture film scanner and stored into file sequences (much as a common photographic camera does), and need to be post-produced in several steps, including at least editing, conforming and colour correction. That requires:
* Uncompressed data, because any lossy compression, which is common in most finalized products, introduces unacceptable quality losses.
* Uncompressed data (again), because decompression times may degrade the playback/visualization performance of hardware and software.
* Frame-per-file data management, because common post-production operations require the shortest possible seek times; "fast-forwarding" or "rewinding" to a specific (key) frame is much faster if done at filesystem level rather than within a huge, possibly fragmented video file; every frame is then stored in a single file as a still digital picture.
* Unambiguous frame ordering, which is best accomplished by grouping all the files together with explicit file numbering.
* File contiguity, because many filesystem architectures achieve higher I/O speeds when transferring data on contiguous areas of the storage, whereas random allocation might prevent real-time or better loading performance.

Consider that a single frame in a DI project is currently from 9 MB to 48 MB in size (depending on resolution and colour depth), whereas the video frame rate is commonly 24 or 25 frames per second (if not faster); any storage required for real-time playback of such contents thus needs a minimum overall throughput of about 220 MB/s to 1.2 GB/s, respectively. With those numbers, all the above requirements (particularly file contiguity, given present-day storage performance) become strictly mandatory.
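The throughput figures follow from multiplying the frame size by the frame rate. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the pairing of 9 MB with 24 frames per second and of 48 MB with 25 frames per second is an assumption about how the two bounds were obtained.

```python
def required_throughput_mb_s(frame_size_mb: float, fps: float) -> float:
    """Sustained throughput (MB/s) needed to play one frame-per-file sequence in real time."""
    return frame_size_mb * fps

low = required_throughput_mb_s(9, 24)    # smallest frame, 24 frames per second
high = required_throughput_mb_s(48, 25)  # largest frame, 25 frames per second

print(low)   # 216 (MB/s, roughly the 220 MB/s quoted in the text)
print(high)  # 1200 (MB/s, i.e. 1.2 GB/s)
```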


External links


PySeq
PySeq is an open-source Python module that finds groups of items that follow a naming convention containing a numerical sequence index (e.g. fileA.001.png, fileA.002.png, fileA.003.png...) and serializes them into a compressed sequence string representing the entire sequence (e.g. fileA.1-3.png).
checkfileseq
checkfileseq is an open-source Python script (usable via CLI) that scans a directory structure recursively for files missing in a file sequence and prints a report upon completion. It supports a wide array of filename patterns and can be customized to gain additional pattern logic.