
File carving is the process of reassembling computer files from fragments in the absence of filesystem metadata.


Introduction and basic principles

All filesystems contain metadata that describes the file system itself. At a minimum, this includes the hierarchy of folders and files, with names for each. The filesystem also records the physical locations on the storage device where each file is stored. As explained below, a file might be scattered in fragments at different physical addresses. File carving is the process of trying to recover files without this metadata. This is done by analyzing the raw data and identifying what it is (text, executable, PNG, MP3, etc.). This can be done in different ways, but the simplest is to look for the file signature or "magic numbers" that mark the beginning and/or end of a particular file type. For instance, every Java class file has as its first four bytes the hexadecimal value CA FE BA BE. Some files contain footers as well, making it just as simple to identify the ending of the file.

Most file systems, such as the FAT family and UNIX's Fast File System, work with the concept of clusters of an equal and fixed size. For example, a FAT32 file system might be broken into clusters of 4 KiB each. Any file no larger than 4 KiB fits into a single cluster, and there is never more than one file in each cluster. Files that take up more than 4 KiB are allocated across multiple clusters. Sometimes these clusters are all contiguous, while other times they are scattered across two or potentially many more so-called fragments, with each fragment containing a number of contiguous clusters storing one part of the file's data. Large files are more likely to be fragmented.
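The header and footer approach described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular carving tool works: the function name `carve_by_signature` and the `max_size` limit are assumptions made for the example, and the JPEG signatures used (FF D8 FF at the start, FF D9 at the end) are the commonly documented ones. It only recovers files whose data is stored contiguously.

```python
from typing import Iterator, Tuple

# Commonly documented JPEG signatures (start-of-image and end-of-image markers).
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_by_signature(image: bytes, header: bytes, footer: bytes,
                       max_size: int = 10 * 1024 * 1024) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) for each header..footer span found in a raw image.

    Hypothetical helper: assumes the file is contiguous and that the first
    footer after a header really ends the file.
    """
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            return  # no more headers in the image
        # Look for a footer only within max_size bytes of the header.
        end = image.find(footer, start + len(header), start + max_size)
        if end != -1:
            yield start, image[start:end + len(footer)]
            pos = end + len(footer)
        else:
            pos = start + 1  # header without footer: skip it and keep scanning
```

Because a footer byte sequence can also occur inside the file body (for example in an embedded thumbnail), a sketch like this can cut files short; practical carvers combine signatures with format-aware validation.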
Simson Garfinkel ("Carving Contiguous and Fragmented Files with Fast Object Validation", in Proceedings of the 2007 digital forensics research workshop, DFRWS, Pittsburgh, PA, August 2007) reported fragmentation statistics collected from over 350 disks containing FAT, NTFS and UFS file systems. He showed that while fragmentation in a typical disk is low, the fragmentation rate of forensically important files such as email, JPEG and Word documents is relatively high. Measured as the fraction of files split into two or more fragments, the fragmentation rate was 16% for JPEG files, 17% for Word documents, 22% for AVI files and 58% for PST files (Microsoft Outlook).

Thus, finding the header of a file means that the first fragment of the file is found, but the other fragments might be scattered anywhere else on the partition, making file carving much more challenging. By studying how file systems actually fragment files and applying statistics, it is possible to make qualified guesses as to which fragments might fit together. These fragments are then combined in various possible permutations, and each combination is tested to see whether the fragments fit together. For some file types it is easy for the software to test the fit, while for others the software might accidentally put the pieces together incorrectly.

Pal, Shanmugasundaram, and Memon (A. Pal and N. Memon, "Automated reassembly of file fragmented images using greedy algorithms", in IEEE Transactions on Image processing, February 2006, pp. 385–393) presented an efficient algorithm based on a greedy heuristic and alpha-beta pruning for reassembling fragmented images. Pal, Sencar, and Memon (A. Pal, T. Sencar and N. Memon, "Detecting File Fragmentation Point Using Sequential Hypothesis Testing", Digital Investigations, Fall 2008) introduced sequential hypothesis testing as an effective mechanism for detecting fragmentation points. Richard and Roussev (G. Richard and V. Roussev, "Scalpel: a frugal, high performance file carver", in Proceedings of the 2005 Digital Forensics Research Workshop, DFRWS, August 2005) presented Scalpel, an open-source file-carving tool.

File carving is a highly complex task, with a potentially huge number of permutations to try. To make this task tractable, carving software typically makes extensive use of models and heuristics. This is necessary not only for execution time, but also for the accuracy of the results. State-of-the-art file carving algorithms use statistical techniques such as sequential hypothesis testing to determine fragmentation points.
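To give a feel for how sequential hypothesis testing can flag a fragmentation point, the following is a minimal sketch of a generic Wald sequential probability ratio test. It is not the published algorithm: it assumes a hypothetical per-block check that reports whether each successive block "looks consistent" with the file so far, and the probabilities `p_h0` and `p_h1` and the error rates `alpha` and `beta` are illustrative values chosen for the example.

```python
import math
from typing import Iterable, Tuple


def sprt(observations: Iterable[bool],
         p_h0: float = 0.9,   # assumed P(block looks consistent | it belongs to the file)
         p_h1: float = 0.3,   # assumed P(block looks consistent | it does not belong)
         alpha: float = 0.01, # tolerated rate of falsely reporting a fragmentation point
         beta: float = 0.01   # tolerated rate of missing a fragmentation point
         ) -> Tuple[str, int]:
    """Return (decision, number of blocks examined) using Wald's thresholds."""
    upper = math.log((1 - beta) / alpha)   # accept H1: fragmentation point
    lower = math.log(beta / (1 - alpha))   # accept H0: blocks are continuous
    llr = 0.0  # running log-likelihood ratio log(P(data | H1) / P(data | H0))
    n = 0
    for n, consistent in enumerate(observations, start=1):
        if consistent:
            llr += math.log(p_h1 / p_h0)
        else:
            llr += math.log((1 - p_h1) / (1 - p_h0))
        if llr >= upper:
            return "fragmentation", n
        if llr <= lower:
            return "continuous", n
    return "undecided", n
```

The appeal of the sequential form is that a decision is reached after only as many blocks as the evidence requires, rather than after a fixed-size sample.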


Motivation

In most cases, when a file is deleted, the entry in the file system metadata is removed but the actual data is still on the disk. File carving can be used to recover data from a hard disk where the metadata was removed or otherwise damaged. This process may be successful even after a drive is formatted or repartitioned. File carving can be performed using free or commercial software and is often performed in conjunction with computer forensics examinations or alongside other recovery efforts (e.g. hardware repair) by data recovery companies. Whereas the primary goal of data recovery is to recover the file content, computer forensics examiners are often just as interested in metadata such as who owned a file, where it was stored, and when it was last modified ("Understanding Deleted Files"). Thus, while a forensic examiner could use file carving to prove that a file was once stored on a hard drive, they might need to seek out other evidence to prove who put it there.


Carving schemes


Bifragment gap carving

Garfinkel introduced the use of fast object validation for reassembling files that have been split into two pieces. This technique is referred to as Bifragment Gap Carving (BGC). A set of starting fragments and a set of finishing fragments are identified. The fragments are reassembled if together they form a valid object.
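The idea of trying gaps between a known starting block and a known finishing block can be sketched as follows. This is an illustrative simplification, not Garfinkel's implementation: the function name `bifragment_gap_carve` is made up for the example, and a JSON parser stands in for a fast object validator purely because it is available in the standard library.

```python
import json
from typing import Callable, List, Optional


def is_valid_json(data: bytes) -> bool:
    """Stand-in validator (assumption): accept data that parses as UTF-8 JSON."""
    try:
        json.loads(data.decode("utf-8"))
        return True
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return False


def bifragment_gap_carve(blocks: List[bytes], start: int, end: int,
                         is_valid: Callable[[bytes], bool]) -> Optional[bytes]:
    """Reassemble a file split into at most two fragments.

    blocks[start] holds the header and blocks[end] holds the footer. Every
    possible run of foreign blocks (the gap) between them is removed in turn,
    smallest gap first, until the validator accepts the result.
    """
    whole = b"".join(blocks[start:end + 1])
    if is_valid(whole):
        return whole  # the file was contiguous after all
    for gap_len in range(1, end - start):
        for gap_start in range(start + 1, end - gap_len + 1):
            candidate = b"".join(blocks[start:gap_start]
                                 + blocks[gap_start + gap_len:end + 1])
            if is_valid(candidate):
                return candidate
    return None
```

The number of candidates grows quadratically with the distance between the header and footer blocks, which is why the speed of the validator matters so much for this scheme.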


SmartCarving

Pal developed a carving scheme that is not limited to bifragmented files. The technique, known as SmartCarving, makes use of heuristics regarding the fragmentation behavior of known filesystems. The algorithm has three phases: preprocessing, collation, and reassembly. In the preprocessing phase, blocks are decompressed and/or decrypted if necessary. In the collation phase, blocks are sorted according to their file type. In the reassembly phase, the blocks are placed in sequence to reproduce the deleted files. The SmartCarving algorithm is the basis for the Adroit Photo Forensics and Adroit Photo Recovery applications from Digital Assembly.
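As a rough illustration of what a collation phase does, the sketch below groups blocks into coarse categories using two simple byte-level heuristics. It is not the classifier used by SmartCarving: the category names, the 95% printable-character threshold and the entropy threshold of 7 bits per byte are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def shannon_entropy(block: bytes) -> float:
    """Entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def classify_block(block: bytes) -> str:
    """Assign a coarse type to a block (illustrative thresholds only)."""
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in block)
    if block and printable / len(block) > 0.95:
        return "text"
    if shannon_entropy(block) > 7.0:
        return "compressed-or-encrypted"
    return "other"


def collate(blocks: List[bytes]) -> Dict[str, List[int]]:
    """Map each category to the indices of the blocks assigned to it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, block in enumerate(blocks):
        groups[classify_block(block)].append(index)
    return dict(groups)
```

Grouping blocks this way shrinks the search space for the reassembly phase, since only blocks of a compatible type need to be considered as candidates for the same file.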


Carving memory dumps

Snapshots of a computer's volatile memory (i.e. RAM) can be carved. Memory-dump carving is routinely used in digital forensics, allowing investigators to access ephemeral evidence. Ephemeral evidence includes recently accessed images and Web pages, documents, chats and communications conducted over social networks. If an encrypted volume (TrueCrypt, BitLocker, PGP Disk) was used, binary keys to encrypted containers can be extracted and used to instantly mount such volumes. The content of volatile memory is fragmented. A proprietary carving algorithm was developed by Belkasoft to enable carving of fragmented memory sets (BelkaCarving).


See also

* Data recovery
* Error detection and correction
* Data archaeology
* Foremost (software)
* PhotoRec
* Recover My Files
* IsoBuster

