Redaction or sanitization is the process of removing sensitive information from a document so that it may be distributed to a broader audience. It is intended to allow the selective disclosure of information. Typically, the result is a document that is suitable for publication or for dissemination to people other than the intended audience of the original document.
When the intent is secrecy protection, such as in dealing with classified information, redaction attempts to reduce the document's classification level, possibly yielding an unclassified document. When the intent is privacy protection, it is often called data anonymization. Originally, the term ''sanitization'' was applied to printed documents; it has since been extended to apply to computer files and the problem of data remanence.
Government secrecy
In the context of government documents, redaction (also called sanitization) generally refers more specifically to the process of removing sensitive or classified information from a document prior to its publication, during declassification.
Secure document redaction techniques
Redacting confidential material from a paper document before its public release involves overwriting portions of text with a wide black pen and then photocopying the result, because the obscured text may still be recoverable from the original. Alternatively, opaque, removable adhesive tape known as "cover up tape" or "redaction tape", available in various widths, may be applied before photocopying.
This is a simple process with only minor security risks. For example, if the black pen or tape is not wide enough, careful examination of the resulting photocopy may still reveal partial information about the text, such as the difference between short and tall letters. The exact length of the removed text also remains recognizable, which may help in guessing plausible wordings for shorter redacted sections. Where computer-generated proportional fonts were used, even more information can leak out of the redacted section in the form of the exact position of nearby visible characters.
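The length leak described above can be illustrated with a minimal sketch. It assumes a hypothetical table of per-character advance widths (`CHAR_WIDTHS` is invented for illustration and is not taken from any real font) and a measured width for the blacked-out gap. Candidate words whose rendered width does not fit the gap are eliminated.

```python
# Hypothetical advance widths in arbitrary units; real fonts define their own.
CHAR_WIDTHS = {"i": 3, "l": 3, "t": 4, "r": 4, "a": 6, "e": 6,
               "n": 6, "o": 6, "m": 9, "w": 9}
DEFAULT_WIDTH = 6  # assumed width for any character missing from the table


def rendered_width(text: str) -> int:
    """Sum the assumed advance widths of every character in text."""
    return sum(CHAR_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text.lower())


def plausible_candidates(candidates, redacted_width, tolerance=1):
    """Keep only candidates whose width is within tolerance of the gap."""
    return [c for c in candidates
            if abs(rendered_width(c) - redacted_width) <= tolerance]


names = ["Tim", "Emma", "William", "Noel"]
gap = rendered_width("Emma")  # stand-in for a width measured on the released page
print(plausible_candidates(names, gap))
```

With these assumed widths only one of the four names fits the gap, which is why a redaction mark that preserves the exact extent of the removed text can narrow down short redacted passages.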
The UK National Archives published a document, ''Redaction Toolkit, Guidelines for the Editing of Exempt Information from Documents Prior to Release'', "to provide guidance on the editing of exempt material from information held by public bodies."
Secure redacting is more complicated with computer files. Word processing formats may save a revision history of the edited text that still contains the redacted text. In some file formats, unused portions of memory are saved that may still contain fragments of previous versions of the text. Where text is redacted, in Portable Document Format (PDF) or word processor formats, by overlaying graphical elements (usually black rectangles) over text, the original text remains in the file and can be uncovered by simply deleting the overlaying graphics. Effective redaction of electronic documents requires the removal of all relevant text and image data from the document file. This process, though internally complex, can be carried out easily by a user with the aid of "redaction" functions in software for editing PDF or other files.
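The difference between overlaying a rectangle and removing the text can be shown with a toy page model. This is only a sketch: `TextRun`, `BlackBox`, and the helper functions are hypothetical names, and the model does not parse or write any real PDF or word processor format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextRun:
    x: int      # left edge of the run on the page
    text: str   # characters stored in the file


@dataclass(frozen=True)
class BlackBox:
    x: int      # left edge of the drawn rectangle
    width: int  # width measured in characters, for simplicity


def extract_text(page):
    """Return every stored character, as copy and paste or search would."""
    return " ".join(item.text for item in page if isinstance(item, TextRun))


def overlay_redact(page, x, width):
    """Ineffective: draw a rectangle on top and leave the text in place."""
    return page + [BlackBox(x, width)]


def remove_redact(page, secret):
    """Effective in this toy model: replace the stored characters themselves."""
    return [
        replace(item, text=item.text.replace(secret, "[REDACTED]"))
        if isinstance(item, TextRun) else item
        for item in page
    ]


page = [TextRun(0, "Agent name: J. Doe")]
covered = overlay_redact(page, x=12, width=6)
cleaned = remove_redact(page, "J. Doe")

print(extract_text(covered))  # the secret is still stored and extractable
print(extract_text(cleaned))
```

In the overlaid version the rectangle changes only what is drawn, so text extraction still returns the name. In the second version the characters no longer exist in the stored data.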
Redaction may administratively require marking of the redacted area with the reason that the content is being restricted. US government documents released under the Freedom of Information Act are marked with exemption codes that denote the reason why the content has been withheld.
The US National Security Agency (NSA) published a guidance document which provides instructions for redacting PDF files.
Printed matter

Printed documents which contain classified or sensitive information frequently contain a great deal of information which is less sensitive. There may be a need to release the less sensitive portions to uncleared personnel. The printed document will consequently be sanitized to obscure or remove the sensitive information. Maps have also been redacted for the same reason, with highly sensitive areas covered with a slip of white paper.
In some cases, sanitizing a classified document removes enough information to reduce the classification from a higher level to a lower one. For example, raw intelligence reports may contain highly classified information, such as the identities of spies, that is removed before the reports are distributed outside the intelligence agency: the initial report may be classified as Top Secret while the sanitized report may be classified as Secret.
In other cases, such as the NSA report on the USS ''Liberty'' incident, the report may be sanitized to remove all sensitive data, so that the report may be released to the general public.
As is seen in the USS ''Liberty'' report, paper documents are usually sanitized by covering the classified and sensitive portions before photocopying the document.
Computer media and files
Computer (electronic or digital) documents are more difficult to sanitize. In many cases, when information in an information system is modified or erased, some or all of the data remains in storage. This may be an accident of design, where the underlying storage mechanism (disk, RAM, etc.) still allows information to be read, despite its nominal erasure. The general term for this problem is data remanence. In some contexts (notably the US NSA, DoD, and related organizations), "sanitization" typically refers to countering the data remanence problem.
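A deliberately simplified model can show why nominal erasure leaves data behind. `ToyDisk` is a hypothetical class written for this illustration; real file systems and storage devices are far more complicated, and overwriting a file on real hardware does not necessarily behave like `sanitize` here.

```python
class ToyDisk:
    """Simplified storage: a directory maps file names to block numbers."""

    def __init__(self, block_count: int) -> None:
        self.blocks = [b""] * block_count
        self.free = list(range(block_count))
        self.directory: dict[str, int] = {}

    def write(self, name: str, data: bytes) -> None:
        index = self.free.pop(0)  # one block per file keeps the model small
        self.blocks[index] = data
        self.directory[name] = index

    def delete(self, name: str) -> None:
        """Nominal erasure: drop the directory entry, keep the block contents."""
        self.free.append(self.directory.pop(name))

    def sanitize(self, name: str) -> None:
        """Overwrite the block with zero bytes, then drop the directory entry."""
        index = self.directory.pop(name)
        self.blocks[index] = b"\x00" * len(self.blocks[index])
        self.free.append(index)

    def raw_scan(self, needle: bytes) -> bool:
        """Search the raw blocks directly, ignoring the directory."""
        return any(needle in block for block in self.blocks)


deleted = ToyDisk(4)
deleted.write("report.txt", b"source: J. Doe")
deleted.delete("report.txt")

sanitized = ToyDisk(4)
sanitized.write("report.txt", b"source: J. Doe")
sanitized.sanitize("report.txt")

print(deleted.raw_scan(b"J. Doe"))    # residue is still readable
print(sanitized.raw_scan(b"J. Doe"))
```

After `delete`, the file name is gone but a scan of the raw blocks still finds the content. After `sanitize`, the same scan finds nothing in this model.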
However, the retention may be a deliberate feature, in the form of an undo buffer, revision history, "trash can", backups, or the like. For example, word processing programs such as Microsoft Word are sometimes used to edit out sensitive information. These products do not always show the user all of the information stored in a file, so an edited file may still contain sensitive information. In other cases, inexperienced users use ineffective methods which fail to sanitize the document. Metadata removal tools are designed to effectively sanitize documents by removing potentially sensitive information.
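As one illustration of information that a word processor may keep without displaying it, the sketch below looks for tracked deletions in a word-processing file. It assumes the Office Open XML layout, in which a `.docx` file is a ZIP archive whose `word/document.xml` part stores deleted text in `w:delText` elements. The function names are hypothetical, and the sample built by `make_sample` contains only that one part, so it is not a complete, openable document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_deletions(docx_bytes: bytes) -> list[str]:
    """Return text stored as tracked deletions in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [node.text or "" for node in root.iter(f"{{{W_NS}}}delText")]


def make_sample() -> bytes:
    """Build a tiny archive holding only the part this sketch inspects."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:r><w:t>The source was </w:t></w:r>'
        '<w:del w:id="1" w:author="editor">'
        '<w:r><w:delText>J. Doe</w:delText></w:r>'
        '</w:del>'
        '<w:r><w:t>withheld.</w:t></w:r>'
        '</w:p></w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


print(tracked_deletions(make_sample()))
```

A check of this kind covers only one of the places where removed text can persist; it is not a substitute for a dedicated sanitization tool.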
In May 2005 the US military published a report on the death of Nicola Calipari, an Italian secret agent, at a US military checkpoint in Iraq. The published version of the report was in PDF format, and had been incorrectly redacted by covering sensitive parts with opaque blocks in software. Shortly thereafter, readers discovered that the blocked-out portions could be retrieved by copying and pasting them into a word processor.
On May 24, 2006, lawyers for the communications service provider AT&T filed a legal brief regarding the company's cooperation with domestic wiretapping by the NSA. Text on pages 12 to 14 of the PDF document was incorrectly redacted, and the covered text could be retrieved.
At the end of 2005, the NSA released a report giving recommendations on how to safely sanitize a Microsoft Word document.
Issues such as these make it difficult to reliably implement multilevel security systems, in which computer users of differing security clearances may share documents. ''The Challenge of Multilevel Security'' gives an example of a sanitization failure caused by unexpected behavior in Microsoft Word's change tracking feature.
The two most common mistakes in redacting a document are adding an image layer over the sensitive text to obscure it without removing the underlying text, and setting the background color to match the text color. In both cases, the redacted material still exists in the document beneath its visible appearance and can be found by searching or extracted by simple copy and paste. Proper redaction tools and procedures must be used to permanently remove the sensitive information. This is often accomplished in a multi-user workflow in which one group marks sections of the document as proposed redactions, another group verifies that the proposals are correct, and a final group operates the redaction tool to permanently remove the approved items.
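The mark, verify, and apply workflow can be sketched as a small data model. All names here (`Status`, `Proposal`, `apply_redactions`) are hypothetical, and the sketch works on a plain string rather than a real document format. Each removed span is replaced by a marker stating the reason, and the marker's length does not depend on the removed text.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    start: int    # index of the first character to remove
    end: int      # index one past the last character to remove
    reason: str   # why the content is being withheld
    status: Status = Status.PROPOSED


def apply_redactions(text: str, proposals: list[Proposal]) -> str:
    """Remove approved spans, leaving a reason marker in their place."""
    approved = sorted(
        (p for p in proposals if p.status is Status.APPROVED),
        key=lambda p: p.start,
        reverse=True,  # work right to left so earlier indices stay valid
    )
    for p in approved:
        text = text[:p.start] + f"[REDACTED: {p.reason}]" + text[p.end:]
    return text


text = "Contact J. Doe at 555-0100 for details."

# Step 1: a marking group proposes spans.
name_at = text.index("J. Doe")
phone_at = text.index("555-0100")
proposals = [
    Proposal(name_at, name_at + len("J. Doe"), "personal name"),
    Proposal(phone_at, phone_at + len("555-0100"), "phone number"),
]

# Step 2: a reviewing group approves or rejects each proposal.
proposals[0].status = Status.APPROVED
proposals[1].status = Status.REJECTED

# Step 3: the redaction step removes only what was approved.
result = apply_redactions(text, proposals)
print(result)
```

Keeping the proposal and approval states separate from the removal step means that nothing is deleted until a second party has confirmed it, and rejected proposals leave the text untouched.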
See also
* Censorship
* Data erasure
* Data remanence
* Freedom of information laws by country
* Lacuna
External links
''Embarrassing Redaction Failures'' from Vol. 58 No. 2, May 2019, Technology column of ''The Judges' Journal'' published by the American Bar Association