Bush hid the facts

Bush hid the facts is a common name for a bug present in some versions of Microsoft Windows, which causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts", without quotes, was put in a new Notepad document and saved, closed, and reopened, the nonsensical sequence of Chinese characters "畂桳栠摩琠敨映捡獴" would appear instead. While "Bush hid the facts" is the sentence most commonly presented on the Internet to induce the error, the bug can be triggered by other strings with letters and spaces in the same positions (words of four, three, three, and five letters). Other sequences trigger the bug as well. (The most commonly used sentence is a reference to U.S. President George W. Bush's statements about nuclear weapons in Iraq.)

The bug occurs when the string is passed to the Win32 charset detection function IsTextUnicode. IsTextUnicode sees that the bytes match the UTF-16LE encoding of assigned Unicode code points, concludes that the text is valid UTF-16LE, and returns true; the application then incorrectly interprets the text as UTF-16LE. The bug had existed since IsTextUnicode was introduced in 1994, but was not discovered until early 2004. Many text editors and tools exhibit this behavior on Windows because they use IsTextUnicode to determine the encoding of text files. As of Windows Vista, Notepad has been modified to use a different detection algorithm that does not exhibit the bug, but IsTextUnicode remains unchanged in the operating system, so any other tools that use the function are still affected.
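
The effect of the misdetection can be reproduced without Windows by taking the ASCII bytes of the sentence and decoding them as UTF-16LE, which is what an application does once IsTextUnicode has returned true. The sketch below is only an illustration of that reinterpretation step; it does not reproduce IsTextUnicode's own heuristics, and the helper name misread_as_utf16le is hypothetical.

```python
def misread_as_utf16le(text: str) -> str:
    """Hypothetical helper: encode text as ASCII, then decode the same bytes as UTF-16LE."""
    raw = text.encode("ascii")  # one byte per character
    # UTF-16LE pairs up the bytes: each pair becomes one 16-bit code unit,
    # with the first byte of the pair as the low-order half.
    # An odd number of bytes would raise UnicodeDecodeError here.
    return raw.decode("utf-16-le")


garbled = misread_as_utf16le("Bush hid the facts")
# 18 ASCII bytes become 9 code units, e.g. "Bu" (0x42, 0x75) -> U+7542.
print(garbled)
print([hex(ord(c)) for c in garbled])
```

Every resulting code unit falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF), because lowercase letters and the space (0x20 to 0x7A) supply the high-order byte of each pair. That is why the output looks like plausible Chinese text rather than obvious corruption.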


Workarounds

Several workarounds exist for this bug:
* Editing the text so that it no longer forms a pattern that triggers the bug avoids it. For instance, adding a new line within the first 20 characters works.
* If the file is saved as "UTF-8" (before 2018) or "UTF-8 with BOM" (after 2018) rather than "ANSI", the text loads correctly, because Notepad prepends a UTF-8 byte order mark, a byte pattern that does not trigger the bug. Opening a file that is valid UTF-8 without the byte order mark still triggers the bug, because ASCII text is encoded byte-for-byte identically in UTF-8.
* The bug is also avoided by saving as "Unicode", which in Microsoft Windows means UTF-16LE. When this file is loaded, IsTextUnicode correctly returns true and the text displays correctly.
* To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select "ANSI" or "UTF-8" in the "Encoding" list box, and click Open. Under Windows 2000, Notepad lacks the "Encoding" list box. Notepad2 also lacks this. WordPad appears to load the text correctly without choosing the encoding, since it uses its own encoding detection.
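
The byte-order-mark workarounds succeed because a reader can trust an explicit marker before it starts guessing. The following is a minimal sketch of that "BOM first, no UTF-16 guessing" policy, not Notepad's or Windows' actual algorithm. The function name decode_with_bom_check is hypothetical, and using cp1252 as the stand-in for "ANSI" is an assumption (the real ANSI code page depends on the system locale).

```python
import codecs


def decode_with_bom_check(raw: bytes) -> tuple[str, str]:
    """Hypothetical decoder: honour a byte order mark, otherwise assume a legacy code page.

    Returns (encoding_name, text).
    """
    if raw.startswith(codecs.BOM_UTF8):  # EF BB BF, written by "UTF-8 with BOM"
        return "utf-8-sig", raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE):  # FF FE, written by "Unicode" (UTF-16LE)
        return "utf-16-le", raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: deliberately do NOT guess UTF-16 from byte patterns.
    # cp1252 stands in for "ANSI" here (an assumption); undecodable bytes become U+FFFD.
    return "cp1252", raw.decode("cp1252", errors="replace")


text = "Bush hid the facts"
print(decode_with_bom_check(codecs.BOM_UTF8 + text.encode("utf-8")))
print(decode_with_bom_check(codecs.BOM_UTF16_LE + text.encode("utf-16-le")))
print(decode_with_bom_check(text.encode("ascii")))
```

All three calls recover the original sentence. The trade-off is that a UTF-16 file saved without a BOM would be misread by this sketch, which is the opposite failure from the one IsTextUnicode produces.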




External links


The Notepad file encoding problem, redux, by Raymond Chen
IsTextUnicode, MSDN Library