magic number (programming)

In computer programming, a magic number is any of the following:
* A unique value with unexplained meaning or multiple occurrences which could (preferably) be replaced with a named constant
* A constant numerical or text value used to identify a file format or protocol
* A distinctive unique value that is unlikely to be mistaken for other meanings (e.g., Universally Unique Identifiers)


Unnamed numerical constants

The term "magic number" or "magic constant" refers to the anti-pattern of using numbers directly in source code. This breaks one of the oldest rules of programming, dating back to the COBOL, FORTRAN and PL/1 manuals of the 1960s. In the following example, which computes the price after tax, 1.05 is considered a magic number:

    price_tax = 1.05 * price

The use of unnamed magic numbers in code obscures the developers' intent in choosing that number, increases opportunities for subtle errors, and makes it more difficult for the program to be adapted and extended in the future. As an example, it is difficult to tell whether every digit in 3.14159265358979323846 is correctly typed, or whether the constant can be truncated to 3.14159 without the reduced precision affecting the functionality of the program. Replacing all significant magic numbers with named constants (also called explanatory variables) makes programs easier to read, understand and maintain.

The example above can be improved by adding a descriptively named variable:

    TAX = 0.05
    price_tax = (1.0 + TAX) * price

Names chosen to be meaningful in the context of the program can result in code that is more easily understood by a maintainer who is not the original author (or even by the original author after a period of time). An example of an uninformatively named constant is int SIXTEEN = 16, while int NUMBER_OF_BITS = 16 is more descriptive.

The problems associated with magic "numbers" described above are not limited to numerical types, and the term is also applied to other data types where declaring a named constant would be more flexible and communicative. Thus, declaring const string testUserName = "John" is better than several occurrences of the "magic value" "John" in a test suite.
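The tax example can be sketched in Python as follows. The names TAX_RATE and price_with_tax are illustrative assumptions, not taken from any particular codebase, and the 5% rate is simply the value used in the example above.

```python
# Hypothetical names; the 5% rate mirrors the example in the text.
TAX_RATE = 0.05  # sales tax as a fraction of the net price


def price_with_tax(price: float) -> float:
    """Return the price after tax, using the named rate instead of a bare 1.05."""
    return (1.0 + TAX_RATE) * price
```

If the rate changes, only the TAX_RATE line needs editing, and a reader sees what the value means at the point of definition.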


Random shuffle example

For example, if it is required to randomly shuffle the values in an array representing a standard pack of playing cards, this pseudocode does the job using the Fisher–Yates shuffle algorithm:

    for i from 1 to 52
        j := i + randomInt(53 - i) - 1
        a.swapEntries(i, j)

where a is an array object, the function randomInt(x) chooses a random integer between 1 and x, inclusive, and swapEntries(i, j) swaps the ith and jth entries in the array. In the preceding example, 52 and 53 are magic numbers, and it is not obvious that they are related to each other. It is considered better programming style to write the following:

    int deckSize := 52
    for i from 1 to deckSize
        j := i + randomInt(deckSize + 1 - i) - 1
        a.swapEntries(i, j)

This is preferable for several reasons:
* Better readability. A programmer reading the first example might wonder, "What does the number 52 mean here? Why 52?" The programmer might infer the meaning after reading the code carefully, but it is not obvious. Magic numbers become particularly confusing when the same number is used for different purposes in one section of code.
* Easier to maintain. It is easier to alter the value of the number, as it is not duplicated. Changing the value of a magic number is error-prone, because the same value is often used several times in different places within a program. Also, when two semantically distinct variables or numbers have the same value, they may accidentally both be edited together. To modify the first example to shuffle a Tarot deck, which has 78 cards, a programmer might naively replace every instance of 52 in the program with 78. This would cause two problems. First, it would miss the value 53 on the second line of the example, which would cause the algorithm to fail in a subtle way. Second, it would likely replace the characters "52" everywhere, regardless of whether they refer to the deck size or to something else entirely, such as the number of weeks in a Gregorian calendar year, or more insidiously, are part of a number like "1523", all of which would introduce bugs. By contrast, changing the value of the deckSize variable in the second example would be a simple, one-line change.
* Encourages documentation. The single place where the named variable is declared makes a good place to document what the value means and why it has the value it does. Having the same value in many places either leads to duplicate comments (and attendant problems when some are updated and others are missed) or leaves no one place where it is both natural for the author to explain the value and likely that the reader will look for an explanation.
* Coalesces information. The declarations of "magic number" variables can be placed together, usually at the top of a function or file, facilitating their review and change.
* Detects typos. Using a variable (instead of a literal) takes advantage of a compiler's checking. Accidentally typing "62" instead of "52" would go undetected, whereas typing "dekSize" instead of "deckSize" would result in the compiler's warning that dekSize is undeclared.
* Reduces typing. If an IDE supports code completion, it will fill in most of the variable's name from the first few letters.
* Facilitates parameterization. For example, to generalize the above example into a procedure that shuffles a deck of any number of cards, it would be sufficient to turn deckSize into a parameter of that procedure, whereas the first example would require several changes.

    function shuffle (int deckSize)
        for i from 1 to deckSize
            j := i + randomInt(deckSize + 1 - i) - 1
            a.swapEntries(i, j)

Disadvantages are:
* Breaks locality. When the named constant is not defined near its use, it hurts the locality, and thus comprehensibility, of the code. Putting the 52 in a possibly distant place means that, to understand the workings of the "for" loop completely (for example to estimate the run-time of the loop), one must track down the definition and verify that it is the expected number. This is easy to avoid (by relocating the declaration) when the constant is only used in one portion of the code. When the named constant is used in disparate portions, on the other hand, the remote location is a clue to the reader that the same value appears in other places in the code, which may also be worth looking into.
* Causes verbosity. The declaration of the constant adds a line. When the constant's name is longer than the value's, particularly if several such constants appear in one line, it may make it necessary to split one logical statement of the code across several lines. An increase in verbosity may be justified when there is some likelihood of confusion about the constant, or when there is a likelihood the constant may need to be changed, such as reuse of a shuffling routine for other card games. It may equally be justified as an increase in expressiveness.
* Performance considerations. It may be slower to process the expression deckSize + 1 at run-time than the value "53". That said, most modern compilers use techniques like constant folding and loop optimization to resolve the addition during compilation, so there is usually no or negligible speed penalty compared to using magic numbers in code. In particular, the cost of debugging and the time spent trying to understand non-explanatory code must be weighed against the tiny calculation cost.
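The parameterized shuffle can be sketched in runnable Python. This is a minimal illustration under stated assumptions: the constant names and the shuffle signature are hypothetical, indices are 0-based rather than 1-based as in the pseudocode, and the deck size is taken from the list itself instead of being passed separately.

```python
import random

# Hypothetical named constants; the values come from the text above.
STANDARD_DECK_SIZE = 52  # cards in a standard pack
TAROT_DECK_SIZE = 78     # cards in a Tarot deck


def shuffle(cards: list, rng: random.Random) -> None:
    """Shuffle cards in place with the Fisher-Yates algorithm."""
    deck_size = len(cards)
    for i in range(deck_size - 1):
        # Pick j uniformly from the not-yet-fixed positions i..deck_size-1.
        j = rng.randrange(i, deck_size)
        cards[i], cards[j] = cards[j], cards[i]


deck = list(range(STANDARD_DECK_SIZE))
shuffle(deck, random.Random(0))

tarot = list(range(TAROT_DECK_SIZE))
shuffle(tarot, random.Random(0))
```

Because no literal 52 or 53 appears in the loop, switching to a Tarot deck needs no change to the shuffling code at all.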


Accepted uses

In some contexts, the use of unnamed numerical constants is generally accepted (and arguably "not magic"). While such acceptance is subjective, and often depends on individual coding habits, the following are common examples:
* the use of 0 and 1 as initial or incremental values in a for loop, such as for (int i = 0; i < max; i += 1)
* the use of 2 to check whether a number is even or odd, as in isEven = (x % 2 == 0), where % is the modulo operator
* the use of simple arithmetic constants, e.g., in expressions such as circumference = 2 * Math.PI * radius, or for calculating the discriminant of a quadratic equation as d = b^2 − 4*a*c
* the use of powers of 10 to convert metric values (e.g. between grams and kilograms) or to calculate percentage and per mille values
* exponents in expressions such as (f(x) ** 2 + f(y) ** 2) ** 0.5 for $\sqrt{f(x)^2 + f(y)^2}$

The constants 1 and 0 are sometimes used to represent the Boolean values true and false in programming languages without a Boolean type, such as older versions of C. Most modern programming languages provide a boolean or bool primitive type, so the use of 0 and 1 for this purpose is ill-advised. It can also be confusing, since 0 sometimes means programmatic success (when -1 means failure) and sometimes means failure (when 1 means success).

In C and C++, 0 represents the null pointer. As with Boolean values, the C standard library includes a macro definition NULL whose use is encouraged. Other languages provide a specific null or nil value, and when this is the case no alternative should be used. The typed pointer constant nullptr was introduced with C++11.
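A few of the accepted uses listed above can be sketched in Python. The function names are illustrative assumptions; the point is only that the literals 2, 4 and 1000 are self-explanatory in these expressions.

```python
import math


def is_even(x: int) -> bool:
    return x % 2 == 0  # 2 is the obvious divisor for a parity check


def circumference(radius: float) -> float:
    return 2 * math.pi * radius  # 2 is part of the well-known formula


def discriminant(a: float, b: float, c: float) -> float:
    return b ** 2 - 4 * a * c  # exponent 2 and factor 4 come from the formula


def grams_to_kilograms(grams: float) -> float:
    return grams / 1000  # power of ten for a metric conversion
```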


Format indicators


Origin

Format indicators were first used in early Version 7 Unix source code.

Unix was ported to one of the first DEC PDP-11/20s, which did not have memory protection, so early versions of Unix used the relocatable memory reference model. Pre-Sixth Edition Unix versions read an executable file into memory and jumped to the first low memory address of the program, relative address zero. With the development of paged versions of Unix, a header was created to describe the executable image components. Also, a branch instruction was inserted as the first word of the header to skip the header and start the program. In this way a program could be run in the older relocatable memory reference (regular) mode or in paged mode. As more executable formats were developed, new constants were added by incrementing the branch offset.

In the Sixth Edition source code of the Unix program loader, the exec() function read the executable (binary) image from the file system. The first 8 bytes of the file were a header containing the sizes of the program (text) and initialized (global) data areas. Also, the first 16-bit word of the header was compared to two constants to determine if the executable image contained relocatable memory references (normal), the newly implemented paged read-only executable image, or the separated instruction and data paged image. There was no mention of the dual role of the header constant, but the high-order byte of the constant was, in fact, the operation code for the PDP-11 branch instruction (octal 000407 or hex 0107). Adding seven to the program counter showed that if this constant was executed, it would branch the Unix exec() service over the executable image eight byte header and start the program.

Since the Sixth and Seventh Editions of Unix employed paging code, the dual role of the header constant was hidden. That is, the exec() service read the executable file header (meta) data into a kernel space buffer, but read the executable image into user space, thereby not using the constant's branching feature. Magic number creation was implemented in the Unix linker and loader, and magic number branching was probably still used in the suite of stand-alone diagnostic programs that came with the Sixth and Seventh Editions. Thus, the header constant did provide an illusion and met the criteria for magic.

In Version Seven Unix, the header constant was not tested directly, but assigned to a variable labeled ux_mag and subsequently referred to as the magic number. Probably because of its uniqueness, the term magic number came to mean executable format type, then expanded to mean file system type, and expanded again to mean any type of file.
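The dual role of the header constant can be checked with a little arithmetic. The sketch below only splits the 16-bit word quoted above into its two bytes; the function name split_magic is a hypothetical helper, and the comments rely on the standard PDP-11 encoding in which the unconditional branch has opcode byte 01 and an offset field counted in 16-bit words.

```python
MAGIC = 0o407  # header constant quoted in the text (octal 000407)


def split_magic(word: int) -> tuple:
    """Split a 16-bit word into (high byte, low byte)."""
    return (word >> 8) & 0xFF, word & 0xFF


# High byte: opcode of the PDP-11 branch instruction.
# Low byte: branch offset, here seven words.
opcode_byte, offset = split_magic(MAGIC)
```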


In files

Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many file formats and can yield further run-time information.

Examples:
* Compiled Java class files (bytecode) and Mach-O binaries start with hex CAFEBABE. When compressed with Pack200 the bytes are changed to CAFED00D.
* GIF image files have the ASCII code for "GIF89a" (47 49 46 38 39 61) or "GIF87a" (47 49 46 38 37 61).
* JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the null-terminated string "JFIF" (4A 46 49 46 00). JPEG/Exif files contain the null-terminated string "Exif" (45 78 69 66 00), followed by more metadata about the file.
* PNG image files begin with an 8-byte signature which identifies the file as a PNG file and allows detection of common file transfer problems: "\211PNG\r\n\032\n" (89 50 4E 47 0D 0A 1A 0A). That signature contains various newline characters to permit detecting unwarranted automated newline conversions, such as transferring the file using FTP with the ASCII transfer mode instead of the binary mode.
* Standard MIDI audio files have the ASCII code for "MThd" (MIDI Track header, 4D 54 68 64) followed by more metadata.
* Unix or Linux scripts may start with a shebang ("#!", 23 21) followed by the path to an interpreter, if the interpreter is likely to be different from the one from which the script was invoked.
* ELF executables start with the byte 7F followed by "ELF" (7F 45 4C 46).
* PostScript files and programs start with "%!" (25 21).
* PDF files start with "%PDF" (hex 25 50 44 46).
* DOS MZ executable files and the EXE stub of the Microsoft Windows PE (Portable Executable) files start with the characters "MZ" (4D 5A), the initials of the designer of the file format, Mark Zbikowski. The definition allows the uncommon "ZM" (5A 4D) as well for dosZMXP, a non-PE EXE.
* The Berkeley Fast File System superblock format is identified as either 19 54 01 19 or 01 19 54 depending on version; both represent the birthday of the author, Marshall Kirk McKusick.
* The Master Boot Record of bootable storage devices on almost all IA-32 IBM PC compatibles has a code of 55 AA as its last two bytes.
* Executables for the Game Boy and Game Boy Advance handheld video game systems have a 48-byte or 156-byte magic number, respectively, at a fixed spot in the header. This magic number encodes a bitmap of the Nintendo logo.
* Amiga software executable Hunk files running on Amiga classic 68000 machines all started with the hexadecimal number $000003f3, nicknamed the "Magic Cookie."
* In the Amiga, the only absolute address in the system is hex $0000 0004 (memory location 4), which contains the start location called SysBase, a pointer to exec.library, the so-called kernel of Amiga.
* PEF files, used by the classic Mac OS and BeOS for PowerPC executables, contain the ASCII code for "Joy!" (4A 6F 79 21) as a prefix.
* TIFF files begin with either "II" or "MM" followed by 42 as a two-byte integer in little or big endian byte ordering. "II" is for Intel, which uses little endian byte ordering, so the magic number is 49 49 2A 00. "MM" is for Motorola, which uses big endian byte ordering, so the magic number is 4D 4D 00 2A.
* Unicode text files encoded in UTF-16 often start with the Byte Order Mark to detect endianness (FE FF for big endian and FF FE for little endian). On Microsoft Windows, UTF-8 text files often start with the UTF-8 encoding of the same character, EF BB BF.
* LLVM Bitcode files start with "BC" (42 43).
* WAD files start with "IWAD" or "PWAD" (for Doom), "WAD2" (for Quake) and "WAD3" (for Half-Life).
* Microsoft Compound File Binary Format (mostly known as one of the older formats of Microsoft Office documents) files start with D0 CF 11 E0, which is visually suggestive of the word "DOCFILE0".
* Headers in ZIP files often show up in text editors as "PK♥♦" (50 4B 03 04), where "PK" are the initials of Phil Katz, author of DOS compression utility PKZIP.
. * Headers in 7z files begin with "7z" (full magic number: 37 7A BC AF 27 1C). ;Detection The Unix utility program file can read and interpret magic numbers from files, and the file which is used to parse the information is called ''magic''. The Windows utility TrID has a similar purpose.
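The kind of lookup that such detection tools perform can be illustrated with a minimal Python sketch. It is not how file or TrID is implemented; the table below is built only from byte values listed in this section, and the names SIGNATURES, identify and identify_file are hypothetical.

```python
# Minimal sketch of signature-based detection. The table is an illustrative
# subset taken from the examples in this section, not a real "magic" database.
SIGNATURES = [
    (bytes.fromhex("37 7A BC AF 27 1C"), "7z archive"),
    (bytes.fromhex("49 49 2A 00"), "TIFF (little endian)"),
    (bytes.fromhex("4D 4D 00 2A"), "TIFF (big endian)"),
    (bytes.fromhex("D0 CF 11 E0"), "Compound File Binary Format"),
    (bytes.fromhex("50 4B 03 04"), "ZIP archive"),
    (bytes.fromhex("EF BB BF"), "UTF-8 text with byte order mark"),
    (bytes.fromhex("FE FF"), "UTF-16 text (big endian)"),
    (bytes.fromhex("FF FE"), "UTF-16 text (little endian)"),
    (b"BC", "LLVM bitcode"),
]

# Try longer signatures first so that a short prefix never shadows a longer one.
SIGNATURES.sort(key=lambda entry: len(entry[0]), reverse=True)
LONGEST = len(SIGNATURES[0][0])


def identify(data: bytes) -> str:
    """Return a format name for the leading bytes, or "unknown"."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only as many bytes as the longest signature needs."""
    with open(path, "rb") as handle:
        return identify(handle.read(LONGEST))


if __name__ == "__main__":
    print(identify(b"PK\x03\x04example"))
    print(identify(b"plain text"))
```

Short signatures such as "BC" or the two-byte UTF-16 marks can match unrelated data by chance, so a match on a short prefix is better treated as a hint than as proof of the format.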


In protocols

;Examples
* The OSCAR protocol, used in AIM/ICQ, prefixes requests with 2A.
* In the RFB protocol used by VNC, a client starts its conversation with a server by sending "RFB" (52 46 42, for "Remote Frame Buffer") followed by the client's protocol version number.
* In the SMB protocol used by Microsoft Windows, each SMB request or server reply begins with FF 53 4D 42, or "\xFFSMB".
* In the MSRPC protocol used by Microsoft Windows, each TCP-based request begins with 05 (representing Microsoft DCE/RPC Version 5), followed immediately by 00 or 01 for the minor version. In UDP-based MSRPC requests the first byte is always 04.
* COM and DCOM marshalled interfaces, called OBJREFs, always start with the byte sequence "MEOW" (4D 45 4F 57). Debugging extensions (used for DCOM channel hooking) are prefaced with the byte sequence "MARB" (4D 41 52 42).
* Unencrypted BitTorrent tracker requests begin with a single byte containing the value 19 (decimal) representing the header length, followed immediately by the phrase "BitTorrent protocol" at byte position 1.
* eDonkey2000/eMule traffic begins with a single byte representing the client version. Currently E3 represents an eDonkey client, C5 represents eMule, and D4 represents compressed eMule.
* The first 4 bytes of a block in the Bitcoin blockchain contain a magic number which serves as the network identifier. The constant 0xD9B4BEF9 indicates the main network, while the constant 0xDAB5BFFA indicates the testnet.
* SSL transactions always begin with a "client hello" message. The record encapsulation scheme used to prefix all SSL packets consists of two- and three-byte header forms. Typically an SSL version 2 client hello message is prefixed with 80, and an SSLv3 server response to a client hello begins with 16 (though this may vary).
* DHCP packets use a "magic cookie" value of 0x63 0x82 0x53 0x63 at the start of the options section of the packet. This value is included in all DHCP packet types.
* HTTP/2 connections are opened with the preface 0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a, or "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n". The preface is designed to avoid the processing of frames by servers and intermediaries which support earlier versions of HTTP but not 2.0.
* The WebSocket opening handshake uses the string ''258EAFA5-E914-47DA-95CA-C5AB0DC85B11''.
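A few of the protocol prefixes above can be checked with a short Python sketch. The constants come from the list; the function names are hypothetical, and the little endian byte order used for the Bitcoin value is an assumption that the text does not state.

```python
import struct

# Constants copied from the examples above.
HTTP2_PREFACE = bytes.fromhex("505249202a20485454502f322e300d0a0d0a534d0d0a0d0a")
DHCP_MAGIC_COOKIE = bytes.fromhex("63825363")
BITTORRENT_PREFIX = bytes([19]) + b"BitTorrent protocol"  # length byte, then the phrase
BITCOIN_NETWORKS = {0xD9B4BEF9: "main", 0xDAB5BFFA: "testnet"}


def is_http2_preface(data: bytes) -> bool:
    """True if the stream starts with the HTTP/2 connection preface."""
    return data.startswith(HTTP2_PREFACE)


def is_bittorrent_handshake(data: bytes) -> bool:
    """True if the stream starts with the length byte 19 and the phrase."""
    return data.startswith(BITTORRENT_PREFIX)


def bitcoin_network(block: bytes):
    """Map the first 4 bytes of a block to a network name, or None.

    Assumption: the 32-bit value is stored in little endian byte order.
    """
    if len(block) < 4:
        return None
    (magic,) = struct.unpack_from("<I", block)
    return BITCOIN_NETWORKS.get(magic)


if __name__ == "__main__":
    # The hexadecimal and textual forms of the HTTP/2 preface are the same bytes.
    print(HTTP2_PREFACE == b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")
    print(bitcoin_network(bytes.fromhex("f9beb4d9") + bytes(76)))
```

Comparing a fixed prefix before any further parsing lets a receiver reject traffic from the wrong protocol early and cheaply.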


In interfaces

Magic numbers are common in API functions and interfaces across many operating systems, including DOS, Windows and NetWare:

;Examples
* IBM PC-compatible BIOSes use magic values 0000 and 1234 to decide if the system should count up memory or not on reboot, thereby performing a cold or a warm boot. These values are also used by EMM386 memory managers intercepting boot requests. BIOSes also use magic values 55 AA to determine if a disk is bootable.
* The MS-DOS disk cache SMARTDRV (codenamed "Bambi") uses magic values BABE and EBAB in API functions.
* Many DR-DOS, Novell DOS and OpenDOS drivers developed in the former ''European Development Centre'' in the UK use the value 0EDC as a magic token when invoking or providing additional functionality sitting on top of the (emulated) standard DOS functions, NWCACHE being one example.
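The 55 AA check mentioned above can be sketched in Python as follows. The text does not say where the two bytes are located; this sketch assumes the conventional layout of a 512-byte boot sector whose last two bytes hold the signature, and the function name is hypothetical.

```python
SECTOR_SIZE = 512              # assumption: classic 512-byte boot sector
BOOT_SIGNATURE = b"\x55\xAA"   # magic values named in the text


def has_boot_signature(sector: bytes) -> bool:
    """True if the first sector ends with the 55 AA boot signature.

    Assumption: the signature occupies byte offsets 510 and 511.
    """
    return len(sector) >= SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


if __name__ == "__main__":
    bootable = bytes(510) + BOOT_SIGNATURE
    print(has_boot_signature(bootable))
    print(has_boot_signature(bytes(SECTOR_SIZE)))
```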


Other uses

;Examples
* The default MAC address on Texas Instruments SoCs is DE:AD:BE:EF:00:00.


Data type limits

This is a list of limits of data storage types:


GUIDs

It is possible to create or alter globally unique identifiers (GUIDs) so that they are memorable, but this is highly discouraged as it compromises their strength as near-unique identifiers. The specifications for generating GUIDs and UUIDs are quite complex, which is what leads to them being virtually unique, if properly implemented.

Microsoft Windows product ID numbers for Microsoft Office products sometimes end with 0000-0000-0000000FF1CE ("OFFICE"), as does the product ID for the "Office 16 Click-to-Run Extensibility Component". Java uses several GUIDs starting with CAFEEFAC. In the GUID Partition Table of the GPT partitioning scheme, BIOS Boot partitions use a special GUID that does not follow the GUID definition; instead, it is formed by using the ASCII codes for the string "Hah!IdontNeedEFI" partially in little endian order.
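The "partially in little endian order" construction can be reproduced with Python's standard uuid module, whose bytes_le form stores the first three fields of a GUID in little endian order and the remaining eight bytes as they are. This is a sketch under the assumption that bytes_le corresponds to the on-disk layout described here; guid_from_ascii is a hypothetical helper name.

```python
import uuid


def guid_from_ascii(text: str) -> uuid.UUID:
    """Interpret 16 ASCII characters as the mixed-endian bytes of a GUID."""
    raw = text.encode("ascii")
    if len(raw) != 16:
        raise ValueError("a GUID needs exactly 16 bytes")
    # bytes_le: the first three fields are little endian, the rest is unchanged.
    return uuid.UUID(bytes_le=raw)


if __name__ == "__main__":
    guid = guid_from_ascii("Hah!IdontNeedEFI")
    print(guid)            # textual form of the resulting GUID
    print(guid.bytes_le)   # round-trips to the original ASCII string
```

Because only the first three fields are byte-swapped, the first half of the string appears scrambled in the textual GUID, while the final eight characters stay in reading order.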


Debug values

Magic debug values are specific values written to memory during allocation or deallocation, so that it will later be possible to tell whether or not they have become corrupted, and to make it obvious when values taken from uninitialized memory are being used. Memory is usually viewed in hexadecimal, so memorable repeating or hexspeak values are common. Numerically odd values may be preferred so that processors without byte addressing will fault when attempting to use them as pointers (which must fall at even addresses). Values should be chosen that are away from likely addresses (the program code, static data, heap data, or the stack). Similarly, they may be chosen so that they are not valid codes in the instruction set for the given architecture.

Since it is very unlikely, although possible, that a 32-bit integer would take one of these specific values by chance, the appearance of such a number in a debugger or memory dump most likely indicates an error such as a buffer overflow or an uninitialized variable.

Famous and common examples include:

Most of these are 32 bits long, the word size of most 32-bit architecture computers. The prevalence of these values in Microsoft technology is no coincidence; they are discussed in detail in Steve Maguire's book ''Writing Solid Code'' from Microsoft Press. He gives a variety of criteria for these values, such as:
* They should not be useful; that is, most algorithms that operate on them should be expected to do something unusual. Numbers like zero don't fit this criterion.
* They should be easily recognized by the programmer as invalid values in the debugger.
* On machines that don't have byte alignment, they should be odd numbers, so that dereferencing them as addresses causes an exception.
* They should cause an exception, or perhaps even a debugger break, if executed as code.

Since they were often used to mark areas of memory that were essentially empty, some of these terms came to be used in phrases meaning "gone, aborted, flushed from memory"; e.g. "Your program is DEADBEEF".
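The idea of filling memory with a recognizable pattern can be sketched in Python with a toy word-addressed arena. This is only an illustration: real debug allocators work on raw process memory, the DebugArena class is hypothetical, and using DEADBEEF as the marker for untouched words is an assumption made for the example.

```python
import struct

DEBUG_PATTERN = 0xDEADBEEF    # odd and easy to spot in a hexadecimal dump
WORD = struct.Struct("<I")    # one 32-bit little endian word


class DebugArena:
    """Toy arena whose words hold DEBUG_PATTERN until they are written."""

    def __init__(self, words: int) -> None:
        self.memory = bytearray(words * WORD.size)
        for index in range(words):
            WORD.pack_into(self.memory, index * WORD.size, DEBUG_PATTERN)

    def write(self, index: int, value: int) -> None:
        WORD.pack_into(self.memory, index * WORD.size, value)

    def read(self, index: int) -> int:
        (value,) = WORD.unpack_from(self.memory, index * WORD.size)
        if value == DEBUG_PATTERN:
            # A legitimate value could equal the pattern, but that is unlikely.
            raise RuntimeError(f"word {index} looks uninitialized")
        return value


if __name__ == "__main__":
    arena = DebugArena(4)
    arena.write(1, 42)
    print(arena.read(1))
    print(arena.memory.hex(" ", 4))   # untouched words show up as ef be ad de
```

Reading a word that was never written raises an error instead of silently returning stale data, which is the behaviour the criteria above aim for.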


See also

* Magic string
* List of file signatures
* FourCC
* Hard coding
* Magic (programming)
* NaN (Not a Number)
* Enumerated type
* Hexspeak, for another list of magic values
* Nothing up my sleeve number, about magic constants in cryptographic algorithms
* Time formatting and storage bugs, for problems that can be caused by magics
* Sentinel value (aka flag value, trip value, rogue value, signal value, dummy data)
* Canary value, special value to detect buffer overflows
* XYZZY (magic word)
* Fast inverse square root, an algorithm that uses the constant 0x5F3759DF

