Class (file format)
   HOME

TheInfoList



OR:

A Java class file is a file (with the
filename extension A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically ...
) containing
Java bytecode In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming langua ...
that can be executed on the Java Virtual Machine (JVM). A Java class file is usually produced by a Java compiler from
Java programming language Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let programmers ''write once, run anywh ...
source files ( files) containing Java classes (alternatively, other
JVM languages This list of JVM Languages comprises notable computer programming languages that are used to produce computer software that runs on the Java virtual machine (JVM). Some of these languages are interpreted by a Java program, and some are compiled ...
can also be used to create class files). If a source file has more than one class, each class is compiled into a separate class file. JVMs are available for many platforms, and a class file compiled on one platform will execute on a JVM of another platform. This makes Java applications
platform-independent In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software r ...
.


History

On 11 December 2006, the class file format was modified under Java Specification Request (JSR) 202.


File layout and structure


Sections

There are 10 basic sections to the Java class file structure: * Magic Number: 0xCAFEBABE * Version of Class File Format: the minor and major versions of the class file * Constant Pool: Pool of constants for the class * Access Flags: for example whether the class is abstract, static, etc. * This Class: The name of the current class * Super Class: The name of the super class *
Interfaces Interface or interfacing may refer to: Academic journals * ''Interface'' (journal), by the Electrochemical Society * '' Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Linguistics'' * '' Int ...
: Any interfaces in the class * Fields: Any fields in the class *
Method Method ( grc, μέθοδος, methodos) literally means a pursuit of knowledge, investigation, mode of prosecuting such inquiry, or system. In recent centuries it more often means a prescribed process for completing a task. It may refer to: *Scien ...
s: Any methods in the class * Attributes: Any attributes of the class (for example the name of the sourcefile, etc.)


Magic Number

Class files are identified by the following 4
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
header (in
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, he ...
): CA FE BA BE (the first 4 entries in the table below). The history of this magic number was explained by
James Gosling James Gosling (born May 19, 1955) is a Canadian computer scientist, best known as the founder and lead designer behind the Java programming language. Gosling was elected a member of the National Academy of Engineering in 2004 for the conception ...
referring to a restaurant in
Palo Alto Palo Alto (; Spanish for "tall stick") is a charter city in the northwestern corner of Santa Clara County, California, United States, in the San Francisco Bay Area, named after a coastal redwood tree known as El Palo Alto. The city was es ...
:James Gosling private communication to Bill Bumgarner
/ref>
"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the
Grateful Dead The Grateful Dead was an American rock band formed in 1965 in Palo Alto, California. The band is known for its eclectic style, which fused elements of rock, folk, country, jazz, bluegrass, blues, rock and roll, gospel, reggae, world music, ...
used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in
grep grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command ''g/re/p'' (''globally search for a regular expression and print matching lines''), which has the sa ...
ping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."


General layout

Because the class file contains variable-sized items and does not also contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level the file format is described in terms of a few fundamental data types: * u1: an unsigned
8-bit In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Also, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers or data buses ...
integer * u2: an unsigned
16-bit 16-bit microcomputers are microcomputers that use 16-bit microprocessors. A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two ...
integer in
big-endian In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
byte order * u4: an unsigned
32-bit In computer architecture, 32-bit computing refers to computer systems with a processor, memory, and other major system components that operate on data in 32- bit units. Compared to smaller bit widths, 32-bit computers can perform large calculati ...
integer in big-endian byte order * table: an array of variable-length items of some type. The number of items in the table is identified by a preceding count number (the count is a u2), but the size in bytes of the table can only be determined by examining each of its items. Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context. There is no enforcement of word alignment, and so no padding bytes are ever used. The overall layout of the class file is as shown in the following table.


Representation in a C-like programming language

Since C doesn't support multiple variable length arrays within a struct, the code below won't compile and only serves as a demonstration. struct Class_File_Format


The constant pool

The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid). Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), but the count should actually be interpreted as the maximum index plus one. Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used. The type of each item (constant) in the constant pool is identified by an initial byte ''tag''. The number of bytes following this tag and their interpretation are then dependent upon the tag value. The valid constant types and their tag values are: There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, and short must be represented as an integer constant. Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object". The Unicode strings, despite the moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although it is similar. There are two differences (see
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of e ...
for a complete discussion). The first is that the code point U+0000 is encoded as the two-byte sequence C0 80 (in hex) instead of the standard single-byte encoding 00. The second difference is that supplementary characters (those outside the BMP at U+10000 and above) are encoded using a surrogate-pair construction similar to UTF-16 rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence ED A0 B4 ED B4 9E, rather than the correct 4-byte UTF-8 encoding of F0 9D 84 9E.


See also

*
Java bytecode In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming langua ...


References


Further reading

* The official defining document of the
Java Virtual Machine A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describe ...
, which includes the class file format. Both the first and second editions of the book are freely availabl
online for viewing and/or download
{{DEFAULTSORT:Class (File Format) Computer file formats Java platform