In computer science, type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.
In C and C++, constructs such as pointer type conversion and union are provided in order to permit many kinds of type punning, although some kinds are not actually supported by the standard language. C++ adds reference type conversion and reinterpret_cast to this list.
In the Pascal programming language, records with variants may be used to treat a particular data type in more than one manner, or in a manner not normally permitted.
Sockets example
One classic example of type punning is found in the Berkeley sockets interface. The function to bind an opened but uninitialized socket to an IP address is declared as follows:
int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
The bind function is usually called as follows:
struct sockaddr_in sa = {0};
int sockfd = ...;
sa.sin_family = AF_INET;
sa.sin_port = htons(port);
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);
The Berkeley sockets library fundamentally relies on the fact that in C, a pointer to struct sockaddr_in is freely convertible to a pointer to struct sockaddr and, in addition, that the two structure types share the same initial memory layout. Therefore, a reference to the structure field my_addr->sa_family (where my_addr is of type struct sockaddr*) will actually refer to the field sa.sin_family (where sa is of type struct sockaddr_in). In other words, the sockets library uses type punning to implement a rudimentary form of polymorphism or inheritance.
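The idea can be sketched in a few lines of C. This is only an illustration for a POSIX system: family_of and family_of_ipv4 are hypothetical helpers, not part of the sockets API, and they merely mimic how code behind bind is assumed to inspect its argument through the generic pointer type.

```c
#include <arpa/inet.h>    /* htons */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <sys/socket.h>   /* struct sockaddr, AF_INET, sa_family_t */

/* Hypothetical helper: reads the address family through the generic
 * struct sockaddr view, without knowing the concrete address type. */
sa_family_t family_of(const struct sockaddr *addr)
{
    return addr->sa_family;
}

/* Hypothetical helper: fills in an IPv4 address structure and then
 * inspects it through a pointer to the generic structure type. */
sa_family_t family_of_ipv4(unsigned short port)
{
    struct sockaddr_in sa = {0};
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    /* The cast is the type pun: sa is a sockaddr_in, viewed as a sockaddr. */
    return family_of((const struct sockaddr *)&sa);
}
```

Because both structures begin with the address family, the value written through sin_family is the value read back through sa_family.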
A related and common practice is the use of "padded" data structures, which allow different kinds of values to be stored in what is effectively the same storage space. This is often seen when two structures are used mutually exclusively, as an optimization.
Floating-point example
Not all examples of type punning involve structures, as the previous example did. Suppose we want to determine whether a floating-point number is negative. We could write:
bool is_negative(float x) {
    return x < 0.0f;
}
However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:
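bool is_negative(float x) {
    unsigned int *ui = (unsigned int *)&x;  /* type pun: view the float's bits as an integer */
    return (*ui & 0x80000000u) != 0;        /* bit 31 is the IEEE 754 sign bit */
}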
Note that the behaviour will not be exactly the same: in the special case of x being negative zero, the first implementation yields false while the second yields true. Also, the first implementation will return false for any NaN value, but the latter might return true for NaN values with the sign bit set.
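The difference between the two tests can be observed with a small sketch. The function names is_negative_cmp and is_negative_bits are chosen here for illustration only, and the bit-based variant swaps in memcpy to copy the representation rather than a pointer cast, so that the sketch itself stays within defined behaviour. It assumes that float is an IEEE 754 binary32 value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Comparison-based test: false for -0.0f and for every NaN. */
bool is_negative_cmp(float x)
{
    return x < 0.0f;
}

/* Bit-based test (assumes IEEE 754 binary32): true whenever the sign
 * bit is set, which includes -0.0f and NaNs with the sign bit set. */
bool is_negative_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the object representation */
    return (bits & UINT32_C(0x80000000)) != 0;
}
```

For ordinary values such as -1.5f or 2.0f both functions agree; they differ only in the special cases described above.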
This kind of type punning is more dangerous than most. Whereas the former example relied only on guarantees made by the C programming language about structure layout and pointer convertibility, the latter example relies on assumptions about a particular system's hardware. Some situations, such as time-critical code that the compiler otherwise fails to optimize, may require dangerous code. In these cases, documenting all such assumptions in comments, and introducing static assertions to verify portability expectations, helps to keep the code maintainable.
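One way such assumptions might be written down is sketched below, assuming a C11 compiler. The helpers float_bits and sign_bit are hypothetical names, and the checks are only a partial safeguard: they confirm the size and numeric parameters of float, but cannot confirm that its byte order matches that of the integer type.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>    /* FLT_RADIX, FLT_MANT_DIG, FLT_MAX_EXP */
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t occupy the same number of bytes. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float must be 32 bits wide");

/* Assumption: float has the parameters of IEEE 754 binary32. */
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float must have IEEE 754 binary32 parameters");

/* Hypothetical helper: returns the raw bits of x. memcpy is used here
 * in place of a pointer cast to avoid an aliasing violation. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: 1 if the sign bit of x is set, 0 otherwise. */
int sign_bit(float x)
{
    return (int)(float_bits(x) >> 31);
}
```

If either assertion fails, the program does not compile, so the broken assumption is reported at build time rather than producing wrong answers at run time.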
Practical examples of floating-point punning include fast inverse square root popularized by Quake III, fast FP comparison as integers, and finding neighboring values by incrementing as an integer.
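The last of these tricks can be sketched as follows. The helper next_up_positive is a hypothetical name, and the sketch assumes IEEE 754 binary32 and an argument that is non-negative and finite; negative values, infinities and NaNs would need separate handling. As before, memcpy stands in for the pointer cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the next representable float above x.
 * Assumes IEEE 754 binary32 and a non-negative, finite x. For such
 * values the bit patterns, read as integers, are ordered the same way
 * as the floating-point values, so adding 1 steps to the neighbor. */
float next_up_positive(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* float -> raw bits */
    bits += 1;                        /* step to the adjacent bit pattern */
    memcpy(&x, &bits, sizeof x);      /* raw bits -> float */
    return x;
}
```

For example, the neighbor of 1.0f obtained this way is 1.0f plus 2 to the power of -23, the spacing of binary32 values in the interval from 1 to 2.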
By language
C and C++
In addition to the assumption about bit-representation of floating-point numbers, the above floating-point type-punning example also violates the C language's constraints on how objects are accessed [ISO/IEC 9899:1999 s6.5/7]: the declared type of x is float but it is read through an expression of type unsigned int. On many common platforms, this use of pointer punning can create problems if different pointers are aligned in machine-specific ways. Furthermore, pointers of different sizes can alias accesses to the same memory, causing problems that are unchecked by the compiler. Even when data size and pointer representation match, however, compilers can rely on the non-aliasing constraints to perform optimizations that would be unsafe in the presence of disallowed aliasing.
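The kind of optimization in question can be illustrated with a short sketch. The function store_both is a hypothetical example, not taken from any particular code base.

```c
/* Because int and float are incompatible types, a conforming compiler
 * may assume that ip and fp never refer to the same object. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 0.0f;   /* assumed not to modify *ip */
    return *ip;   /* may be compiled as "return 1" without reloading *ip */
}
```

When the two pointers refer to distinct objects, the function returns 1 either way. If a caller used a pointer cast to pass the address of the same object for both parameters, the program would have undefined behaviour, and an optimizing compiler could still return 1 even though the memory then holds the bits of 0.0f.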
Use of pointers
A naive attempt at type-punning can be achieved by using pointers:
float pi = 3.14159;
uint32_t piAsRawData = *(uint32_t*)&pi;
According to the C standard, this code has undefined behavior, because a float object is read through an lvalue of type uint32_t. Compilers nevertheless generally accept it, and when it behaves as intended, piAsRawData typically contains the raw bits of pi.
Use of union
In C, but not in C++, it is sometimes possible to perform type punning via a union. (The following example assumes IEEE-754 bit-representation for type float.)
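bool is_negative(float x) {
    union {
        unsigned int ui;
        float d;
    } my_union;
    my_union.d = x;                           /* write one member ... */
    return (my_union.ui & 0x80000000u) != 0;  /* ... and read the other */
}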
Accessing my_union.ui after most recently writing to the other member, my_union.d, is an allowed form of type-punning in C, provided that the member read is not larger than the one whose value was set (otherwise the read has unspecified behavior). The same is syntactically valid but has undefined behavior in C++ [ISO/IEC 14882:2011 Section 9.5], however, where only the last-written member of a union is considered to have any value at all.
For another example of type punning, see Stride of an array.
Pascal
A variant record permits treating a data type as multiple kinds of data depending on which variant is being referenced. In the following example, Integer is presumed to be 16 bits wide, LongInt and Real are presumed to be 32 bits wide, and Char is presumed to be 8 bits wide:
type
    VariantRecord = record
        case RecType : LongInt of
            1: (I : array[1..2] of Integer);  (* not shown here: there can be several variables in a variant record's case statement *)
            2: (L : LongInt);
            3: (R : Real);
            4: (C : array[1..4] of Char);
    end;
var
    V  : VariantRecord;
    K  : Integer;
    LA : LongInt;
    RA : Real;
    Ch : Char;
V.I[1] := 1;
Ch := V.C[1];  (* this would extract the first byte of V.I *)
V.R := 8.3;
LA := V.L;     (* this would store a Real into an Integer *)
In Pascal, copying a real to an integer converts it to the truncated value. The variant record instead reinterprets the binary representation of the floating-point number as a 32-bit long integer. The result is not the same as the truncated value, and may not be a meaningful long integer value on some systems.
These examples could be used to create strange conversions, although, in some cases, there may be legitimate uses for these types of constructs, such as for determining locations of particular pieces of data. In the following example a pointer and a longint are both presumed to be 32 bit:
type
PA = ^Arec;
Arec = record
case RT : LongInt of
1: (P : PA );
2: (L : LongInt);
end;
var
PP : PA;
K : LongInt;
New(PP);
PP^.P := PP;
WriteLn('Variable PP is located at address ', Hex(PP^.L));
Where "new" is the standard routine in Pascal for allocating memory for a pointer, and "hex" is presumably a routine to print the hexadecimal string describing the value of an integer. This would allow the display of the address of a pointer, something which is not normally permitted. (Pointers cannot be read or written, only assigned.) Assigning a value to an integer variant of a pointer would allow examining or writing to any location in system memory:
PP^.L := 0;
PP := PP^.P; (* PP now points to address 0 *)
K := PP^.L; (* K contains the value of word 0 *)
WriteLn('Word 0 of this machine contains ', K);
This construct may cause a program check or protection violation if address 0 is protected against reading on the machine the program is running upon or the operating system it is running under.
The reinterpret cast technique from C/C++ also works in Pascal. This can be useful, for example, when reading dwords from a byte stream that are to be treated as floats. Here is an example that reinterpret-casts a dword to a float (presuming, as above, that Real is 32 bits wide):
type
pReal = ^Real;
var
DW : DWord;
F : Real;
F := pReal(@DW)^;
C#
In C# (and other .NET languages), type punning is a little harder to achieve because of the type system, but it can nonetheless be done using pointers or struct unions.
Pointers
C# only allows pointers to so-called native types, i.e. any primitive type (except string), enum, array or struct that is composed only of other native types. Note that pointers are only allowed in code blocks marked 'unsafe'.
float pi = 3.14159f;
uint piAsRawData = *(uint*)&pi;
Struct unions
Struct unions are allowed without any notion of 'unsafe' code, but they do require the definition of a new type.
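[StructLayout(LayoutKind.Explicit)]
struct FloatAndUIntUnion
{
    [FieldOffset(0)] public float DataAsFloat;  // both fields start at offset 0
    [FieldOffset(0)] public uint DataAsUInt;    // and therefore overlap
}
// ...
FloatAndUIntUnion union = default;  // initialize so that every field counts as assigned
union.DataAsFloat = 3.14159f;
uint piAsRawData = union.DataAsUInt;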
Raw CIL code
Raw CIL can be used instead of C#, because it doesn't have most of the type limitations. This allows one to, for example, combine two enum values of a generic type:
TEnum a = ...;
TEnum b = ...;
TEnum combined = a | b; // illegal
This can be circumvented by the following CIL code:
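.method public static hidebysig
    !!TEnum CombineEnums<valuetype .ctor ([mscorlib]System.ValueType) TEnum>(
        !!TEnum a,
        !!TEnum b
    ) cil managed
{
    .maxstack 2
    ldarg.0
    ldarg.1
    or  // a and b have the same type, and therefore the same size
    ret
}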
The cpblk CIL opcode allows for some other tricks, such as converting a struct to a byte array:
.method public static hidebysig
uint8[] ToByteArray<valuetype .ctor ([mscorlib]System.ValueType) T>(
!!T& v // 'ref T' in C#
) cil managed
{
.locals init (
[0] uint8[]
)
.maxstack 3
// create a new byte array with length sizeof(T) and store it in local 0
sizeof !!T
newarr uint8
dup // keep a copy on the stack for later (1)
stloc.0
ldc.i4.0
ldelema uint8
// memcpy(local 0, &v, sizeof(T));
// the address of the array's first element is still on the stack, see (1)
ldarg.0 // this is the *address* of 'v', because its type is '!!T&'
sizeof !!T
cpblk
ldloc.0
ret
}
External links
Section of the GCC manual on -fstrict-aliasing, which defeats some type punning
Defect Report 257 to the C99 standard, incidentally defining "type punning" in terms of union, and discussing the issues surrounding the implementation-defined behavior of the last example above
Defect Report 283 on the use of unions for type punning