Uninitialized variable
   HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, an uninitialized variable is a
variable Variable may refer to: * Variable (computer science), a symbolic name associated with a value and whose associated value may be changed * Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
that is declared but is not set to a definite known value before it is used. It will have ''some'' value, but not a predictable one. As such, it is a programming error and a common source of bugs in software.


Example of the C language

A common assumption made by novice programmers is that all variables are set to a known value, such as zero, when they are declared. While this is true for many languages, it is not true for all of them, and so the potential for error is there. Languages such as C use stack space for variables, and the collection of variables allocated for a subroutine is known as a
stack frame In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or mac ...
. While the computer will set aside the appropriate amount of space for the stack frame, it usually does so simply by adjusting the value of the stack pointer, and does not set the
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
itself to any new state (typically out of efficiency concerns). Therefore, whatever contents of that memory at the time will appear as initial values of the variables which occupy those addresses. Here's a simple example in C: void count( void ) The final value of k is undefined. The answer that it must be 10 assumes that it started at zero, which may or may not be true. Note that in the example, the variable i is initialized to zero by the first clause of the for statement. Another example can be when dealing with
struct In computer science, a record (also called a structure, struct, or compound data) is a basic data structure. Records in a database or spreadsheet are usually called "rows". A record is a collection of '' fields'', possibly of different data typ ...
s. In the code snippet below, we have a struct student which contains some variables describing the information about a student. The function register_student leaks memory because it fails to fully initialize the members of struct student new_student. If we take a closer look, in the beginning, age, semester and student_number are initialized. But the initialization of the first_name and last_name members are incorrect. This is because if the length of first_name and last_name character arrays are less than 16 bytes, during the strcpy, we fail to fully initialize the entire 16 bytes of memory reserved for each of these members. Hence after memcpy()'ing the resulted struct to output, we leak some stack memory to the caller. struct student ; int register_student(struct student *output, int age, char *first_name, char *last_name) In any case, even when a variable is ''implicitly'' initialized to a ''default'' value like 0, this is typically not the ''correct'' value. Initialized does not mean correct if the value is a default one. (However, default initialization to 0 is a right practice for pointers and arrays of pointers, since it makes them invalid before they are actually initialized to their correct value.) In C, variables with static storage duration that are not initialized explicitly are initialized to zero (or null, for pointers). Not only are uninitialized variables a frequent cause of bugs, but this kind of bug is particularly serious because it may not be reproducible: for instance, a variable may remain uninitialized only in some
branch A branch, sometimes called a ramus in botany, is a woody structural member connected to the central trunk (botany), trunk of a tree (or sometimes a shrub). Large branches are known as boughs and small branches are known as twigs. The term '' ...
of the program. In some cases, programs with uninitialized variables may even pass
software test Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to apprecia ...
s.


Impacts

Uninitialized variables are powerful bugs since they can be exploited to leak arbitrary memory or to achieve arbitrary memory overwrite or to gain code execution, depending on the case. When exploiting a software which utilizes
address space layout randomization Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably jumping to, for example, a particular exploited fu ...
(ASLR), it is often required to know the
base address In computing, a base address is an address serving as a reference point ("base") for other addresses. Related addresses can be accessed using an ''addressing scheme''. Under the ''relative addressing'' scheme, to obtain an absolute address, the ...
of the software in memory. Exploiting an uninitialized variable in a way to force the software to leak a pointer from its
address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve st ...
can be used to bypass ASLR.


Use in languages

Uninitialized variables are a particular problem in languages such as assembly language, C, and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, which were designed for
systems programming Systems programming, or system programming, is the activity of programming computer system software. The primary distinguishing characteristic of systems programming when compared to application programming is that application programming aims to pr ...
. The development of these languages involved a design philosophy in which conflicts between performance and safety were generally resolved in favor of performance. The programmer was given the burden of being aware of dangerous issues such as uninitialized variables. In other languages, variables are often initialized to known values when created. Examples include: *
VHDL The VHSIC Hardware Description Language (VHDL) is a hardware description language (HDL) that can model the behavior and structure of digital systems at multiple levels of abstraction, ranging from the system level down to that of logic gates ...
initializes all standard variables into special 'U' value. It is used in simulation, for debugging, to let the user to know when the
don't care In digital logic, a don't-care term (abbreviated DC, historically also known as ''redundancies'', ''irrelevancies'', ''optional entries'', ''invalid combinations'', ''vacuous combinations'', ''forbidden combinations'', ''unused states'' or ''l ...
initial values, through the
multi-valued logic Many-valued logic (also multi- or multiple-valued logic) refers to a propositional calculus in which there are more than two truth values. Traditionally, in Aristotle's logical calculus, there were only two possible values (i.e., "true" and "false ...
, affect the output. *
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
does not have uninitialized variables. Fields of classes and objects that do not have an explicit initializer and elements of arrays are automatically initialized with the default value for their type (false for boolean, 0 for all numerical types, null for all reference types). Local variables in Java must be definitely assigned to before they are accessed, or it is a compile error. *
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
initializes local variables to NULL (distinct from None) and raises an UnboundLocalError when such a variable is accessed before being (re)initialized to a valid value. * D initializes all variables unless explicitly specified by the programmer not to. Even in languages where uninitialized variables are allowed, many
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
s will attempt to identify the use of uninitialized variables and report them as
compile-time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
error An error (from the Latin ''error'', meaning "wandering") is an action which is inaccurate or incorrect. In some usages, an error is synonymous with a mistake. The etymology derives from the Latin term 'errare', meaning 'to stray'. In statistics ...
s.


See also

*
Initialization (programming) In computer programming, initialization (or initialisation) is the assignment of an initial value for a data object or variable. The manner in which initialization is performed depends on the programming language, as well as the type, storage class, ...
*
Null pointer In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown lengt ...
*
Don't care In digital logic, a don't-care term (abbreviated DC, historically also known as ''redundancies'', ''irrelevancies'', ''optional entries'', ''invalid combinations'', ''vacuous combinations'', ''forbidden combinations'', ''unused states'' or ''l ...
*
Undefined behaviour In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior ...


References


Further reading

* {{cite web , title=CWE-457 Use of Uninitialized Variable , url=http://cwe.mitre.org/data/definitions/457.html Software bugs Variable (computer science)