
In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing, usually to reduce the instruction path length and improve performance, or simply to reduce otherwise repetitively similar code, thus simplifying maintenance. The term is usually applied only to code where the self-modification is intentional, not to situations where code accidentally modifies itself due to an error such as a buffer overflow.

Self-modifying code can involve overwriting existing instructions or generating new code at run time and transferring control to that code.

Self-modification can be used as an alternative to the method of "flag setting" and conditional program branching, primarily to reduce the number of times a condition needs to be tested. The method is frequently used for conditionally invoking test/debugging code without requiring additional computational overhead for every input/output cycle.

The modifications may be performed:
* only during initialization, based on input parameters (when the process is more commonly described as software 'configuration' and is somewhat analogous, in hardware terms, to setting jumpers for printed circuit boards). Alteration of program entry pointers is an equivalent indirect method of self-modification, but it requires the co-existence of one or more alternative instruction paths, increasing the program size.
* throughout execution ("on the fly"), based on particular program states that have been reached during the execution.

In either case, the modifications may be performed directly to the machine code instructions themselves, by overlaying new instructions over the existing ones (for example, altering a compare and branch to an unconditional branch or alternatively a 'NOP'). In the IBM System/360 architecture, and its successors up to z/Architecture, an EXECUTE (EX) instruction logically overlays the second byte of its target instruction with the low-order 8 bits of the register designated by its R1 field (the two values are ORed together). This provides the effect of self-modification although the actual instruction in storage is not altered.
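The contrast between testing a flag on every pass and altering an entry pointer can be sketched in Python. This is only an illustration of the indirect, pointer-based form of self-modification described above, not of machine-level instruction overlay. The class names `FlagReader` and `PatchingReader` and the `open_count` counter (which stands in for opening a file) are hypothetical.

```python
class FlagReader:
    """Tests an 'opened' flag on every call."""

    def __init__(self):
        self.opened = False
        self.open_count = 0

    def get(self, record):
        if not self.opened:          # condition tested on every call
            self.open_count += 1     # stands in for opening the input file
            self.opened = True
        return record.upper()


class PatchingReader:
    """Replaces its own entry point after the first call."""

    def __init__(self):
        self.open_count = 0
        self.get = self._first_get   # entry pointer initially targets the first-time path

    def _first_get(self, record):
        self.open_count += 1         # stands in for opening the input file
        self.get = self._normal_get  # redirect all later calls past the first-time work
        return self._normal_get(record)

    def _normal_get(self, record):
        return record.upper()


def process(reader, records):
    # reader.get is looked up on each call, so the redirection takes effect immediately.
    return [reader.get(r) for r in records]
```

Both readers produce the same results and perform the first-time work once. The second no longer tests a condition after the first call, at the cost of keeping two code paths in the program.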


Application in low and high level languages

Self-modification can be accomplished in a variety of ways depending upon the programming language and its support for pointers and/or access to dynamic compiler or interpreter 'engines':
* overlay of existing instructions (or parts of instructions such as opcode, register, flags or addresses)
* direct creation of whole instructions or sequences of instructions in memory
* creation or modification of source code statements followed by a 'mini compile' or a dynamic interpretation (see eval statement)
* creating an entire program dynamically and then executing it


Assembly language

Self-modifying code is quite straightforward to implement when using assembly language. Instructions can be dynamically created in memory (or else overlaid over existing code in non-protected program storage), in a sequence equivalent to the ones that a standard compiler may generate as the object code. With modern processors, there can be unintended side effects on the CPU cache that must be considered.

The method was frequently used for testing 'first time' conditions, as in this suitably commented IBM/360 assembler example. It uses instruction overlay to reduce the instruction path length by (N×1)−1, where N is the number of records on the file (−1 being the overhead to perform the overlay).

    SUBRTN   NOP   OPENED           FIRST TIME HERE?
    * The NOP is x'4700'
             OI    SUBRTN+1,X'F0'   YES, CHANGE NOP TO UNCONDITIONAL BRANCH (47F0...)
             OPEN  INPUT            AND OPEN THE INPUT FILE SINCE IT'S THE FIRST TIME THRU
    OPENED   GET   INPUT            NORMAL PROCESSING RESUMES HERE
             ...

Alternative code might involve testing a "flag" each time through. The unconditional branch is slightly faster than a compare instruction, as well as reducing the overall path length. In later operating systems, for programs residing in protected storage, this technique could not be used, and so changing the pointer to the subroutine would be used instead. The pointer would reside in dynamic storage and could be altered at will after the first pass to bypass the OPEN (having to load a pointer first instead of a direct branch and link to the subroutine would add N instructions to the path length, but there would be a corresponding reduction of N for the unconditional branch that would no longer be required).

Below is an example in Zilog Z80 assembly language. The code increments register "B" in range [0,5]. The "CP" compare instruction is modified on each loop.

    ;
    ORG 0H
    CALL FUNC00
    HALT
    ;
    FUNC00:
    LD A,6              ; A holds the loop limit
    LD HL,label01+1     ; HL points at the operand byte of the CP instruction
    LD B,(HL)           ; B starts from the current operand (0)
    label00:
    INC B
    LD (HL),B           ; overwrite the CP operand with B
    label01:
    CP $0               ; this operand is rewritten on every pass
    JP NZ,label00       ; loop until the operand equals A
    RET
    ;

Self-modifying code is sometimes used to overcome limitations in a machine's instruction set. For example, in the Intel 8080 instruction set, one cannot input a byte from an input port that is specified in a register. The input port is statically encoded in the instruction itself, as the second byte of a two-byte instruction. Using self-modifying code, it is possible to store a register's contents into the second byte of the instruction, then execute the modified instruction to achieve the desired effect.
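The operand patch described above can be imitated in Python on a simulated program image. This is a toy model rather than an emulator: `input_from_port` and the `ports` mapping are hypothetical, and only the two-byte shape of the 8080 `IN` instruction (opcode 0xDB followed by the port number) is taken from the real instruction set.

```python
IN_OPCODE = 0xDB  # Intel 8080 "IN port": opcode byte followed by the port number


def input_from_port(memory, instr_addr, port_in_register, ports):
    """Simulate reading from a port whose number is only known at run time.

    memory           -- bytearray holding the simulated program
    instr_addr       -- address of the two-byte IN instruction
    port_in_register -- port number currently held in a register
    ports            -- hypothetical mapping of port number to the byte it supplies
    """
    assert memory[instr_addr] == IN_OPCODE
    # Self-modification: store the register into the instruction's operand byte.
    memory[instr_addr + 1] = port_in_register & 0xFF
    # "Execute" the patched instruction: decode its operand and read that port.
    port = memory[instr_addr + 1]
    return ports[port]


program = bytearray([0x00, IN_OPCODE, 0x00, 0xC9])  # NOP; IN 0; RET
ports = {0x10: 0x41, 0x20: 0x42}
first = input_from_port(program, 1, 0x10, ports)
second = input_from_port(program, 1, 0x20, ports)
```

The same two bytes of program serve every port, because the operand is rewritten before each use.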


High-level languages

Some compiled languages explicitly permit self-modifying code. For example, the ALTER verb in COBOL may be implemented as a branch instruction that is modified during execution. Some batch programming techniques involve the use of self-modifying code. Clipper and SPITBOL also provide facilities for explicit self-modification. The Algol compiler on B6700 systems offered an interface to the operating system whereby executing code could pass a text string or a named disc file to the Algol compiler and was then able to invoke the new version of a procedure.

With interpreted languages, the "machine code" is the source text and may be susceptible to editing on the fly: in SNOBOL the source statements being executed are elements of a text array. Other languages, such as Perl and Python, allow programs to create new code at run time and execute it using an eval function, but do not allow existing code to be mutated. The illusion of modification (even though no machine code is really being overwritten) is achieved by modifying function pointers, as in this JavaScript example:

    var f = function (x) { return x + 1; };

    // assign a new definition to f:
    f = new Function('x', 'return x + 2');

Lisp macros also allow runtime code generation without parsing a string containing program code.

The Push programming language is a genetic programming system that is explicitly designed for creating self-modifying programs. While not a high-level language, it is not as low-level as assembly language.
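The same rebinding idea can be sketched in Python, one of the languages mentioned above. The built-ins `exec` and `eval` are standard; the function names `f` and `g` and the `namespace` dictionary are only illustrative.

```python
def f(x):
    return x + 1


before = f(1)

# Build a new definition from source text at run time and rebind the name f.
namespace = {}
exec("def f(x):\n    return x + 2\n", namespace)
f = namespace["f"]

after = f(1)

# eval handles a single expression rather than statements.
g = eval("lambda x: x * 10")
```

The original function object is never altered. The name `f` is simply bound to a newly created function.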


Compound modification

Prior to the advent of multiple windows, command-line systems might offer a menu system involving the modification of a running command script. Suppose a DOS script (or "batch") file MENU.BAT contains the following:

    :start
    SHOWMENU.EXE

Upon initiation of MENU.BAT from the command line, SHOWMENU presents an on-screen menu, with possible help information, example usages and so forth. Eventually the user makes a selection that requires a command SOMENAME to be performed: SHOWMENU exits after rewriting the file MENU.BAT to contain

    :start
    SHOWMENU.EXE
    CALL SOMENAME.BAT
    GOTO start

Because the DOS command interpreter does not compile a script file and then execute it, nor read the entire file into memory before starting execution, nor yet rely on the content of a record buffer, when SHOWMENU exits, the command interpreter finds a new command to execute (it is to invoke the script file SOMENAME, in a directory location and via a protocol known to SHOWMENU), and after that command completes, it goes back to the start of the script file and reactivates SHOWMENU ready for the next selection. Should the menu choice be to quit, the file would be rewritten back to its original state. Although this starting state has no use for the label, it, or an equivalent amount of text, is required, because the DOS command interpreter recalls the byte position of the next command when it is to start the next command; thus the rewritten file must maintain alignment for the next command start point to indeed be the start of the next command.

Aside from the convenience of a menu system (and possible auxiliary features), this scheme means that the SHOWMENU.EXE system is not in memory when the selected command is activated, a significant advantage when memory is limited.
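The byte-position behaviour described above can be modelled with a small Python sketch. It is a simplified stand-in for the DOS command interpreter, not a description of its actual implementation: `run_script`, `find_label`, `demo` and the `execute` callback are hypothetical names, and only labels, `GOTO` and plain commands are handled.

```python
import os
import tempfile


def run_script(path, execute):
    """Run a batch-like script, re-reading the file at a remembered byte offset.

    `execute(command)` performs one command and may rewrite the script file.
    """
    position = 0
    trace = []
    while True:
        with open(path, "rb") as script:        # the file is reopened for every command
            script.seek(position)
            line = script.readline()
            position = script.tell()            # only this offset survives between commands
        if not line:
            return trace                        # end of file: the script has finished
        command = line.decode("ascii").strip()
        if not command or command.startswith(":"):
            continue                            # blank line or label definition
        if command.upper().startswith("GOTO "):
            position = find_label(path, command[5:].strip())
            continue
        trace.append(command)
        execute(command)


def find_label(path, label):
    """Return the byte offset just past the line ':label'."""
    with open(path, "rb") as script:
        while True:
            line = script.readline()
            if not line:
                raise ValueError("label not found: " + label)
            if line.decode("ascii").strip() == ":" + label:
                return script.tell()


IDLE = b":start\r\nSHOWMENU.EXE\r\n"
BUSY = IDLE + b"CALL SOMENAME.BAT\r\nGOTO start\r\n"


def demo():
    handle, path = tempfile.mkstemp(suffix=".BAT")
    os.close(handle)
    with open(path, "wb") as script:
        script.write(IDLE)
    selections = iter([BUSY, IDLE])             # first a command is chosen, then "quit"

    def execute(command):
        if command == "SHOWMENU.EXE":           # the menu program rewrites the script
            with open(path, "wb") as script:
                script.write(next(selections))

    try:
        return run_script(path, execute)
    finally:
        os.remove(path)
```

Because the rewritten file keeps its first two lines byte-for-byte identical, the remembered offset still falls on a command boundary after each rewrite.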


Control tables

Control table interpreters can be considered to be, in one sense, 'self-modified' by data values extracted from the table entries (rather than specifically hand-coded in conditional statements of the form "IF inputx = 'yyy'").
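A minimal control-table interpreter might look like the following Python sketch. The `classify` function and the table rows are hypothetical. The point is only that behaviour is changed by editing table data rather than the interpreting code.

```python
def classify(value, table, default):
    """Walk a control table of (predicate, action) rows and run the first match."""
    for matches, action in table:
        if matches(value):
            return action(value)
    return default(value)


# Behaviour is defined by the rows, not by conditional statements in classify.
table = [
    (lambda v: v == "yyy", lambda v: "handled yyy"),
    (lambda v: v.isdigit(), lambda v: int(v) * 2),
]

results = [classify(v, table, lambda v: "unknown") for v in ("yyy", "21", "abc")]

# Changing the table at run time changes what the unchanged interpreter does.
table.insert(0, (lambda v: v == "abc", lambda v: "handled abc"))
updated = classify("abc", table, lambda v: "unknown")
```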


Channel programs

Some IBM access methods traditionally used self-modifying channel programs, where a value, such as a disk address, is read into an area referenced by a channel program, where it is used by a later channel command to access the disk.


History

The IBM SSEC, demonstrated in January 1948, had the ability to modify its instructions or otherwise treat them exactly like data. However, the capability was rarely used in practice. In the early days of computers, self-modifying code was often used to reduce use of limited memory, or improve performance, or both. It was also sometimes used to implement subroutine calls and returns when the instruction set only provided simple branching or skipping instructions to vary the control flow. This use is still relevant in certain ultra-RISC architectures, at least theoretically; see for example one instruction set computer. Donald Knuth's MIX architecture also used self-modifying code to implement subroutine calls.


Usage

Self-modifying code can be used for various purposes:
* Semi-automatic optimizing of a state-dependent loop.
* Dynamic in-place code optimization for speed depending on load environment.
* Run-time code generation, or specialization of an algorithm at run time or load time (which is popular, for example, in the domain of real-time graphics), such as a general sort utility preparing code to perform the key comparison described in a specific invocation.
* Altering of inlined state of an object, or simulating the high-level construction of closures.
* Patching of subroutine (pointer) address calling, usually as performed at load/initialization time of dynamic libraries, or else on each invocation, patching the subroutine's internal references to its parameters so as to use their actual addresses (i.e. indirect self-modification).
* Evolutionary computing systems such as neuroevolution, genetic programming and other evolutionary algorithms.
* Hiding of code to prevent reverse engineering (by use of a disassembler or debugger) or to evade detection by virus/spyware scanning software and the like.
* Filling 100% of memory (in some architectures) with a rolling pattern of repeating opcodes, to erase all programs and data, or to burn-in hardware or perform RAM tests.
* Compressing code to be decompressed and executed at runtime, e.g., when memory or disk space is limited.
* Some very limited instruction sets leave no option but to use self-modifying code to perform certain functions. For example, a one instruction set computer (OISC) machine that uses only the subtract-and-branch-if-negative "instruction" cannot do an indirect copy (something like the equivalent of "*a = **b" in the C language) without using self-modifying code.
* Booting. Early microcomputers often used self-modifying code in their bootloaders. Since the bootloader was keyed in via the front panel at every power-on, it did not matter if the bootloader modified itself. However, even today many bootstrap loaders are self-relocating, and a few are even self-modifying.
* Altering instructions for fault-tolerance.


Optimizing a state-dependent loop

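Pseudocode example:

    repeat N times {
        if STATE is 1
            increase A by one
        else
            decrease A by one
        do something with A
    }

Self-modifying code, in this case, would simply be a matter of rewriting the loop like this:

    repeat N times {
        increase A by one
        do something with A
        when STATE has to switch {
            replace the opcode "increase" above with the opcode to decrease, or vice versa
        }
    }

Note that two-state replacement of the opcode can be easily written as 'xor var at address with the value "opcodeOf(Inc) xor opcodeOf(dec)"'. Choosing this solution must depend on the value of N and the frequency of state changing.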
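The XOR-based two-state replacement noted above can be tried out on a simulated opcode byte. In this Python sketch the one-byte "program", the `run_loop` interpreter and the `switch_at` set are hypothetical. The opcode values are the Z80 encodings of `INC A` and `DEC A`, used purely as sample constants. The comparisons inside the loop stand in for the processor decoding the instruction, not for a test of the state.

```python
INC_A = 0x3C   # Z80 encoding of INC A
DEC_A = 0x3D   # Z80 encoding of DEC A
TOGGLE = INC_A ^ DEC_A


def run_loop(code, n, switch_at):
    """Interpret a one-instruction loop body n times.

    `code` is a bytearray whose byte 0 is the current opcode; `switch_at` holds
    the iteration numbers at which the state flips.
    """
    a = 0
    history = []
    for i in range(n):
        if i in switch_at:
            code[0] ^= TOGGLE          # patch the opcode in place
        if code[0] == INC_A:           # instruction decode, not a test of STATE
            a += 1
        elif code[0] == DEC_A:
            a -= 1
        else:
            raise ValueError("unknown opcode")
        history.append(a)
    return history


code = bytearray([INC_A])
history = run_loop(code, 6, {3, 5})
```

Applying the same XOR twice restores the original opcode, so one patch routine serves both directions of the switch.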


Specialization

Suppose a set of statistics such as average, extrema, location of extrema, standard deviation, etc. are to be calculated for some large data set. In a general situation, there may be an option of associating weights with the data, so that each x_i is associated with a w_i. Rather than test for the presence of weights at every index value, there could be two versions of the calculation, one for use with weights and one without, with one test at the start. Now consider a further option: each value may have associated with it a boolean to signify whether that value is to be skipped or not. This could be handled by producing four batches of code, one for each combination of options, but code bloat results. Alternatively, the weight and the skip arrays could be merged into a temporary array (with zero weights for values to be skipped), at the cost of extra processing, and there is still bloat. With code modification, however, the code for skipping unwanted values and for applying weights could be added, as appropriate, to the template for calculating the statistics. There would be no repeated testing of the options, and the data array would be accessed once, as would the weight and skip arrays, if involved.
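The template approach can be sketched in Python by generating source text and compiling it with `exec`. This uses run-time code generation in place of in-place modification of machine code. The helper `make_mean`, its parameter order and the choice of the mean as the only statistic are assumptions made for brevity.

```python
def make_mean(weighted, skipping):
    """Generate a mean routine whose loop contains only the steps that are needed."""
    params = ["xs"]
    body = ["total = 0.0", "count = 0.0", "for i in range(len(xs)):"]
    if skipping:
        params.append("skip")
        body.append("    if skip[i]:")
        body.append("        continue")
    if weighted:
        params.append("ws")
        body.append("    total += ws[i] * xs[i]")
        body.append("    count += ws[i]")
    else:
        body.append("    total += xs[i]")
        body.append("    count += 1")
    body.append("return total / count")
    source = "def mean(" + ", ".join(params) + "):\n"
    source += "".join("    " + line + "\n" for line in body)
    namespace = {}
    exec(source, namespace)            # compile the specialized routine
    return namespace["mean"], source


plain_mean, plain_source = make_mean(weighted=False, skipping=False)
full_mean, full_source = make_mean(weighted=True, skipping=True)

plain_result = plain_mean([1.0, 2.0, 3.0])
full_result = full_mean([1.0, 100.0, 3.0], [False, True, False], [1.0, 5.0, 3.0])
```

The options are tested once, while the routine is being generated. The generated loops contain no option tests, and only one routine exists per requested combination.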


Use as camouflage

Self-modifying code is more complex to analyze than standard code and can therefore be used as a protection against reverse engineering and software cracking. Self-modifying code was used to hide copy protection instructions in 1980s disk-based programs for platforms such as IBM PC and Apple II. For example, on an IBM PC (or compatible), the floppy disk drive access instruction int 0x13 would not appear in the executable program's image but it would be written into the executable's memory image after the program started executing.

Self-modifying code is also sometimes used by programs that do not want to reveal their presence, such as computer viruses and some shellcodes. Viruses and shellcodes that use self-modifying code mostly do this in combination with polymorphic code. Modifying a piece of running code is also used in certain attacks, such as buffer overflows.
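The int 0x13 example above can be illustrated on a simulated program image in Python. Nothing here executes machine code. The XOR key, the image layout and the helper names `build_image` and `reveal` are hypothetical, and a one-byte XOR is only one of many ways the instruction bytes could have been kept out of the stored image.

```python
INT_13H = bytes([0xCD, 0x13])      # x86 encoding of "int 0x13"
KEY = 0x5A                         # hypothetical one-byte XOR key


def build_image():
    """Return a simulated program image with the instruction stored encoded."""
    hidden = bytes(b ^ KEY for b in INT_13H)
    # Layout: NOP, NOP, two encoded bytes, RET
    return bytearray([0x90, 0x90]) + bytearray(hidden) + bytearray([0xC3])


def reveal(image, offset, length):
    """Decode the hidden bytes in place, as the running program would."""
    for i in range(offset, offset + length):
        image[i] ^= KEY


image = build_image()
found_in_stored_image = INT_13H in bytes(image)
reveal(image, 2, 2)
found_in_memory_image = INT_13H in bytes(image)
```

A search of the stored image for the instruction bytes fails, while the same search succeeds once the program has patched its own memory image.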


Self-referential machine learning systems

Traditional machine learning systems have a fixed, pre-programmed learning algorithm to adjust their parameters. However, since the 1980s Jürgen Schmidhuber has published several self-modifying systems with the ability to change their own learning algorithm. They avoid the danger of catastrophic self-rewrites by making sure that self-modifications will survive only if they are useful according to a user-given fitness, error or reward function.


Operating systems

The Linux kernel notably makes wide use of self-modifying code; it does so to be able to distribute a single binary image for each major architecture (e.g. IA-32, x86-64, 32-bit ARM, ARM64...) while adapting the kernel code in memory during boot depending on the specific CPU model detected, e.g. to be able to take advantage of new CPU instructions or to work around hardware bugs.

Independently of such in-place patching, programs can still modify their own behavior at a meta-level by changing data stored elsewhere (see metaprogramming) or via use of polymorphism.


Massalin's Synthesis kernel

The Synthesis kernel presented in Alexia Massalin's Ph.D. thesis is a tiny Unix kernel that takes a structured, or even object-oriented, approach to self-modifying code, where code is created for individual quajects, such as filehandles. Generating code for specific tasks allows the Synthesis kernel to apply, as a JIT compiler might, a number of optimizations such as constant folding or common subexpression elimination.

The Synthesis kernel was very fast, but was written entirely in assembly. The resulting lack of portability has prevented Massalin's optimization ideas from being adopted by any production kernel. However, the structure of the techniques suggests that they could be captured by a higher-level language, albeit one more complex than existing mid-level languages. Such a language and compiler could allow development of faster operating systems and applications.

Paul Haeberli and Bruce Karsh have objected to the "marginalization" of self-modifying code, and optimization in general, in favor of reduced development costs.
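The idea of generating code per object can be illustrated, very loosely, in a high-level language. The following Python sketch is not taken from Synthesis (which was written in assembly); the names `make_reader` and `read_block` are hypothetical, and the "file handle" is just an in-memory byte string. It only shows how values known when the handle is opened can be folded into a routine generated at run time.

```python
from typing import Callable


def make_reader(data: bytes, block_size: int) -> Callable[[int], bytes]:
    """Generate a read routine specialized for one open "file handle".

    Hypothetical stand-in for a quaject: values fixed at open time
    (the buffer and the block size) are baked into the generated code.
    """
    size = len(data)
    # Constant folding done by the generator: the last valid block index
    # is computed once here instead of on every call.
    last_block = (size - 1) // block_size if size else -1
    source = (
        "def read_block(n):\n"
        f"    if n < 0 or n > {last_block}:\n"
        "        return b''\n"
        f"    start = n * {block_size}\n"
        f"    return data[start:start + {block_size}]\n"
    )
    namespace = {"data": data}
    exec(source, namespace)  # create the specialized function at run time
    return namespace["read_block"]


reader = make_reader(b"abcdefghij", block_size=4)
print(reader(0), reader(2), reader(3))  # b'abcd' b'ij' b''
```

Each call to `make_reader` yields a separate routine with its own constants, which is the property the paragraph above attributes to per-quaject code; a real kernel would emit machine code rather than source text.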


Interaction of cache and self-modifying code

On architectures without coupled data and instruction caches (for example, some SPARC, ARM, and MIPS cores), the modifying code must perform cache synchronization explicitly: flush the data cache and invalidate the instruction cache for the modified memory area.

In some cases, short sections of self-modifying code execute more slowly on modern processors. This is because a modern processor will usually try to keep blocks of code in its cache memory. Each time the program rewrites a part of itself, the rewritten part must be loaded into the cache again, which results in a slight delay if the modified codelet shares a cache line with the modifying code, as is the case when the modified memory address is located within a few bytes of the modifying code.

The cache invalidation issue on modern processors usually means that self-modifying code is still faster only when the modification occurs rarely, such as in the case of a state switch inside an inner loop.

Most modern processors load the machine code before they execute it, which means that if an instruction too near the instruction pointer is modified, the processor will not notice and will instead execute the code as it was ''before'' it was modified. See prefetch input queue (PIQ). PC processors must handle self-modifying code correctly for backwards compatibility reasons, but they are far from efficient at doing so.


Security issues

Because of the security implications of self-modifying code, all of the major operating systems are careful to remove such vulnerabilities as they become known. The concern is typically not that programs will intentionally modify themselves, but that they could be maliciously changed by an exploit.

One mechanism for preventing malicious code modification is an operating system feature called W^X (for "write xor execute"). This mechanism prohibits a program from making any page of memory both writable and executable. Some systems prevent a writable page from ever being changed to be executable, even if write permission is removed. Other systems provide a 'back door' of sorts, allowing multiple mappings of a page of memory to have different permissions. A relatively portable way to bypass W^X is to create a file with all permissions, then map the file into memory twice. On Linux, one may use an undocumented SysV shared memory flag to get executable shared memory without needing to create a file.


Advantages

* Fast paths can be established for a program's execution, reducing some otherwise repetitive conditional branches.
* Self-modifying code can improve algorithmic efficiency.
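As a rough illustration of the first point, the sketch below establishes a fast path by rebinding a method reference once, instead of testing a flag on every call. It is a minimal Python example with hypothetical names (`Logger`, `set_debug`), and it changes a reference held as data rather than overwriting machine instructions, so it stands in for the function-pointer style of self-modification rather than for instruction patching.

```python
class Logger:
    """Hypothetical example class; the names are illustrative only."""

    def __init__(self):
        self.lines = []
        self.emit = self._emit_quiet  # start on the fast path

    def _emit_quiet(self, message):
        # Fast path: no flag test, no formatting.
        self.lines.append(message)

    def _emit_debug(self, message):
        self.lines.append(f"[debug] {message}")

    def set_debug(self, enabled):
        # The condition is tested once, here, rather than on every emit().
        self.emit = self._emit_debug if enabled else self._emit_quiet


log = Logger()
log.emit("a")
log.set_debug(True)
log.emit("b")
log.set_debug(False)
log.emit("c")
print(log.lines)  # ['a', '[debug] b', 'c']
```

Callers always invoke `log.emit`; only the target behind that name changes, so the per-call cost of the debug check disappears while debugging is off.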


Disadvantages

Self-modifying code is harder to read and maintain because the instructions in the source program listing are not necessarily the instructions that will be executed. Self-modification that consists of substitution of function pointers might not be as cryptic, if it is clear that the names of functions to be called are placeholders for functions to be identified later.

Self-modifying code can be rewritten as code that tests a flag and branches to alternative sequences based on the outcome of the test, but self-modifying code typically runs faster.

Self-modifying code conflicts with authentication of the code and may require exceptions to policies requiring that all code running on a system be signed. Modified code must be stored separately from its original form, conflicting with memory management solutions that normally discard the code in RAM and reload it from the executable file as needed.

On modern processors with an instruction pipeline, code that modifies itself frequently may run more slowly if it modifies instructions that the processor has already read from memory into the pipeline. On some such processors, the only way to ensure that the modified instructions are executed correctly is to flush the pipeline and reread many instructions.

Self-modifying code cannot be used at all in some environments, such as the following:
* Application software running under an operating system with strict W^X security cannot execute instructions in pages it is allowed to write to; only the operating system is allowed to both write instructions to memory and later execute those instructions.
* Many Harvard architecture microcontrollers cannot execute instructions in read-write memory, but only instructions in memory that they cannot write to, such as ROM or non-self-programmable flash memory.
* A multithreaded application may have several threads executing the same section of self-modifying code, possibly resulting in computation errors and application failures.


See also

* Overlapping code
* Polymorphic code
* Polymorphic engine
* Persistent data structure
* AARD code
* Algorithmic efficiency
* eval statement
* IBM 1130 (Example)
* Just-in-time compilation: This technique can often give users many of the benefits of self-modifying code (except memory size) without the disadvantages.
* Dynamic dead code elimination
* Homoiconicity
* PCASTL
* Quine (computing)
* Self-replication
* Reflection (computer science)
* Monkey patch: a modification to runtime code that does not affect a program's original source code
* Extensible programming: a programming paradigm in which a programming language can modify its own syntax
* Self-modifying computer virus
* Self-hosting
* Compiler bootstrapping
* Patchable microcode


External links


Using self-modifying code under Linux



