In
hacking, a shellcode is a small piece of code used as the
payload
Payload is the object or the entity which is being carried by an aircraft or launch vehicle. Sometimes payload also refers to the carrying capacity of an aircraft or launch vehicle, usually measured in terms of weight. Depending on the nature of ...
in the
exploitation
Exploitation may refer to:
*Exploitation of natural resources
*Exploitation of labour
** Forced labour
*Exploitation colonialism
*Slavery
** Sexual slavery and other forms
*Oppression
*Psychological manipulation
In arts and entertainment
*Exploi ...
of a software
vulnerability
Vulnerability refers to "the quality or state of being exposed to the possibility of being attacked or harmed, either physically or emotionally."
A window of vulnerability (WOV) is a time frame within which defensive measures are diminished, com ...
. It is called "shellcode" because it typically starts a
command shell
In computing, a shell is a computer program that exposes an operating system's services to a human user or other programs. In general, operating system shells use either a command-line interface (CLI) or graphical user interface (GUI), depending ...
from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. However, attempts at replacing the term have not gained wide acceptance. Shellcode is commonly written in
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
.
When creating shellcode, it is generally desirable to make it both small and executable, which allows it to be used in as wide a variety of situations as possible.
Writing good shellcode can be as much an art as it is a science. In assembly code, the same function can be performed in a multitude of ways and there is some variety in the lengths of opcodes that can be used for this purpose; good shellcode writers can put these small opcodes to use to create more compact shellcode. Some have reached the smallest possible size while maintaining stability.
Types of shellcode
Shellcode can either be ''local'' or ''remote'', depending on whether it gives an attacker control over the machine it runs on (local) or over another machine through a network (remote).
Local
''Local'' shellcode is used by an attacker who has limited access to a machine but can exploit a vulnerability, for example a
buffer overflow
In information security and programming, a buffer overflow, or buffer overrun, is an anomaly whereby a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations.
Buffers are areas of memory ...
, in a higher-privileged process on that machine. If successfully executed, the shellcode will provide the attacker access to the machine with the same higher privileges as the targeted process.
Remote
''Remote'' shellcode is used when an attacker wants to target a vulnerable process running on another machine on a
local network
A local area network (LAN) is a computer network that interconnects computers within a limited area such as a residence, school, laboratory, university campus or office building. By contrast, a wide area network (WAN) not only covers a larger ...
,
intranet
An intranet is a computer network for sharing information, easier communication, collaboration tools, operational systems, and other computing services within an organization, usually to the exclusion of access by outsiders. The term is used in c ...
, or a
remote network. If successfully executed, the shellcode can provide the attacker access to the target machine across the network. Remote shellcodes normally use standard
TCP/IP
The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the set of communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the suit ...
socket
Socket may refer to:
Mechanics
* Socket wrench, a type of wrench that uses separate, removable sockets to fit different sizes of nuts and bolts
* Socket head screw, a screw (or bolt) with a cylindrical head containing a socket into which the hexag ...
connections to allow the attacker access to the shell on the target machine. Such shellcode can be categorized based on how this connection is set up: if the shellcode establishes the connection, it is called a "reverse shell" or a ''connect-back'' shellcode because the shellcode ''connects back'' to the attacker's machine. On the other hand, if the attacker establishes the connection, the shellcode is called a ''bindshell'' because the shellcode ''binds'' to a certain port on the victim's machine. There's a peculiar shellcode named ''bindshell random port'' that skips the binding part and listens on a random port made available by the
operating system
An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs.
Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
. Because of that the
bindshell random port' became the smallest and stable bindshell shellcode for
x86_64
x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mo ...
available to this date. A third, much less common type, is ''socket-reuse'' shellcode. This type of shellcode is sometimes used when an exploit establishes a connection to the vulnerable process that is not closed before the shellcode is run. The shellcode can then ''re-use'' this connection to communicate with the attacker. Socket re-using shellcode is more elaborate, since the shellcode needs to find out which connection to re-use and the machine may have many connections open.
A
firewall
Firewall may refer to:
* Firewall (computing), a technological barrier designed to prevent unauthorized or unwanted communications between computer networks or hosts
* Firewall (construction), a barrier inside a building, designed to limit the spr ...
can be used to detect outgoing connections made by connect-back shellcode as well as incoming connections made by bindshells. They can therefore offer some protection against an attacker, even if the system is vulnerable, by preventing the attacker from connecting to the shell created by the shellcode. This is one reason why socket re-using shellcode is sometimes used: it does not create new connections and therefore is harder to detect and block.
Download and execute
''Download and execute'' is a type of remote shellcode that ''
downloads
In computer networks, download means to ''receive'' data from a remote system, typically a server such as a web server, an FTP server, an email server, or other similar system. This contrasts with uploading, where data is ''sent to'' a remote ...
and
executes
Capital punishment, also known as the death penalty, is the state-sanctioned practice of deliberately killing a person as a punishment for an actual or supposed crime, usually following an authorized, rule-governed process to conclude that t ...
'' some form of malware on the target system. This type of shellcode does not spawn a shell, but rather instructs the machine to download a certain executable file off the network, save it to disk and execute it. Nowadays, it is commonly used in
drive-by download
Drive-by download is of two types, each concerning the unintended download of computer software from the Internet:
# Authorized drive-by downloads are downloads which a person has authorized but without understanding the consequences (e.g. down ...
attacks, where a victim visits a malicious webpage that in turn attempts to run such a download and execute shellcode in order to install software on the victim's machine. A variation of this type of shellcode downloads and
loads a
library
A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vir ...
. Advantages of this technique are that the code can be smaller, that it does not require the shellcode to spawn a new process on the target system, and that the shellcode does not need code to clean up the targeted process as this can be done by the library loaded into the process.
Staged
When the amount of data that an attacker can inject into the target process is too limited to execute useful shellcode directly, it may be possible to execute it in stages. First, a small piece of shellcode (stage 1) is executed. This code then downloads a larger piece of shellcode (stage 2) into the process's memory and executes it.
Egg-hunt
This is another form of ''staged'' shellcode, which is used if an attacker can inject a larger shellcode into the process but cannot determine where in the process it will end up. Small ''egg-hunt'' shellcode is injected into the process at a predictable location and executed. This code then searches the process's address space for the larger shellcode (the ''egg'') and executes it.
Omelette
This type of shellcode is similar to ''egg-hunt'' shellcode, but looks for multiple small blocks of data (''eggs'') and recombines them into one larger block (the ''omelette'') that is subsequently executed. This is used when an attacker can only inject a number of small blocks of data into the process.
Shellcode execution strategy
An exploit will commonly inject a shellcode into the target process before or at the same time as it exploits a vulnerability to gain control over the
program counter
The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...
. The program counter is adjusted to point to the shellcode, after which it gets executed and performs its task. Injecting the shellcode is often done by storing the shellcode in data sent over the network to the vulnerable process, by supplying it in a file that is read by the vulnerable process or through the command line or environment in the case of local exploits.
Shellcode encoding
Because most processes filter or restrict the data that can be injected, shellcode often needs to be written to allow for these restrictions. This includes making the code small, null-free or
alphanumeric
Alphanumericals or alphanumeric characters are a combination of alphabetical and numerical characters. More specifically, they are the collection of Latin letters and Arabic digits. An alphanumeric code is an identifier made of alphanumeric c ...
. Various solutions have been found to get around such restrictions, including:
* Design and implementation optimizations to decrease the size of the shellcode.
* Implementation modifications to get around limitations in the range of bytes used in the shellcode.
*
Self-modifying code
In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, ...
that modifies a number of the bytes of its own code before executing them to re-create bytes that are normally impossible to inject into the process.
Since
intrusion detection
An intrusion detection system (IDS; also intrusion prevention system or IPS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically rep ...
can detect signatures of simple shellcodes being sent over the network, it is often encoded, made self-decrypting or
polymorphic to avoid detection.
Percent encoding
Exploits that target browsers commonly encode shellcode in a JavaScript string using
percent-encoding
Percent-encoding, also known as URL encoding, is a method to encode arbitrary data in a Uniform Resource Identifier (URI) using only the limited US-ASCII characters legal within a URI. Although it is known as ''URL encoding'', it is also used ...
, escape sequence encoding "" or
entity encoding. Some exploits also obfuscate the encoded shellcode string further to prevent detection by
IDS
IDS may refer to:
Computing
* IBM Informix Dynamic Server, a relational database management system
* Ideographic Description Sequence, describing a Unihan character as a combination of other characters
* Integrated Data Store, one of the first da ...
.
For example, on the
IA-32
IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...
architecture, here's how two
NOP
(no-operation) instructions would look, first unencoded:
90 NOP
90 NOP
This instruction is used in
NOP slide
In computer security, a NOP slide, NOP sled or NOP ramp is a sequence of NOP (no-operation) instructions meant to "slide" the CPU's instruction execution flow to its final, desired destination whenever the program branches to a memory address a ...
s.
Null-free shellcode
Most shellcodes are written without the use of
null
Null may refer to:
Science, technology, and mathematics Computing
* Null (SQL) (or NULL), a special marker and keyword in SQL indicating that something has no value
* Null character, the zero-valued ASCII character, also designated by , often use ...
bytes because they are intended to be injected into a target process through
null-terminated string
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, called NUL in this article). Alternative names are C str ...
s. When a null-terminated string is copied, it will be copied up to and including the first null but subsequent bytes of the shellcode will not be processed. When shellcode that contains nulls is injected in this way, only part of the shellcode would be injected, making it incapable of running successfully.
To produce null-free shellcode from shellcode that contains
null
Null may refer to:
Science, technology, and mathematics Computing
* Null (SQL) (or NULL), a special marker and keyword in SQL indicating that something has no value
* Null character, the zero-valued ASCII character, also designated by , often use ...
bytes, one can substitute machine instructions that contain zeroes with instructions that have the same effect but are free of nulls. For example, on the
IA-32
IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...
architecture one could replace this instruction:
B8 01000000
MOV EAX,1 // Set the register EAX to 0x000000001
which contains zeroes as part of the literal (
1
expands to
0x00000001
) with these instructions:
33C0
XOR
Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false).
It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...
EAX,EAX // Set the register EAX to 0x000000000
40
INC
Inc. or inc may refer to:
* Incorporation (business), as a suffix indicating a corporation
* ''Inc.'' (magazine), an American business magazine
* Inc. No World, a Los Angeles-based band
* Indian National Congress, a political party in India
* I ...
EAX // Increase EAX to 0x00000001
which have the same effect but take fewer bytes to encode and are free of nulls.
Alphanumeric and printable shellcode
An alphanumeric shellcode is a shellcode that consists of or assembles itself on execution into entirely
alphanumeric
Alphanumericals or alphanumeric characters are a combination of alphabetical and numerical characters. More specifically, they are the collection of Latin letters and Arabic digits. An alphanumeric code is an identifier made of alphanumeric c ...
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
or
Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...
characters such as 0-9, A-Z and a-z.
This type of encoding was created by
hacker
A hacker is a person skilled in information technology who uses their technical knowledge to achieve a goal or overcome an obstacle, within a computerized system by non-standard means. Though the term ''hacker'' has become associated in popu ...
s to hide working
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
inside what appears to be text. This can be useful to avoid detection of the code and to allow the code to pass through filters that scrub non-alphanumeric characters from strings (in part, such filters were a response to non-alphanumeric shellcode exploits). A similar type of encoding is called ''printable code'' and uses all
printable characters (0-9, A-Z, a-z, !@#%^&*() etc.). An similarly restricted variant is ''ECHOable code'' not containing any characters which are not accepted by the
ECHO
In audio signal processing and acoustics, an echo is a reflection of sound that arrives at the listener with a delay after the direct sound. The delay is directly proportional to the distance of the reflecting surface from the source and the lis ...
command. It has been shown that it is possible to create shellcode that looks like normal text in English.
[ (10 pages)]
Writing alphanumeric or printable code requires good understanding of the
instruction set architecture
In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
of the machine(s) on which the code is to be executed. It has been demonstrated that it is possible to write alphanumeric code that is executable on more than one machine,
thereby constituting
multi-architecture executable code.
In certain circumstances, a target process will filter any byte from the injected shellcode that is not a
printable or
alphanumeric
Alphanumericals or alphanumeric characters are a combination of alphabetical and numerical characters. More specifically, they are the collection of Latin letters and Arabic digits. An alphanumeric code is an identifier made of alphanumeric c ...
character. Under such circumstances, the range of instructions that can be used to write a shellcode becomes very limited. A solution to this problem was published by Rix in
Phrack
''Phrack'' is an e-zine written by and for hackers, first published November 17, 1985. Described by Fyodor as "the best, and by far the longest running hacker zine," the magazine is open for contributions by anyone who desires to publish remarkabl ...
57
in which he showed it was possible to turn any code into alphanumeric code. A technique often used is to create self-modifying code, because this allows the code to modify its own bytes to include bytes outside of the normally allowed range, thereby expanding the range of instructions it can use. Using this trick, a self-modifying decoder can be created that initially uses only bytes in the allowed range. The main code of the shellcode is encoded, also only using bytes in the allowed range. When the output shellcode is run, the decoder can modify its own code to be able to use any instruction it requires to function properly and then continues to decode the original shellcode. After decoding the shellcode the decoder transfers control to it, so it can be executed as normal. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal text in English.
Unicode proof shellcode
Modern programs use
Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...
strings to allow internationalization of text. Often, these programs will convert incoming
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
strings to Unicode before processing them. Unicode strings encoded in
UTF-16
UTF-16 (16-bit computing, 16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variab ...
use two bytes to encode each character (or four bytes for some special characters). When an
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
(
Latin-1
ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1 ...
in general) string is transformed into UTF-16, a zero byte is inserted after each byte in the original string. Obscou proved in
Phrack
''Phrack'' is an e-zine written by and for hackers, first published November 17, 1985. Described by Fyodor as "the best, and by far the longest running hacker zine," the magazine is open for contributions by anyone who desires to publish remarkabl ...
61
that it is possible to write shellcode that can run successfully after this transformation. Programs that can automatically encode any shellcode into alphanumeric UTF-16-proof shellcode exist, based on the same principle of a small self-modifying decoder that decodes the original shellcode.
Platforms
Most shellcode is written in
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
because of the low level at which the vulnerability being exploited gives an attacker access to the process. Shellcode is therefore often created to target one specific combination of
processor
Processor may refer to:
Computing Hardware
* Processor (computing)
**Central processing unit (CPU), the hardware within a computer that executes a program
*** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
,
operating system
An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs.
Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
and
service pack, called a
platform
Platform may refer to:
Technology
* Computing platform, a framework on which applications may be run
* Platform game, a genre of video games
* Car platform, a set of components shared by several vehicle models
* Weapons platform, a system or ...
. For some exploits, due to the constraints put on the shellcode by the target process, a very specific shellcode must be created. However, it is not impossible for one shellcode to work for multiple exploits, service packs, operating systems and even processors.
(12 pages) (See also
Such versatility is commonly achieved by creating multiple versions of the shellcode that target the various platforms and creating a header that branches to the correct version for the platform the code is running on. When executed, the code behaves differently for different platforms and executes the right part of the shellcode for the platform it is running on.
Shellcode analysis
Shellcode cannot be executed directly. In order to analyze what a shellcode attempts to do it must be loaded into another process. One common analysis technique is to write a small C program which holds the shellcode as a byte buffer, and then use a function pointer or use inline assembler to transfer execution to it. Another technique is to use an online tool, such as shellcode_2_exe, to embed the shellcode into a pre-made executable husk which can then be analyzed in a standard debugger. Specialized shellcode analysis tools also exist, such as the iDefense sclog project which was originally released in 2005 as part of the Malcode Analyst Pack. Sclog is designed to load external shellcode files and execute them within an API logging framework. Emulation based shellcode analysis tools also exist such as the sctest application which is part of the cross platform libemu package. Another emulation based shellcode analysis tool, built around the libemu library, is scdbg which includes a basic debug shell and integrated reporting features.
See also
*
Alphanumeric code
Alphanumericals or alphanumeric characters are a combination of alphabetical and numerical characters. More specifically, they are the collection of Latin letters and Arabic digits. An alphanumeric code is an identifier made of alphanumeric c ...
*
Computer security
Computer security, cybersecurity (cyber security), or information technology security (IT security) is the protection of computer systems and networks from attack by malicious actors that may result in unauthorized information disclosure, the ...
*
Buffer overflow
In information security and programming, a buffer overflow, or buffer overrun, is an anomaly whereby a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations.
Buffers are areas of memory ...
*
Exploit (computer security)
An exploit (from the English verb ''to exploit'', meaning "to use something to one’s own advantage") is a piece of software, a chunk of data, or a sequence of commands that takes advantage of a bug or vulnerability to cause unintended or unanti ...
*
Heap overflow
A heap overflow, heap overrun, or heap smashing is a type of buffer overflow that occurs in the heap data area. Heap overflows are exploitable in a different manner to that of stack-based overflows. Memory on the heap is dynamically allocated at ...
*
Metasploit Project
The Metasploit Project is a computer security project that provides information about security vulnerabilities and aids in penetration testing and IDS signature development. It is owned by Boston, Massachusetts-based security company Rapid7.
It ...
*
Shell (computing)
In computing, a shell is a computer program that exposes an operating system's services to a human user or other programs. In general, operating system shells use either a command-line interface (CLI) or graphical user interface (GUI), depending ...
*
Shell shoveling
*
Stack buffer overflow
In software, a stack buffer overflow or stack buffer overrun occurs when a program writes to a memory address on the program's call stack outside of the intended data structure, which is usually a fixed-length buffer.
Stack buffer overflow bugs ...
*
Vulnerability (computing)
Vulnerabilities are flaws in a computer system that weaken the overall security of the device/system. Vulnerabilities can be weaknesses in either the hardware itself, or the software that runs on the hardware. Vulnerabilities can be exploited by ...
References
External links
Shell-StormDatabase of shellcodes Multi-Platform.
*
ttp://www.infosecwriters.com/text_resources/pdf/basics_of_shellcoding.pdf The Basics of Shellcoding(PDF) An overview of
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
shellcoding b
Angelo RosielloContains x86 and non-x86 shellcode samples and an online interface for automatic shellcode generation and encoding, from the Metasploit Projecta shellcode archive, sorted by Operating system
Microsoft Windows and Linux shellcode design tutorial going from basic to advanced
*
ttps://web.archive.org/web/20210322094322/http://www.enderunix.org/docs/en/sc-en.txt Designing shellcode demystifiedALPHA3A shellcode encoder that can turn any shellcode into both Unicode and ASCII, uppercase and mixedcase, alphanumeric shellcode.
Writing Small shellcode by Dafydd StuttardA whitepaper explaining how to make shellcode as small as possible by optimizing both the design and implementation.
{{Webarchive, url=https://web.archive.org/web/20150403114315/http://skypher.com/wiki/index.php?title=Www.edup.tudelft.nl%2F~bjwever%2Fwhitepaper_shellcode.html.php , date=2015-04-03 A whitepaper explaining how to create shellcode when the bytes allowed in the shellcode are very restricted.
BETA3A tool that can encode and decode shellcode using a variety of encodings commonly used in exploits.
Shellcode 2 Exe- Online converter to embed shellcode in exe husk
Sclog- Updated build of the iDefense sclog shellcode analysis tool (Windows)
Libemu- emulation based shellcode analysis library (*nix/Cygwin)
Scdbg- shellcode debugger built around libemu emulation library (*nix/Windows)
Injection exploits