Reverse engineering (also known as backwards engineering or back engineering) is a process or method through the application of which one attempts to understand through deductive reasoning
how a device, process, system, or piece of software accomplishes a task with very little (if any) insight into exactly how it does so.
Reverse engineering is applicable in the fields of computer engineering
, mechanical engineering
, electronic engineering
, software engineering
, chemical engineering
, and systems biology
There are many reasons for performing reverse engineering in various fields. Reverse engineering has its origins in the analysis of hardware for commercial or military advantage.
However, the reverse engineering process, as such, is not concerned with creating a copy or changing the artifact in some way. It is only an analysis
design features from products with little or no additional knowledge about the procedures involved in their original production.
In some cases, the goal of the reverse engineering process can simply be a redocumentation
of legacy systems
[A Survey of Reverse Engineering and Program Comprehension. Michael L. Nelson, April 19, 1996, ODU CS 551 – Software Engineering Survey.]
Even when the reverse-engineered product is that of a competitor, the goal may not be to copy it but to perform competitor analysis
Reverse engineering may also be used to create interoperable products
and despite some narrowly-tailored United States and European Union legislation, the legality of using specific reverse engineering techniques for that purpose has been hotly contested in courts worldwide for more than two decades.
reverse engineering can help to improve the understanding of the underlying source code for the maintenance and improvement of the software, relevant information can be extracted to make a decision for software development and graphical representations of the code can provide alternate views regarding the source code, which can help to detect and fix a software bug or vulnerability. Frequently, as some software develops, its design information and improvements are often lost over time, but that lost information can usually be recovered with reverse engineering. The process can also help to cut down the time required to understand the source code, thus reducing the overall cost of the software development.
Reverse engineering can also help to detect and to eliminate a malicious code written to the software with better code detectors. Reversing a source code can be used to find alternate uses of the source code, such as detecting the unauthorized replication of the source code where it was not intended to be used, or revealing how a competitor's product was built.
That process is commonly used for "cracking" software and media
to remove their copy protection
or to create a possibly-improved copy
or even a knockoff
, which is usually the goal of a competitor or a hacker.
developers often use reverse engineering techniques to find vulnerabilities in an operating system
to build a computer virus
that can exploit the system vulnerabilities.
Reverse engineering is also being used in cryptanalysis
to find vulnerabilities in substitution cipher
, symmetric-key algorithm
or public-key cryptography
There are other uses to reverse engineering:
* Interfacing. Reverse engineering can be used when a system is required to interface to another system and how both systems would negotiate is to be established. Such requirements typically exist for interoperability
* Military or commercial espionage
. Learning about an enemy's or competitor's latest research by stealing or capturing a prototype and dismantling it may result in the development of a similar product or a better countermeasure against it.
* Obsolescence. Integrated circuits
are often designed on proprietary systems and built on production lines, which become obsolete in only a few years. When systems using those parts can no longer be maintained since the parts are no longer made, the only way to incorporate the functionality into new technology is to reverse-engineer the existing chip and then to redesign
it using newer tools by using the understanding gained as a guide. Another obsolescence originated problem that can be solved by reverse engineering is the need to support (maintenance and supply for continuous operation) existing legacy devices that are no longer supported by their original equipment manufacturer
. The problem is particularly critical in military operations.
* Product security analysis. That examines how a product works by determining the specifications of its components and estimate costs and identifies potential patent infringement
. Also part of product security analysis is acquiring sensitive data by disassembling and analyzing the design of a system component.
[Internet Engineering Task Force RFC 2828 Internet Security Glossary]
Another intent may be to remove copy protection
or to circumvent access restrictions.
* Competitive technical intelligence. That is to understand what one's competitor is actually doing, rather than what it says that it is doing.
* Saving money. Finding out what a piece of electronics can do may spare a user from purchasing a separate product.
. Obsolete objects are then reused in a different-but-useful manner.
. Production and design companies applied Reverse Engeneering to practical craft-based manufacturing process. The companies can work on “historical” manufacturing collections through 3D scanning, 3D re-modeling and re-design. In 2013 italian manufactures Baldi
and Savio Firmino
togheter with University of Florence
optimized their innovation, design, and production processes.
As computer-aided design
(CAD) has become more popular, reverse engineering has become a viable method to create a 3D virtual model of an existing physical part for use in 3D CAD, CAM
, or other software. The reverse-engineering process involves measuring an object and then reconstructing it as a 3D model. The physical object can be measured using 3D scanning
technologies like CMMs
, laser scanners
, structured light digitizers
, or industrial CT scanning
(computed tomography). The measured data alone, usually represented as a point cloud
, lacks topological information and design intent. The former may be recovered by converting the point cloud to a triangular-faced mesh. Reverse engineering aims to go beyond producing such a mesh and to recover the design intent in terms of simple analytical surfaces where appropriate (planes, cylinders, etc.) as well as possibly NURBS
surfaces to produce a boundary-representation CAD model. Recovery of such a model allows a design to be modified to meet new requirements, a manufacturing plan to be generated, etc.
Hybrid modeling is a commonly used term when NURBS and parametric modeling
are implemented together. Using a combination of geometric and freeform surfaces can provide a powerful method of 3D modeling. Areas of freeform data can be combined with exact geometric surfaces to create a hybrid model. A typical example of this would be the reverse engineering of a cylinder head, which includes freeform cast features, such as water jackets and high-tolerance machined areas.
Reverse engineering is also used by businesses to bring existing physical geometry into digital product development environments, to make a digital 3D record of their own products, or to assess competitors' products. It is used to analyze how a product works, what it does, what components it has; estimate costs; identify potential patent
, a related activity that is also used by businesses, involves deconstructing and analyzing products. However, the objective is to find opportunities for cost-cutting.
In 1990, the Institute of Electrical and Electronics Engineers
(IEEE) defined (software) reverse engineering (SRE) as "the process of analyzing a
subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher
level of abstraction" in which the "subject system" is the end product of software development. Reverse engineering is a process of examination only, and the software system under consideration is not modified, which would otherwise be re-engineering
or restructuring. Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product.
There are two components in reverse engineering: redocumentation and design recovery. Redocumentation is the creation of new representation of the computer code so that it is easier to understand. Meanwhile, design recovery is the use of deduction or reasoning from general knowledge or personal experience of the product to understand the product's functionality fully.
It can also be seen as "going backwards through the development cycle." In this model, the output of the implementation phase (in source code form) is reverse-engineered back to the analysis phase, in an inversion of the traditional waterfall model
. Another term for this technique is program comprehension
The Working Conference on Reverse Engineering (WCRE) has been held yearly to explore and expand the techniques of reverse engineering.
Computer-aided software engineering
(CASE) and automated code generation have contributed greatly in the field of reverse engineering.
technology like obfuscation
is used to deter both reverse engineering and re-engineering of proprietary software and software-powered systems. In practice, two main types of reverse engineering emerge. In the first case, source code is already available for the software, but higher-level aspects of the program, which are perhaps poorly documented or documented but no longer valid, are discovered. In the second case, there is no source code available for the software, and any efforts towards discovering one possible source code for the software are regarded as reverse engineering. The second usage of the term is more familiar to most people. Reverse engineering of software can make use of the clean room design
technique to avoid copyright infringement.
On a related note, black box testing
in software engineering
has a lot in common with reverse engineering. The tester usually has the API
but has the goals to find bugs and undocumented features by bashing the product from outside.
Other purposes of reverse engineering include security auditing, removal of copy protection ("cracking
"), circumvention of access restrictions often present in consumer electronics
, customization of embedded systems
(such as engine management systems), in-house repairs or retrofits, enabling of additional features on low-cost "crippled" hardware (such as some graphics card chip-sets), or even mere satisfaction of curiosity.
Binary reverse engineering is performed if source code for a software is unavailable.
This process is sometimes termed ''reverse code engineering'', or RCE. For example, decompilation of binaries for the Java platform
can be accomplished by using Jad. One famous case of reverse engineering was the first non-IBM
implementation of the PC BIOS
, which launched the historic IBM PC compatible
industry that has been the overwhelmingly-dominant computer hardware
platform for many years. Reverse engineering of software is protected in the US by the fair use
exception in copyright law
. The Samba software
, which allows systems that do not run Microsoft Windows
systems to share files with systems that run it, is a classic example of software reverse engineering since the Samba project had to reverse-engineer unpublished information about how Windows file sharing worked so that non-Windows computers could emulate it. The Wine
project does the same thing for the Windows API
, and OpenOffice.org
is one party doing that for the Microsoft Office
file formats. The ReactOS
project is even more ambitious in its goals by striving to provide binary (ABI and API) compatibility with the current Windows operating systems of the NT branch, which allows software and drivers written for Windows to run on a clean-room reverse-engineered free software
) counterpart. WindowsSCOPE
allows for reverse-engineering the full contents of a Windows system's live memory including a binary-level, graphical reverse engineering of all running processes.
Another classic, if not well-known, example is that in 1987 Bell Laboratories
reverse-engineered the Mac OS
System 4.1, originally running on the Apple Macintosh SE
, so that it could run it on RISC machines of their own.
=Binary software techniques
Reverse engineering of software can be accomplished by various methods.
The three main groups of software reverse engineering are
#Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzer
s and packet sniffer
s, such as for accessing a computer bus
or computer network
connection and revealing the traffic data thereon. Bus or network behavior can then be analyzed to produce a standalone implementation that mimics that behavior. That is especially useful for reverse engineering device driver
s. Sometimes, reverse engineering on embedded system
s is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG
ports or other debugging means. In Microsoft Windows
, low-level debuggers such as SoftICE
using a disassembler
, meaning the raw machine language
of the program is read and understood in its own terms, only with the aid of machine-language mnemonic
s. It works on any computer program but can take quite some time, especially for those who are not used to machine code. The Interactive Disassembler
is a particularly popular tool.
#Decompilation using a decompiler
, a process that tries, with varying results, to recreate the source code in some high-level language for a program only available in machine code or bytecode
Software classification is the process of identifying similarities between different software binaries (such as two different versions of the same binary) used to detect code relations between software samples. The task was traditionally done manually for several reasons (such as patch analysis for vulnerability detection and copyright infringement
), but it can now be done somewhat automatically for large numbers of samples.
This method is being used mostly for long and thorough reverse engineering tasks (complete analysis of a complex algorithm or big piece of software). In general, statistical classification
is considered to be a hard problem, which is also true for software classification, and so few solutions/tools that handle this task well.
A number of UML
tools refer to the process of importing and analysing source code to generate UML diagrams as "reverse engineering." See List of UML tools
Although UML is one approach to providing "reverse engineering" more recent advances in international standards activities have resulted in the development of the Knowledge Discovery Metamodel
(KDM). The standard delivers an ontology for the intermediate (or abstracted) representation of programming language constructs and their interrelationships. An Object Management Group
standard (on its way to becoming an ISO standard as well), KDM has started to take hold in industry with the development of tools and analysis environments that can deliver the extraction and analysis of source, binary, and byte code. For source code analysis, KDM's granular standards' architecture enables the extraction of software system flows (data, control, and call maps), architectures, and business layer knowledge (rules, terms, and process). The standard enables the use of a common data format (XMI) enabling the correlation of the various layers of system knowledge for either detailed analysis (such as root cause, impact) or derived analysis (such as business process extraction). Although efforts to represent language constructs can be never-ending because of the number of languages, the continuous evolution of software languages, and the development of new languages, the standard does allow for the use of extensions to support the broad language set as well as evolution. KDM is compatible with UML, BPMN, RDF, and other standards enabling migration into other environments and thus leverage system knowledge for efforts such as software system transformation and enterprise business layer analysis.
are sets of rules that describe message formats and how messages are exchanged: the protocol state machine
. Accordingly, the problem of protocol reverse-engineering can be partitioned into two subproblems: message format and state-machine reverse-engineering.
The message formats have traditionally been reverse-engineered by a tedious manual process, which involved analysis of how protocol implementations process messages, but recent research proposed a number of automatic solutions.
[P. M. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda. Prospex: Protocol specification extraction. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, pp. 110–125, Washington, 2009. IEEE Computer Society.]
Typically, the automatic approaches group observe messages into clusters by using various clustering analyses
, or they emulate the protocol implementation tracing the message processing.
There has been less work on reverse-engineering of state-machines of protocols. In general, the protocol state-machines can be learned either through a process of offline learning
, which passively observes communication and attempts to build the most general state-machine accepting all observed sequences of messages, and online learning
, which allows interactive generation of probing sequences of messages and listening to responses to those probing sequences. In general, offline learning of small state-machines is known to be NP-complete
, but online learning can be done in polynomial time. An automatic offline approach has been demonstrated by Comparetti et al.
and an online approach by Cho et al.
Other components of typical protocols, like encryption and hash functions, can be reverse-engineered automatically as well. Typically, the automatic approaches trace the execution of protocol implementations and try to detect buffers in memory holding unencrypted packets.
Integrated circuits/smart cards
Reverse engineering is an invasive and destructive form of analyzing a smart card
. The attacker uses chemicals to etch away layer after layer of the smart card and takes pictures with a scanning electron microscope
(SEM). That technique can reveal the complete hardware and software part of the smart card. The major problem for the attacker is to bring everything into the right order to find out how everything works. The makers of the card try to hide keys and operations by mixing up memory positions, such as by bus scrambling.
In some cases, it is even possible to attach a probe to measure voltages while the smart card is still operational. The makers of the card employ sensors to detect and prevent that attack. That attack is not very common because it requires both a large investment in effort and special equipment that is generally available only to large chip manufacturers. Furthermore, the payoff from this attack is low since other security techniques are often used such as shadow accounts. It is still uncertain whether attacks against chip-and-PIN cards to replicate encryption data and then to crack PINs would provide a cost-effective attack on multifactor authentication.
Full reverse engineering proceeds in several major steps.
The first step after images have been taken with a SEM is stitching the images together, which is necessary because each layer cannot be captured by a single shot. A SEM needs to sweep across the area of the circuit and take several hundred images to cover the entire layer. Image stitching takes as input several hundred pictures and outputs a single properly-overlapped picture of the complete layer.
Next, the stitched layers need to be aligned because the sample, after etching, cannot be put into the exact same position relative to the SEM each time. Therefore, the stitched versions will not overlap in the correct fashion, as on the real circuit. Usually, three corresponding points are selected, and a transformation applied on the basis of that.
To extract the circuit structure, the aligned, stitched images need to be segmented, which highlights the important circuitry and separates it from the uninteresting background and insulating materials.
Finally, the wires can be traced from one layer to the next, and the netlist of the circuit, which contains all of the circuit's information, can be reconstructed.
Reverse engineering is often used by people to copy other nations' technologies, devices, or information that have been obtained by regular troops in the fields or by intelligence
operations. It was often used during the Second World War
and the Cold War
. Here are well-known examples from the Second World War and later:
* Jerry can
: British and American forces noticed that the Germans had gasoline cans with an excellent design. They reverse-engineered copies of those cans, which cans were popularly known as "Jerry cans."
: The Germans captured an American bazooka
during the Second World War and reverse engineered it to create the larger Panzerschreck.
* Tupolev Tu-4
: In 1944, three American B-29
bombers on missions over Japan
were forced to land in the Soviet Union
. The Soviets, who did not have a similar strategic bomber, decided to copy the B-29. Within three years, they had developed the Tu-4, a nearly-perfect copy.
* SCR-584 radar
: copied by the Soviet Union after the Second World War, it is known for a few modifications - СЦР-584, Бинокль-Д.
rocket: Technical documents for the V-2 and related technologies were captured by the Western Allies at the end of the war. The Americans focused their reverse engineering efforts via Operation Paperclip
, which led to the development of the PGM-11 Redstone
rocket. The Soviets used captured German engineers to reproduce technical documents and plans and worked from captured hardware to make their clone of the rocket, the R-1
. Thus began the postwar Soviet rocket program, which led to the R-7
and the beginning of the space race
missile (NATO reporting name
AA-2 Atoll), a Soviet reverse-engineered copy of the AIM-9 Sidewinder
, was made possible after a Taiwanese AIM-9B hit a Chinese MiG-17
without exploding in September 1958. The missile became lodged within the airframe, and the pilot returned to base with what Soviet scientists would describe as a university course in missile development.
* BGM-71 TOW
missile: In May 1975, negotiations between Iran and Hughes Missile Systems on co-production of the TOW and Maverick missiles stalled over disagreements in the pricing structure, the subsequent 1979 revolution
ending all plans for such co-production. Iran was later successful in reverse-engineering the missile and now produces its own copy, the Toophan
* China has reversed engineered many examples
of Western and Russian hardware, from fighter aircraft to missiles and HMMWV
cars, such as the MiG-15 (which became the J-7) and the Su-33 (which became the J-15). More recent analyses of China's military growth have pointed to the inherent limitations of reverse engineering for advanced weapon systems.
* During the Second World War, Polish and British cryptographers studied captured German ""Enigma
" message encryption machines for weaknesses. Their operation was then simulated on electromechanical devices, "bombe
s, which tried all the possible scrambler settings of the "Enigma" machines that helped the breaking of coded messages that had been sent by the Germans.
* Also during the Second World War, British scientists analyzed and defeated a series of increasingly-sophisticated radio navigation systems
used by the Luftwaffe
to perform guided bombing missions at night. The British countermeasures to the system were so effective that in some cases, German aircraft were led by signals to land at RAF
bases since they believed that they had returned to German territory.
Reverse engineering concepts have been applied to biology
as well, specifically to the task of understanding the structure and function of gene regulatory network
s. They regulate almost every aspect of biological behavior and allow cells to carry out physiological processes and responses to perturbations. Understanding the structure and the dynamic behavior of gene networks is therefore one of the paramount challenges of systems biology, with immediate practical repercussions in several applications that are beyond basic research.
There are several methods for reverse engineering gene regulatory networks by using molecular biology and data science methods. They have been generally divided into six classes:
* Coexpression methods are based on the notion that if two genes exhibit a similar expression profile, they may be related although no causation can be simply inferred from coexpression.
* Sequence motif methods analyze gene promoters to find specific transcription factor binding domains. If a transcription factor is predicted to bind a promoter of a specific gene, a regulatory connection can be hypothesized.
* Chromatin ImmunoPrecipitation
(ChIP) methods investigate the genome-wide profile of DNA binding of chosen transcription factors to infer their downstream gene networks.
* Orthology methods transfer gene network knowledge from one species to another.
* Literature methods implement text mining
and manual research to identify putative or experimentally-proven gene network connections.
* Transcriptional complexes methods leverage information on protein-protein interactions between transcription factors, thus extending the concept of gene networks to include transcriptional regulatory complexes.
Often, gene network reliability is tested by genetic perturbation experiments followed by dynamic modelling, based on the principle that removing one network node has predictable effects on the functioning of the remaining nodes of the network.
Applications of the reverse engineering of gene networks range from understanding mechanisms of plant physiology
to the highlighting of new targets for anticancer therapy.
Overlap with patent law
Reverse engineering applies primarily to gaining understanding of a process or artifact in which the manner of its construction, use, or internal processes has not been made clear by its creator.
ed items do not of themselves have to be reverse-engineered to be studied, for the essence of a patent is that inventors provide a detailed public disclosure themselves, and in return receive legal protection
of the invention
that is involved. However, an item produced under one or more patents could also include other technology that is not patented and not disclosed. Indeed, one common motivation of reverse engineering is to determine whether a competitor's product contains patent infringement
or copyright infringement
In the United States, even if an artifact or process is protected by trade secrets
, reverse-engineering the artifact or process is often lawful if it has been legitimately obtained.
Reverse engineering of computer software
often falls under both contract law
as a breach of contract
as well as any other relevant laws. That is because most end user license agreements
specifically prohibit it, and US courts have ruled that if such terms are present, they override the copyright law that expressly permits it (see ''Bowers v. Baystate Technologies
''). According to Section 103(f) of the Digital Millennium Copyright Act17 U.S.C. § 1201 (f)
, a person in legal possession of a program may reverse-engineer and circumvent its protection if that is necessary to achieve "interoperability," a term that broadly covers other devices and programs that can interact with it, make use of it, and to use and transfer data to and from it in useful ways. A limited exemption exists that allows the knowledge thus gained to be shared and used for interoperability purposes.
EU Directive 2009/24
on the legal protection of computer programs, which superseded an earlier (1991) directive, governs reverse engineering in the European Union
[The directive states:
* Antikythera mechanism
* Bus analyzer
* Clone (computing)
* Clean room design
* Code morphing
* Connectix Virtual Game Station
* Digital Millennium Copyright Act
* Forensic engineering
* Industrial CT scanning
* Interactive Disassembler
* Knowledge Discovery Metamodel
* Laser scanner
* List of production topics
* Listeroid Engines
* Logic analyzer
* Product teardown
* ''Sega v. Accolade
* Software archaeology
* Software cracking
* Structured light digitizer
* Value engineering
* Elvidge, Julia, "Using Reverse Engineering to Discover Patent Infringement," Chipworks, Sept. 2010. Online: http://www.photonics.com/Article.aspx?AID=44063
* Hausi A. Müller
and Holger M. Kienle, "A Small Primer on Software Reverse Engineering," Technical Report, University of Victoria, 17 pages, March 2009. Online: http://holgerkienle.wikispaces.com/file/view/MK-UVic-09.pdf
* Heines, Henry, "Determining Infringement by X-Ray Diffraction," ''Chemical Engineering Process'', Jan. 1999 (example of reverse engineering used to detect IP infringement)
* (introduction to hardware teardowns, including methodology, goals)
* Reverse Engineering for Beginners
* Samuelson, Pamela
and Scotchmer, Suzanne, "The Law and Economics of Reverse Engineering," 111 Yale L.J. 1575 (2002). Online: http://people.ischool.berkeley.edu/~pam/papers/l&e%20reveng3.pdf
* (xviii+856+vi pages, 3.5"-floppy) Errata
https://web.archive.org/web/20190417212906/https://www.pcjs.org/pubs/pc/programming/Undocumented_DOS/#errata-2nd-edition] (NB. On general methodology of reverse engineering, applied to mass-market software: a program for exploring DOS, disassembling DOS.)
* (pp. 59–188 on general methodology of reverse engineering, applied to mass-market software: examining Windows executables, disassembling Windows, tools for exploring Windows)
* Schulman, Andrew, "Hiding in Plain Sight: Using Reverse Engineering to Uncover Software Patent Infringement," ''Intellectual Property Today'', Nov. 2010. Online: http://www.iptoday.com/issues/2010/11/hiding-in-plain-sight-using-reverse-engineering-to-uncover-software-patent-infringement.asp
* Schulman, Andrew, "Open to Inspection: Using Reverse Engineering to Uncover Software Prior Art," ''New Matter'' (Calif. State Bar IP Section), Summer 2011 (Part 1); Fall 2011 (Part 2). Online: http://www.SoftwareLitigationConsulting.com