Code Stylometry
Code stylometry (also known as program authorship attribution or source code authorship analysis) is the application of stylometry to computer code to attribute authorship to anonymous binary or source code. It often involves breaking down and examining the distinctive patterns and characteristics of the programming code and then comparing them to computer code whose authorship is known. Unlike software forensics, code stylometry attributes authorship for purposes other than intellectual property infringement, including plagiarism detection, copyright investigation, and authorship verification.

History
In 1989, researchers Paul Oman and Curtis Cook identified the authorship of 18 different Pascal programs written by six authors by using “markers” based on typographic characteristics. In 1998, researchers Stephen MacDonell, Andrew Gray, and Philip Sallis developed a dictionary-based author attribution system called IDENTIFIED (Integrated Dictionary-based Extraction of Non-lang ...
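The idea of comparing typographic markers against code of known authorship can be illustrated with a minimal Python sketch. It is not the method used by Oman and Cook or by IDENTIFIED: the five markers, the unscaled Euclidean distance, and the names `typographic_markers` and `attribute` are all assumptions chosen for illustration.

```python
import math


def typographic_markers(source: str) -> list[float]:
    """Return a small vector of layout-based style markers (illustrative choice)."""
    lines = source.splitlines() or [""]
    n = len(lines)
    nonblank = [ln for ln in lines if ln.strip()]
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in nonblank]
    return [
        sum(1 for ln in lines if not ln.strip()) / n,          # blank-line ratio
        sum(1 for ln in nonblank if ln.startswith("\t")) / n,  # tab-indented ratio
        sum(indents) / max(len(indents), 1),                   # mean indent width
        sum(len(ln) for ln in lines) / n,                      # mean line length
        sum(1 for ln in nonblank
            if ln.lstrip().startswith("#")) / n,               # comment-line ratio
    ]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two marker vectors (no feature scaling)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attribute(unknown: str, known: dict[str, str]) -> str:
    """Return the known author whose sample lies closest to the unknown code."""
    target = typographic_markers(unknown)
    return min(
        known,
        key=lambda author: distance(target, typographic_markers(known[author])),
    )
```

A practical system would use many more features, normalize them, and train on several samples per author; this sketch only shows the compare-to-known-authors step.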


Stylometry
Stylometry is the application of the study of linguistic style, usually to written language. It has also been applied successfully to music and to fine-art paintings (Argamon, Shlomo, Kevin Burns, and Shlomo Dubnov, eds. The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer Science & Business Media, 2010). Another conceptualization defines it as the linguistic discipline that evaluates an author's style through the application of statistical analysis to a body of their work. Stylometry is often used to attribute authorship to anonymous or disputed documents. It has legal as well as academic and literary applications, ranging from the question of the authorship of Shakespeare's works to forensic linguistics, and has methodological similarities with the analysis of text readability.

History
Stylometry grew out of earlier techniques of analyzing text ...
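As a rough illustration of applying statistical analysis to a body of text, the following Python sketch builds a relative-frequency profile over a short list of common English function words and compares two profiles by cosine similarity. The word list, the tokenizer, and the names `profile` and `cosine` are assumptions made for this example, not a method described in the entry.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "is", "was")


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two profiles; 0.0 if either profile is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Under these assumptions, a disputed text would be attributed to the candidate author whose profile has the highest similarity to it.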

Drexel University
Drexel University is a private research university with its main campus in Philadelphia, Pennsylvania. Drexel's undergraduate school was founded in 1891 by Anthony J. Drexel, a financier and philanthropist. Founded as Drexel Institute of Art, Science and Industry, it was renamed Drexel Institute of Technology in 1936, before assuming its current name in 1970. More than 24,000 students were enrolled in over 70 undergraduate programs and more than 100 master's, doctoral, and professional programs at the university. Drexel's cooperative education program (co-op) is a prominent aspect of the school's degree programs, offering students the opportunity to gain up to 18 months of paid, full-time work experience in a field relevant to their undergraduate major or graduate degree program prior to graduation.

History
Drexel University was founded in 1891 as the Drexel Institute of Art, Science and Industry, by Philadelphia financier and philanthropist Anthony J. Drexel. The orig ...


Computational Fields Of Study
Computation is any type of arithmetic or non-arithmetic calculation that follows a well-defined model (e.g., an algorithm). Mechanical or electronic devices (or, historically, people) that perform computations are known as computers. An especially well-known discipline of the study of computation is computer science.

Physical process of computation
Computation can be seen as a purely physical process occurring inside a closed physical system called a computer. Examples of such physical systems are digital computers, mechanical computers, quantum computers, DNA computers, molecular computers, microfluidics-based computers, analog computers, and wetware computers. This point of view has been adopted by the physics of computation, a branch of theoretical physics, as well as the field of natural computing. An even more radical point of view, pancomputationalism, is the postulate of digital physics that argu ...

Quantitative Linguistics
Quantitative linguistics (QL) is a sub-discipline of general linguistics and, more specifically, of mathematical linguistics. Quantitative linguistics deals with language learning, language change, and application as well as structure of natural languages. QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws. Synergetic linguistics was from its very beginning specifically designed for this purpose. QL is empirically based on the results of language statistics, a field which can be interpreted as statistics of languages or as statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions. Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.

History
The earliest QL approaches date back to the ancient Greek and ...

Language Varieties And Styles
Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of methods, including spoken, sign, and written language. Many languages, including the most widely spoken ones, have writing systems that enable sounds or signs to be recorded for later reactivation. Human language is highly variable between cultures and across time. Human languages have the properties of productivity and displacement, and rely on social convention and learning. Estimates of the number of human languages in the world vary. Precise estimates depend on an arbitrary distinction (dichotomy) established between languages and dialects. Natural languages are spoken, signed, or both; however, any language can be encoded into secondary media using auditory, visual, or tactile stimuli – for example, writing, whis ...

Military Technology
Military technology is the application of technology for use in warfare. It comprises the kinds of technology that are distinctly military in nature and not civilian in application, usually because they lack useful or legal civilian applications, or are dangerous to use without appropriate military training. The line is porous; military inventions have been brought into civilian use throughout history, with sometimes minor modification if any, and civilian innovations have similarly been put to military use. Military technology is usually researched and developed by scientists and engineers specifically for use in battle by the armed forces. Many new technologies came as a result of the military funding of science. Armament engineering is the design, development, testing and lifecycle management of military weapons and systems. It draws on the knowledge of several traditional engineering disciplines, including mechanical engineering, electrical engineering, mechatronics, el ...


2013 South Korea Cyberattack
In 2013 there were two major sets of cyberattacks on South Korean targets attributed to elements within North Korea.

March
On 20 March 2013, three South Korean television stations and a bank suffered from frozen computer terminals in a suspected act of cyberwarfare (Tania Branigan, "South Korea on alert for cyber-attacks after major network goes down: Computer systems of banks and broadcasters are interrupted, with fingers immediately pointed at North Korea", The Guardian, 20 March 2013). ATMs and mobile payments were also affected. The South Korean communications watchdog, the Korea Communications Commission, raised its alert level on cyber-attacks to three on a scale of five. North Korea has been blamed for similar attacks in 2009 and 2011 and was suspected of launching this attack as well. This attack also came at a period of elevated tensions between the two Koreas, following Pyongyang’s nuclear test on 12 February. South Korean officials linked the incident to a Chinese ...

Shamoon
Shamoon (Persian: شمعون), also known as W32.DistTrack, is a modular computer virus that was discovered in 2012, targeting then-recent 32-bit NT kernel versions of Microsoft Windows. The virus was notable due to the destructive nature of the attack and the cost of recovery. Shamoon can spread from an infected machine to other computers on the network. Once a system is infected, the virus continues to compile a list of files from specific locations on the system, upload them to the attacker, and erase them. Finally the virus overwrites the master boot record of the infected computer, making it unusable. The virus was used for cyberwarfare against national oil companies including Saudi Arabia's Saudi Aramco and Qatar's RasGas. A group named "Cutting Sword of Justice" claimed responsibility for an attack on 30,000 Saudi Aramco workstations, causing the company to spend more than a week restoring their services. The group later indicated that the Shamoon virus had been used in the att ...

Sony Pictures Hack
On November 24, 2014, a hacker group identifying itself as "Guardians of Peace" leaked confidential data from the film studio Sony Pictures Entertainment (SPE). The data included personal information about Sony Pictures employees and their families, emails between employees, information about executive salaries at the company, copies of then-unreleased Sony films, plans for future Sony films, scripts for certain films, and other information. The perpetrators then employed a variant of the Shamoon wiper malware to erase Sony's computer infrastructure. During the hack, the group demanded that Sony withdraw its then-upcoming film The Interview, a comedy about a plot to assassinate North Korean leader Kim Jong-un, and threatened terrorist attacks at cinemas screening the film. After many major U.S. theater chains opted not to screen The Interview in response to these threats, Sony chose to cancel the film's formal premiere and mainstream release, opting to ski ...

Abstract Syntax Tree
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring in the text. The syntax is "abstract" in the sense that it does not represent every detail appearing in the real syntax, but rather just the structural or content-related details. For instance, grouping parentheses are implicit in the tree structure, so these do not have to be represented as separate nodes. Likewise, a syntactic construct like an if-condition-then statement may be denoted by means of a single node with three branches. This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated parse trees. Parse trees are typically built by a parser during the source code translation and compiling process. Once built, additional information is added to the AST by means of subsequent processing, e.g., co ...
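The two properties described above (grouping parentheses are implicit, and an if-statement is a single node with three branches) can be checked with Python's standard `ast` module. This is only a sketch for one language's AST; the helper names `expr_shape` and `if_branches` are made up for the example, and other parsers name their nodes differently.

```python
import ast


def expr_shape(code: str) -> str:
    """Dump the AST of a single expression (source positions are omitted)."""
    return ast.dump(ast.parse(code, mode="eval"))


def if_branches(code: str) -> tuple:
    """Return the child-field names of the first statement, expected to be an `if`."""
    node = ast.parse(code).body[0]
    assert isinstance(node, ast.If)
    return node._fields


# Redundant grouping parentheses leave no trace in the tree.
same_tree = expr_shape("(a + b) * c") == expr_shape("((a + b)) * c")

# The multiplication is the root node; the parenthesized addition is its left child.
root = ast.parse("(a + b) * c", mode="eval").body
root_is_mult = isinstance(root.op, ast.Mult)
left_is_add = isinstance(root.left, ast.BinOp) and isinstance(root.left.op, ast.Add)

# One If node holds the condition, the then-branch, and the else-branch.
branches = if_branches("if x:\n    y = 1\nelse:\n    y = 2\n")
```

Here `same_tree` is true because the parser records grouping only through nesting, and `branches` holds the three fields `test`, `body`, and `orelse`.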




Decompilation
A decompiler is a computer program that translates an executable file to a high-level source file which can be recompiled successfully. It therefore does the opposite of a typical compiler, which translates a high-level language to a low-level language. Decompilers are usually unable to perfectly reconstruct the original source code, and thus frequently produce obfuscated code. Nonetheless, decompilers remain an important tool in the reverse engineering of computer software.

Introduction
The term decompiler is most commonly applied to a program which translates executable programs (the output from a compiler) into source code in a (relatively) high level language which, when compiled, will produce an executable whose behavior is the same as the original executable program. By comparison, a disassembler translates an executable program into assembly language (and an assembler could be used for assembling it back into an executable program). Decompilation is the act of using a ...
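The disassembler side of this contrast can be illustrated, by analogy only, with Python's standard `dis` module, which lists the bytecode instructions of a compiled function. This operates on CPython bytecode rather than on a native executable, and it does not reconstruct high-level source, which is the harder task a decompiler attempts. The functions `add` and `opnames` are made up for the example.

```python
import dis


def add(a, b):
    """A tiny function whose compiled form is inspected below."""
    return a + b


def opnames(func) -> list[str]:
    """List the low-level instruction names the interpreter will execute."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # Prints one instruction name per line; exact names vary by Python version.
    for name in opnames(add):
        print(name)
```

The listing keeps the operations but drops comments and formatting from the original source, which gives a small sense of why recovered code rarely matches what the author wrote.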


Pseudocode
In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine reading. It typically omits details that are essential for machine understanding of the algorithm, such as variable declarations and language-specific code. The programming language is augmented with natural language description details, where convenient, or with compact mathematical notation. The purpose of using pseudocode is that it is easier for people to understand than conventional programming language code, and that it is an efficient and environment-independent description of the key principles of an algorithm. It is commonly used in textbooks and scientific publications to document algorithms and in planning of software and other algorithms. No broad standard for pseudocode syntax exists, as a program in pseudocode is not an executa ...