Language-based Security
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, language-based security (LBS) is a set of techniques that may be used to strengthen the security of applications on a high level by using the properties of programming languages. LBS is considered to enforce computer security on an application-level, making it possible to prevent vulnerabilities which traditional operating system security is unable to handle. Software applications are typically specified and implemented in certain
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s, and in order to protect against attacks, flaws and bugs an application's
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
might be vulnerable to, there is a need for application-level security; security evaluating the applications behavior with respect to the programming language. This area is generally known as language-based security.


Motivation

The use of large software systems, such as
SCADA Supervisory control and data acquisition (SCADA) is a control system architecture comprising computers, networked data communications and graphical user interfaces for high-level supervision of machines and processes. It also covers sensors and ...
, is taking place all around the world and computer systems constitute the core of many infrastructures. The society relies greatly on infrastructure such as water, energy, communication and transportation, which again all rely on fully functionally working computer systems. There are several well known examples of when critical systems fail due to bugs or errors in software, such as ''when shortage of computer memory caused LAX computers to crash and hundreds of flights to be delayed'' (April 30, 2014). Traditionally, the mechanisms used to control the correct behavior of software are implemented at the operating system level. The operating system handles several possible security violations such as memory access violations, stack overflow violations, access control violations, and many others. This is a crucial part of security in computer systems, however by securing the behavior of software on a more specific level, even stronger security can be achieved. Since a lot of properties and behavior of the software is lost in compilation, it is significantly more difficult to detect vulnerabilities in machine code. By evaluating the source code, before the compilation, the theory and implementation of the programming language can also be considered, and more vulnerabilities can be uncovered.
"So why do developers keep making the same mistakes? Instead of relying on programmers' memories, we should strive to produce tools that codify what is known about common security vulnerabilities and integrate it directly into the development process." — D. Evans and D. Larochelle, 2002


Objective of Language-based security

By using LBS, the security of software can be increased in several areas, depending on the techniques used. Common programming errors such as allowing
buffer overflow In information security and programming, a buffer overflow, or buffer overrun, is an anomaly whereby a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations. Buffers are areas of memo ...
s and illegal information flows to occur, can be detected and disallowed in the software used by the consumer. It is also desirable to provide some proof to the consumer about the security properties of the software, making the consumer able to trust the software without having to receive the source code and self checking it for errors. A compiler, taking source code as input, performs several language specific operations on the code in order to translate it into machine readable code.
Lexical analysis In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
, preprocessing,
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from L ...
, semantic analysis, code generation, and
code optimization In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be o ...
are all commonly used operations in compilers. By analyzing the source code and using the theory and implementation of the language, the compiler will attempt to correctly translate the high-level code into low-level code, preserving the behavior of the program. During compilation of programs written in a
type-safe In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
language, such as
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, the source code must type-check successfully before compilation. If the type-check fails, the compilation will not be performed, and the source code needs to be modified. This means that, given a correct compiler, any code compiled from a successfully type-checked source program should be clear of invalid assignment errors. This is information which can be of value to the code consumer, as it provides some degree of guarantee that the program will not crash due to some specific error. A goal of LBS is to ensure the presence of certain properties in the source code corresponding to the safety policy of the software. Information gathered during the compilation can be used to create a certificate that can be provided to the consumer as a proof of safety in the given program. Such a proof must imply that the consumer can trust the compiler used by the supplier and that the certificate, the information about the source code, can be verified. The figure illustrates how certification and verification of low-level code could be established by the use of a certifying compiler. The software supplier gains the advantage of not having to reveal the source code, and the consumer is left with the task of verifying the certificate, which is an easy task compared to evaluation and compilation of the source code itself. Verifying the certificate only requires a limited trusted code base containing the compiler and the verifier.


Techniques


Program analysis

The main applications of
program analysis In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program op ...
are
program optimization In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be o ...
(running time, space requirements, power consumption etc.) and
program correctness In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified. Best explored is ''functional'' correctness, which refers to the input-output behavior of the algorithm (i.e., for each input it p ...
(bugs, security vulnerabilities etc.). Program analysis can be applied to compilation (
static analysis Static analysis, static projection, or static scoring is a simplified analysis wherein the effect of an immediate change to a system is calculated without regard to the longer-term response of the system to that change. If the short-term effect i ...
), run-time ( dynamic analysis), or both. In language-based security, program analysis can provide several useful features, such as:
type checking In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
(static and dynamic),
monitoring Monitoring may refer to: Science and technology Biology and healthcare * Monitoring (medicine), the observation of a disease, condition or one or several medical parameters over time * Baby monitoring * Biomonitoring, of toxic chemical compounds, ...
,
taint checking Taint checking is a feature in some computer programming languages, such as Perl, Ruby or Ballerina designed to increase security by preventing malicious users from executing commands on a host computer. Taint checks highlight specific security ri ...
and control-flow analysis.


Information flow analysis

Information flow analysis can be described as a set of tools used to analyze the information flow control in a program, in order to preserve
confidentiality Confidentiality involves a set of rules or a promise usually executed through confidentiality agreements that limits the access or places restrictions on certain types of information. Legal confidentiality By law, lawyers are often required ...
and
integrity Integrity is the practice of being honest and showing a consistent and uncompromising adherence to strong moral and ethical principles and values. In ethics, integrity is regarded as the honesty and truthfulness or accuracy of one's actions. Inte ...
where regular
access control In the fields of physical security and information security, access control (AC) is the selective restriction of access to a place or other resource, while access management describes the process. The act of ''accessing'' may mean consuming ...
mechanisms come short.
"By decoupling the right to access information from the right to disseminate it, the flow model goes beyond the access matrix model in its ability to specify secure information flow. A practical system needs both access and flow control to satisfy all security requirements." — D. Denning, 1976
Access control enforces checks on ''access'' to information, but is not concerned about what happens after that. An example: A system has two users, Alice and Bob. Alice has a file ''secret.txt'', which is only allowed to be read and edited by her, and she prefers to keep this information to herself. In the system, there also exists a file ''public.txt'', which is free to read and edit for all users in the system. Now suppose that Alice has accidentally downloaded a malicious program. This program can access the system as Alice, bypassing the access control check on ''secret.txt''. The malicious program then copies the content of ''secret.txt'' and places it in ''public.txt'', allowing Bob and all other users to read it. This constitutes a violation of the intended confidentiality policy of the system.


Noninterference

Noninterference is a property of programs that does not leak or reveal information of variables with a ''higher'' security classification, depending on the input of variables with a ''lower'' security classification. A program which satisfies noninterference should produce the same ''output'' whenever the corresponding same ''input'' on the ''lower'' variables are used. This must hold for every possible value on the input. This implies that even if ''higher'' variables in the program has different values from one execution to another, this should not be visible on the ''lower'' variables. An attacker could try to execute a program which does not satisfy noninterference repeatedly and systematically to try to map its behavior. Several iterations could lead to the disclosure of ''higher'' variables, and let the attacker learn sensitive information about for example the systems state. Whether a program satisfies noninterference or not can be evaluated during compilation assuming the presence of security type systems.


Security type system

A security type system is a kind of
type system In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
that can be used by software developers in order to check the security properties of their code. In a language with security types, the types of variables and expressions relate to the security policy of the application, and programmers may be able to specify the application security policy via type declarations. Types can be used to reason about various kinds of security policies, including authorization policies (as access control or capabilities) and information flow security. Security type systems can be formally related to the underlying security policy, and a security type system is sound if all programs that type-check satisfy the policy in a semantic sense. For example, a security type system for information flow might enforce noninterference, meaning that type checking reveals whether there is any violation of confidentiality or integrity in the program.


Securing low-level code

Vulnerabilities in low-level code are bugs or flaws that will lead the program into a state where further behavior of the program is undefined by the source programming language. The behavior of the low-level program will depend on compiler, runtime system or operating system details. This allows for an attacker to drive the program towards an undefined state, and exploit the behavior of the system. Common exploits of insecure low-level code lets an attacker perform unauthorized reads or writes to memory addresses. The memory addresses can be either random or chosen by the attacker.


Using safe languages

An approach to achieve secure low-level code is to use safe high-level languages. A safe language is considered to be completely defined by its programmers' manual. Any bug that could lead to implementation-dependent behavior in a safe language will either be detected at compile time or lead to a well-defined error behavior at run-runtime. In
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, if accessing an array out of bounds, an exception will be thrown. Examples of other safe languages are C#,
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
and Scala.


Defensive execution of unsafe languages

During compilation of an unsafe language run-time checks is added to the low-level code to detect source-level undefined behavior. An example is the use of canaries, which can terminate a program when discovering bounds violations. A downside of using run-time checks such as in
bounds checking In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as ...
is that they impose considerable performance overhead.
Memory protection Memory protection is a way to control memory access rights on a computer, and is a part of most modern instruction set architectures and operating systems. The main purpose of memory protection is to prevent a process from accessing memory that h ...
, such as using non-executable stack and/or heap, can also be seen as additional run-time checks. This is used by many modern operating systems.


Isolated execution of modules

The general idea is to identify sensitive code from application data by analyzing the source code. Once this is done the different data is separated and placed in different modules. When assuming that each module has total control over the sensitive information it contains, it is possible to specify when and how should leave the module. An example is a cryptographic module that can prevent keys from ever leaving the module unencrypted.


Certifying compilation

Certifying compilation is the idea of producing a certificate during compilation of source code, using the information from the high-level programming language semantics. This certificate should be enclosed with the compiled code in order to provide a form of proof to the consumer that the source code was compiled according to a certain set of rules. The certificate can be produced in different ways, e.g. through
Proof-carrying code Proof-carrying code (PCC) is a software mechanism that allows a host system to verify properties about an application via a formal proof that accompanies the application's executable code. The host system can quickly verify the validity of the proo ...
(PCC) or
Typed assembly language In computer science, a typed assembly language (TAL) is an assembly language that is extended to include a method of annotating the datatype of each value that is manipulated by the code. These annotations can then be used by a program (type chec ...
(TAL).


Proof-carrying code

The main aspects of PCC can be summarized in the following steps: # The supplier provides an executable program with various annotations produced by a certifying compiler. # The consumer provides a verification condition, based on a
security policy Security policy is a definition of what it means to ''be secure'' for a system, organization or other entity. For an organization, it addresses the constraints on behavior of its members as well as constraints imposed on adversaries by mechanisms ...
. This is sent to the supplier. # The supplier runs the verification condition in a theorem prover to produce a proof to the consumer that the program in fact satisfies the security policy. # The consumer then runs the proof in a
proof checker In computer science and mathematical logic, a proof assistant or interactive theorem prover is a software tool to assist with the development of formal proofs by human-machine collaboration. This involves some sort of interactive proof editor ...
to verify the proof validity. An example of a certifying compiler is th
Touchstone compiler
that provides a PCC formal proof of type- and memory safety for programs implemented in Java.


Typed assembly language

TAL is applicable to programming languages that make use of a
type system In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
. After compilation, the
object code In computing, object code or object module is the product of a compiler. In a general sense object code is a sequence of statements or instructions in a computer language, usually a machine code language (i.e., binary) or an intermediate langua ...
will carry a type annotation that can be checked by an ordinary type checker. The annotation produced here is in many ways similar to the annotations provided by PCC, with some limitations. However, TAL can handle any security policy that may be expressed by the restrictions of the type system, which can include memory safety and control flow, among others.


Seminars


Dagstuhl Seminar 03411
Language-Based Security, October 5 – 10, 2003.


References

{{Reflist


Books

* G. Barthe, B. Grégoire, T. Rezk, ''Compilation of Certificates'', 2008 * Brian Chess and Gary McGraw, ''Static Analysis for Security'', 2004.


Further reading

* Dexter Kozen
''Language Based Security''
Cornell University, 1999 * Pieter Agten et al.
''Recent Developments in Low-Level Software Security''
Universiteit Leuven * Andrei Sabelfeld and Andrew C. Myers
''Language-Based Information-Flow Security''
* Fred B. Schneider et al.
''A Language-Based Approach to Security''
Carnegie Mellon University, 2000 Computer security