Syntactic Garbage
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, garbage includes
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
,
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an ai ...
s, or other regions of the
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
of a computer system (or other system resources), which will not be used in any future computation by the system, or by a program running on it. Because every computer system has a finite amount of memory, and most software produces garbage, it is frequently necessary to ''deallocate'' memory that is occupied by garbage and return it to the
heap Heap or HEAP may refer to: Computing and mathematics * Heap (data structure), a data structure commonly used to implement a priority queue * Heap (mathematics), a generalization of a group * Heap (programming) (or free store), an area of memory f ...
, or memory pool, for reuse.


Classification

Garbage is generally classified into two types: syntactic garbage, any object or data which is within a program's memory space but unreachable from the program's root set; and semantic garbage, any object or data which is never accessed by a running program for any combination of program inputs. Objects and data which are not garbage are said to be ''live''. Casually stated, syntactic garbage is data that ''cannot'' be reached, and semantic garbage is data that ''will not'' be reached. More precisely, syntactic garbage is data that is unreachable due to the reference graph (there is no path to it), which can be determined by many algorithms, as discussed in
tracing garbage collection In computer programming, tracing garbage collection is a form of automatic memory management that consists of determining which objects should be deallocated ("garbage collected") by tracing which objects are ''reachable'' by a chain of references ...
, and only requires analyzing the data, not the code. Semantic garbage is data that will not be accessed, either because it is unreachable (hence also syntactic garbage), or is reachable but will not be accessed; this latter requires analysis of the code, and is in general an
undecidable problem In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is an ...
. Syntactic garbage is a (usually strict) subset of semantic garbage, as it is entirely possible for an object to hold a reference to another object without ever using that object.


Example

In the following simple
stack Stack may refer to: Places * Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group * Blue Stack Mountains, in Co. Donegal, Ireland People * Stack (surname) (including a list of people ...
implementation in Java, each element popped from the stack becomes semantic garbage once there are no outside references to it: public class Stack This is because elements[] still contains a reference to the object, but the object ''will never'' be accessed again through this reference, because elements[] is private to the class and the pop method only returns references to elements it has not already popped. (After it decrements size, this class ''will never'' access that element again.) However, knowing this requires analysis of the code of the class, which is undecidable in general. If a later push call re-grows the stack to the previous size, overwriting this last reference, then the object will become syntactic garbage, because it ''can never'' be accessed again, and will be eligible for garbage collection.


Automatic garbage collection

An example of the automatic collection of syntactic garbage, by
reference counting In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others. In garbage collection algorithms, referenc ...
garbage collection, can be produced using the
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
command-line interpreter: >>> class Foo: ... """This is an empty testing class.""" ... pass ... >>> bar = Foo() >>> bar <__main__.Foo object at 0x54f30> >>> del bar In this session, an object is created, its location in the memory is displayed, and the only reference to the object is then destroyed—there is no way to ever use the object again from this point on, as there are no references to it. This becomes apparent when we try to access the original reference: >>> bar Traceback (most recent call last): File "", line 1, in ? NameError: name 'bar' is not defined As it is now impossible to refer to the object, the object has become useless; it is garbage. Since Python uses garbage collection, it automatically deallocates the memory that was used for the object so that it may be used again: >>> class Bar: ... """This is another testing class.""" ... pass ... >>> baz = Bar() >>> baz <__main__.Bar object at 0x54f30> The instance now resides at the memory location ; at the same place as where our previous object, the instance, was located. Since the instance was destroyed, freeing up the memory used to contain it, the interpreter creates the object at the same memory location as before, making good use of the available resources.


Effects

Garbage consumes heap memory, and thus one wishes to collect it (to minimize memory use, allow faster memory allocation, and prevent out-of-memory errors by reducing heap fragmentation and memory use). However, collecting garbage takes time and, if done manually, requires coding overhead. Further, collecting garbage destroys objects and thus can cause calls to
finalizer In computer science, a finalizer or finalize method is a special method that performs finalization, generally some form of cleanup. A finalizer is executed during object destruction, prior to the object being deallocated, and is complementary to ...
s, executing potentially arbitrary code at an arbitrary point in the program's execution. Incorrect garbage collection (deallocating memory that is not garbage), primarily due to errors in manual garbage collection (rather than errors in garbage collectors), results in
memory safety Memory safety is the state of being protected from various software bugs and Vulnerability (computing), security vulnerabilities when dealing with random-access memory, memory access, such as buffer overflows and dangling pointers. For example, J ...
violations (that often create security holes) due to use of
dangling pointer Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are ...
s. Syntactic garbage can be collected automatically, and garbage collectors have been extensively studied and developed. Semantic garbage cannot be automatically collected in general, and thus causes
memory leak In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that Computer memory, memory which is no longer needed is not released. A memory leak may also happe ...
s even in garbage-collected languages. Detecting and eliminating semantic garbage is typically done using a specialized debugging tool called a heap profiler, which allows one to see which objects are live and how they are reachable, enabling one to remove the unintended reference.


Eliminating garbage

The problem of managing the deallocation of garbage is well-known in computer science. Several approaches are taken: * Many
operating systems An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also inc ...
reclaim the memory and resources used by a process or program when it terminates. Simple or short-lived programs which are designed to run in such environments can exit and allow the operating system to perform any necessary reclamation. * In systems or
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s with
manual memory management In computer science, manual memory management refers to the usage of manual instructions by the programmer to identify and deallocate unused objects, or garbage. Up until the mid-1990s, the majority of programming languages used in industry supp ...
, the programmer must explicitly arrange for memory to be deallocated when it is no longer used. C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
are two well-known languages which support this model. *
Garbage collection Waste collection is a part of the process of waste management. It is the transfer of solid waste from the point of use and disposal to the point of treatment or landfill. Waste collection also includes the curbside collection of recyclable m ...
uses various algorithms to automatically analyze the state of a program, identify garbage, and deallocate it without intervention by the programmer. Many modern programming languages such as
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
and
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lang ...
provide automated garbage collection. However, it is not a recent development, as it has also been used in older languages such as
LISP A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lisping ...
. * There is ongoing research to type-theoretic approaches (such as
region inference In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that ca ...
) to identification and removal of garbage from a program. No general type-theoretic solution to the problem has been developed.


Notes


External links

* Benjamin Pierce (editor), ''Advanced Topics in Types and Programming Languages'', MIT Press (2005), * Richard Jones and Rafael Lins, ''Garbage Collection: Algorithms for Automated Dynamic Memory Management'', Wiley and Sons (1996), {{Memory management Computer data Computer programming