In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a purely functional data structure is a
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
that can be implemented in a
purely functional language
In computer science, purely functional programming usually designates a programming paradigm—a style of building the structure and elements of computer programs—that treats all computation as the evaluation of mathematical functions.
Program ...
. The main difference between an arbitrary data structure and a purely functional one is that the latter is (strongly)
immutable. This restriction ensures the data structure possesses the advantages of immutable objects: (full)
persistency,
quick copy of objects, and
thread safety Thread safety is a computer programming concept applicable to multi-threaded code. Thread-safe code only manipulates shared data structures in a manner that ensures that all threads behave properly and fulfill their design specifications without un ...
. Efficient purely functional data structures may require the use of
lazy evaluation
In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations (sharing).
The b ...
and
memoization.
Definition
Persistent data structure
In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (v ...
s have the property of keeping previous versions of themselves unmodified. On the other hand, structures such as
arrays
An array is a systematic arrangement of similar objects, usually in rows and columns.
Things called an array include:
{{TOC right
Music
* In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
admit a destructive update,
[''Purely functional data structures''](_blank)
by Chris Okasaki Chris Okasaki, Ph.D. is an associate professor of computer science at the United States Military Academy. He authored ''Purely Functional Data Structures'' (1998), based on a doctoral dissertation of the same name. He obtained a Ph.D. at Carnegie M ...
, Cambridge University Press
Cambridge University Press is the university press of the University of Cambridge. Granted letters patent by Henry VIII of England, King Henry VIII in 1534, it is the oldest university press
A university press is an academic publishing hou ...
, 1998, that is, an update which cannot be reversed. Once a program writes a value in some index of the array, its previous value can not be retrieved anymore.
Formally, a ''purely functional data structure'' is a data structure which can be implemented in a
purely functional language
In computer science, purely functional programming usually designates a programming paradigm—a style of building the structure and elements of computer programs—that treats all computation as the evaluation of mathematical functions.
Program ...
, such as
Haskell
Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
. In practice, it means that the data structures must be built using only persistent data structures such as tuples,
sum type
In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. ...
s,
product type
In programming languages and type theory, a product of ''types'' is another, compounded, type in a structure. The "operands" of the product are types, and the structure of a product type is determined by the fixed order of the operands in the prod ...
s, and basic types such as integers, characters, strings. Such a data structure is necessarily persistent. However, not all persistent data structures are purely functional. For example, a
persistent array is a data-structure which is persistent and which is implemented using an array, thus is not purely functional.
In the book ''Purely functional data structures'', Okasaki compares destructive updates to master chef's knives. Destructive updates cannot be undone, and thus they should not be used unless it is certain that the previous value is not required anymore. However, destructive updates can also allow efficiency that can not be obtained using other techniques. For example, a data structure using an array and destructive updates may be replaced by a similar data structure where the array is replaced by a
map
A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes.
Many maps are static, fixed to paper or some other durable medium, while others are dynamic or interactive. Although ...
, a random access list, or a
balanced tree
In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
, which admits a purely functional implementation. But the access cost may increase from constant time to
logarithmic time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
.
Ensuring that a data structure is purely functional
A data structure is never inherently functional. For example, a stack can be implemented as a
singly-linked list
In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...
. This implementation is purely functional as long as the only operations on the stack return a new stack without altering the old stack. However, if the language is not purely functional, the run-time system may be unable to guarantee immutability. This is illustrated by Okasaki, where he shows the concatenation of two singly-linked lists can still be done using an imperative setting.
In order to ensure that a data structure is used in a purely functional way in an impure functional language,
modules
Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a s ...
or
classes can be used to ensure manipulation via authorized functions only.
Using Purely Functional Data Structures
One of the central challenges in adapting existing code to use purely functional data structures lies in the fact that mutable data
structures provide "hidden outputs" for functions that use them. Rewriting these functions to use purely functional data structures
requires adding these data structures as explicit outputs.
For instance, consider a function that accepts a mutable list, removes the first element from the list, and returns that element. In a purely functional setting, removing an element from the list produces a new and shorter list, but does not update the original one. In order to be useful, therefore, a purely functional version of this function is likely to have to return the new list along with the removed element. In the most general case, a program converted in this way must return the "state" or "store" of the program as an additional result from every function call. Such a program is said to be written in
store-passing style
Store-passing style is a programming technique that is used to model mutable state without using mutable state. It generally arises in the conversion of imperative programs into purely functional ones.
So, for instance, consider this JavaScript ...
.
Examples
Here is a list of abstract data structures with purely functional implementations:
* Stack (first in, last out) implemented as a
singly linked list
In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which ...
,
* Queue, implemented as a
real-time queue
In computer science, a queue is a collection (abstract data type), collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end ...
,
* Double-ended queue, implemented as a
real-time double-ended queue,
*
(Multi)set of ordered elements and
map
A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes.
Many maps are static, fixed to paper or some other durable medium, while others are dynamic or interactive. Although ...
indexed by ordered keys, implemented as a
red–black tree, or more generally by a
search tree
In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less ...
,
*
Priority queue, implemented as a
Brodal queue In computer science, the Brodal queue is a heap/priority queue structure with very low worst case
In computer science, best, worst, and average cases of a given algorithm express what the resource usage is ''at least'', ''at most'' and ''on aver ...
* Random access list, implemented as a skew-binary random access list
*
Hash consing In computer science, particularly in functional programming, hash consing is a technique used to share values that are structurally equal. The term ''hash consing'' originates from implementations of Lisp that attempt to reuse cons cells that have ...
*
Zipper (data structure)
A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was ...
Design and implementation
In his book ''Purely Functional Data Structures'', computer scientist
Chris Okasaki Chris Okasaki, Ph.D. is an associate professor of computer science at the United States Military Academy. He authored ''Purely Functional Data Structures'' (1998), based on a doctoral dissertation of the same name. He obtained a Ph.D. at Carnegie M ...
describes techniques used to design and implement purely functional data structures, a small subset of which are summarized below.
Laziness and memoization
Lazy evaluation is particularly interesting in a purely functional language because the order of the evaluation never changes the result of a function. Therefore, lazy evaluation naturally becomes an important part of the construction of purely functional data structures. It allows a computation to be done only when its result is actually required. Therefore, the code of a purely functional data structure can, without loss of efficiency, consider similarly data that will effectively be used and data that will be ignored. The only computation required is for the first kind of data; that is what will actually be performed.
One of the key tools in building efficient, purely functional data structures is memoization. When a computation is done, it is saved and does not have to be performed a second time. This is particularly important in lazy implementations; additional evaluations may require the same result, but it is impossible to know which evaluation will require it first.
Amortized analysis and scheduling
Some data structures, even those that are not purely functional such as
dynamic arrays, admit operations that are efficient most of the time (e.g., constant time for dynamic arrays), and rarely inefficient (e.g., linear time for dynamic arrays). ''
Amortization
Amortization or amortisation may refer to:
* The process by which loan principal decreases over the life of an amortizing loan
* Amortization (accounting), the expensing of acquisition cost minus the residual value of intangible assets in a system ...
'' can then be used to prove that the average running time of the operations is efficient. That is to say, the few inefficient operations are rare enough, and do not change the asymptotical evolution of time complexity when a sequence of operations is considered.
In general, having inefficient operations is not acceptable for persistent data structures, because this very operation can be called many times. It is not acceptable either for real-time or for imperative systems, where the user may require the time taken by the operation to be predictable. Furthermore, this unpredictability complicates the use of
parallelism.
In order to avoid those problems, some data structures allow for the inefficient operation to be postponed—this is called
scheduling
A schedule or a timetable, as a basic time-management tool, consists of a list of times at which possible tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order in which such things are ...
. The only requirement is that the computation of the inefficient operation should end before its result is actually needed. A constant part of the inefficient operation is performed simultaneously with the following call to an efficient operation, so that the inefficient operation is already totally done when it is needed, and each individual operation remains efficient.
Example: queue
Amortized queues are composed of two singly-linked lists: the front and the reversed rear. Elements are added to the rear list and are removed from the front list. Furthermore, whenever the front queue is empty, the rear queue is reversed and becomes the front, while the rear queue becomes empty. The amortized time complexity of each operation is constant. Each cell of the list is added, reversed and removed at most once. In order to avoid an inefficient operation where the rear list is reversed,
real-time queue
In computer science, a queue is a collection (abstract data type), collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end ...
s add the restriction that the rear list is only as long as the front list. To ensure that the rear list becomes longer than the front list, the front list is appended to the rear list and reversed. Since this operation is inefficient, it is not performed immediately. Instead, it is spread out over the subsequent operations. Thus, each cell is computed before it is needed, and the new front list is totally computed before a new inefficient operation needs to be called.
See also
*
Persistent data structure
In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (v ...
References
{{reflist
External links
Purely Functional Data Structuresthesis by Chris Okasaki (PDF format)
Making Data-Structures Persistentby James R. Driscoll, Neil Sarnak, Daniel D. Sleator, Robert E. Tarjan (PDF)
Fully Persistent Lists with Catenationby James R. Driscoll, Daniel D. Sleator, Robert E. Tarjan (PDF)
Persistent Data Structuresfrom the
MIT OpenCourseWare
MIT OpenCourseWare (MIT OCW) is an initiative of the Massachusetts Institute of Technology (MIT) to publish all of the educational materials from its undergraduate- and graduate-level courses online, freely and openly available to anyone, anyw ...
cours
Advanced AlgorithmsWhat's new in purely functional data structures since Okasaki?on ''Theoretical Computer Science
Stack Exchange
Stack Exchange is a network of question-and-answer (Q&A) websites on topics in diverse fields, each site covering a specific topic, where questions, answers, and users are subject to a reputation award process. The reputation system allows th ...
''
Functional data structures
Functional programming