Zipper (data Structure)
   HOME

TheInfoList



OR:

A zipper is a technique of representing an aggregate
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in
purely functional programming language In computer science, purely functional programming usually designates a programming paradigm—a style of building the structure and elements of computer programs—that treats all computation as the evaluation of mathematical functions. Program s ...
s. The zipper was described by
Gérard Huet Gérard Pierre Huet (; born 7 July 1947) is a French computer scientist, linguist and mathematician. He is senior research director at INRIA and mostly known for his major and seminal contributions to type theory, programming language theory and ...
in 1997. It includes and generalizes the
gap buffer A gap buffer in computer science is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the cu ...
technique sometimes used with arrays. The zipper technique is general in the sense that it can be adapted to
lists A list is a set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of the list-maker, but ...
,
trees In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only p ...
, and other recursively defined data structures. Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation. A layperson's explanation for a tree with zipper would be an ordinary computer file system with operations to go to parent (often cd ..), and to go downwards (cd subdirectory). The zipper is the pointer to the current path. Behind the scenes, zippers are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure).


Example: Bidirectional list traversal

Many common data structures in computer science can be expressed as the structure generated by a few primitive constructor operations or observer operations. These include the structure of finite lists, which can be generated by two operations: * Empty constructs an empty list, * Cons(x, L) constructs a list by prepending or concatenating value x in front of list L. A list such as
, 2, 3 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
/code> is therefore the declaration Cons(1, Cons(2, Cons(3, Empty))). It is possible to describe the location in such a list as the number of steps from the front of the list to the target location. More formally, a location in the list is the number of Cons operations required to reconstruct the whole list from that particular location. For example, in Cons(1, Cons(2, Cons( X, Cons(4, Empty)))) a Cons(2, L) and a Cons(1, L) operation would be required to reconstruct the list relative to position X otherwise known as Cons( X, Cons(4, Empty)). This recording together with the location is called a zipped representation of the list or a list-zipper. To be clear, a location in the list is not just the number of Cons operations, but also all of the other information about those Cons; in this case, the values that must be reconnected. Here, these values may be conveniently represented in a separate list in the order of application from the target location. Specifically, from the context of "3" in the list
, 2, 3, 4 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
/code>, a recording (commonly referred to as a 'path') could be represented as
, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
/code> where Cons(2, L) is applied followed by (Cons 1, L) to reconstitute the original list starting from
, 4 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
/code>. A list-zipper always represents the entire data structure. However, this information is from the perspective of a specific location within that data structure. Consequently, a list-zipper is a pair consisting of both the location as a context or starting point, and a recording or path that permits reconstruction from that starting location. In particular, the list-zipper of
, 2, 3, 4 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
/code> at the location of "3" may be represented as (
, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
, 4 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
. Now, if "3" is changed to "10", then the list-zipper becomes (
, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
0, 4
. The list may then be efficiently reconstructed: , 2, 10, 4/code> or other locations traversed to. With the list represented this way, it is easy to define relatively efficient operations on immutable data structures such as Lists and Trees at arbitrary locations. In particular, applying the zipper transform to a tree makes it easy to insert or remove values at any particular location in the tree.


Contexts and differentiation

The type of a zipper's contexts can be constructed via an operation on the original type that is closely related to the
derivative In mathematics, the derivative is a fundamental tool that quantifies the sensitivity to change of a function's output with respect to its input. The derivative of a function of a single variable at a chosen input value, when it exists, is t ...
of
calculus Calculus is the mathematics, mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizations of arithmetic operations. Originally called infinitesimal calculus or "the ...
through
decategorification In mathematics, categorification is the process of replacing set-theoretic theorems with category-theoretic analogues. Categorification, when done successfully, replaces sets with categories, functions with functors, and equations with natural is ...
. The recursive types that zippers are formed from can be viewed as the
least fixed point In order theory, a branch of mathematics, the least fixed point (lfp or LFP, sometimes also smallest fixed point) of a function from a partially ordered set ("poset" for short) to itself is the fixed point which is less than each other fixed po ...
of a unary type constructor of kind * \rightarrow *. For example, with a higher-order type constructor \text : (* \rightarrow *) \rightarrow * that constructs the least fixed point of its argument, an unlabeled binary tree can be represented as \text(T \mapsto T^2 + 1) and an unlabeled list may take the form \text(T \mapsto T + 1). Here, the notation of exponentiation, multiplication, and addition correspond to
function type In computer science and mathematical logic, a function type (or arrow type or exponential) is the type of a variable or parameter to which a function has or can be assigned, or an argument or result type of a higher-order function taking or return ...
s,
product type In programming languages and type theory, a product of ''types'' is another, compounded, type in a structure. The "operands" of the product are types, and the structure of a product type is determined by the fixed order of the operands in the produ ...
s, and
sum type In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. O ...
s respectively, whilst the natural numbers label the finite types; in this way, the type constructors resemble polynomial functions. The derivative of a type constructor can therefore be formed through this syntactic analogy, and the zipper of the type constructor is the derivative paired with its element type. In this way, the derivative can be viewed as a zipper with a hole in it, where a value could be placed. Consider the type of singly-linked lists. The list data structure can be defined as L(X) = 1 + X \times L(X). That is, L is a type constructor which takes an element type X and produces the type of lists of that element type. The two addends correspond to the two data constructors for a list. The Nil data constructor, a
sentinel node In computer programming, a sentinel node is a specifically designated node used with linked lists and trees as a traversal path terminator. This type of node does not hold or reference any data managed by the data structure. Benefits Sentinel ...
that holds no data, is represented by the
unit type In the area of mathematical logic and computer science known as type theory, a unit type is a type that allows only one value (and thus can hold no information). The carrier (underlying set) associated with a unit type can be any singleton set. ...
1, and the Cons constructor is represented as a product of the head element X and the tail L of the list. Under this representation, the derivative L' of L in terms of X is L'(X) = L(X) + X \times L'(X). That is, if we have a zipper-with-a-hole for a list, then either (I) the hole is the very first element, and the rest of the list is an ordinary (non-zipper) list L(X), or (II) the first element is present (X) and the hole is somewhere else further down the tail of the list (L'(X)). Then the formal type of a zipper for a linked list is X \times L'(X), or X \times (L(X) + X \times L'(X)). Put another way, a zipper for a linked list consists of an element in that list and instructions on where to put it in a partially-built list. For another example, consider the recursive data structure of a binary tree with nodes that are either sentinel nodes of type 1 or which are leaves containing a value of some type X. We can represent this type algebraically as T(X) = 1 + X \times T^2(X). The derivative in terms of X is T'(X) = T^2(X) + 2 \times X \times T(X) \times T'(X). We can read this algebraic notation as a zipper with a hole: A zipper for a tree either has the "missing value" at the very root of the tree, leaving the two branches as ordinary trees (the T^2(X) case), or it has the missing value on one of the two branches (2 \times X \times T(X) \times T'(X)). In the latter case, the Boolean type 2 indicates which branch (left or right) contains the hole, and the zipper contains a value for the root, a T(X) for the complete branch, and a T'(X) for the branch that's missing a value of type X. The complete type of a zipper for this binary tree structure, then, is X \times T'(X), or X \times (T^2(X) + 2 \times X \times T(X) \times T'(X)). In general, a zipper consists of two parts: a collection of contexts for the current node and each of its ancestors up until the root node, and the value that the current node contains.


Uses

The zipper is often used where there is some concept of
focus Focus (: foci or focuses) may refer to: Arts * Focus or Focus Festival, former name of the Adelaide Fringe arts festival in East Australia Film *Focus (2001 film), ''Focus'' (2001 film), a 2001 film based on the Arthur Miller novel *Focus (2015 ...
or a
cursor Cursor may refer to: * Cursor (code editor), an AI powered integrated development environment * Cursor (user interface), an indicator used to show the current position for user interaction on a computer monitor or other display device * Cursor (da ...
used to navigate around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner. The zipper has been used in *
Xmonad xmonad is a dynamic window manager ( tiling) for the X Window System, noted for being written in the functional programming language Haskell. Window manager Begun in March 2007, version 0.1 was announced in April 2007 as 500 lines of Haske ...
, to manage focus and placement of
windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
* Huet's papers cover a
structural editor A structure editor, also structured editor or projectional editor, is any document editor that is cognizant of the document's underlying structure. Structure editors can be used to edit hierarchical or marked up text, computer programs, diagrams, c ...
based on zippers and a theorem prover * A filesystem (ZipperFS) written in
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
offering "...transactional semantics; undo of any file and directory operation; snapshots; statically guaranteed the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility; and just the right behavior for cyclic directory references." *
Clojure Clojure (, like ''closure'') is a dynamic programming language, dynamic and functional programming, functional dialect (computing), dialect of the programming language Lisp (programming language), Lisp on the Java (software platform), Java platfo ...
has extensive support for zippers.


Alternatives and extensions


Direct modification

In a non-purely-functional programming language, it may be more convenient to simply traverse the original data structure and modify it in place (perhaps after deep cloning it, to avoid affecting other code that might hold a reference to it).


Generic zipper

The generic zipper is a technique to achieve the same goal as the conventional zipper by capturing the state of the traversal in a continuation while visiting each node. (The Haskell code given in the reference uses
generic programming Generic programming is a style of computer programming in which algorithms are written in terms of data types ''to-be-specified-later'' that are then ''instantiated'' when needed for specific types provided as parameters. This approach, pioneer ...
to generate a traversal function for any data structure, but this is optional – any suitable traversal function can be used.) However, the generic zipper involves
inversion of control In software engineering, inversion of control (IoC) is a design principle in which custom-written portions of a computer program receive the flow of control from an external source (e.g. a framework). The term "inversion" is historical: a softw ...
, so some uses of it require a
state machine A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
(or equivalent) to keep track of what to do next.


References


Further reading

* *


External links

{{Wikibooks, Haskell, Zippers
Zipper

Theseus and the Zipper

"Roll Your Own Window Manager: Tracking Focus with a Zipper"






Functional programming Functional data structures