Zipping (computer Science)
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, zipping is a
function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-orie ...
which
maps A map is a symbolic depiction of interrelationships, commonly spatial, between things within a space. A map may be annotated with text and graphics. Like any graphic, a map may be fixed to paper or other durable media, or may be displayed on ...
a
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
of
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
s into a
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
of
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
s. This name zip derives from the action of a
zipper A zipper (N. America), zip, zip fastener (UK), formerly known as a clasp locker, is a commonly used device for binding together two edges of textile, fabric or other flexible material. Used in clothing (e.g. jackets and jeans), luggage and oth ...
in that it interleaves two formerly disjoint sequences. The inverse function is ''unzip''.


Example

Given the three words ''cat'', ''fish'' and ''be'' where , ''cat'', is 3, , ''fish'', is 4 and , ''be'', is 2. Let \ell denote the length of the longest word which is ''fish''; \ell = 4. The zip of ''cat'', ''fish'', ''be'' is then 4 tuples of elements: : (c,f,b)(a,i,e)(t,s,\#)(\#,h,\#) where ''#'' is a symbol not in the original alphabet. In
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
this truncates to the shortest sequence \underline, where \underline = 2: zip3 "cat" "fish" "be" -- 'c','f','b'),('a','i','e')


Definition

Let Σ be an
alphabet An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
, # a symbol not in Σ. Let ''x''1''x''2... ''x'', ''x'', , ''y''1''y''2... ''y'', ''y'', , ''z''1''z''2... ''z'', ''z'', , ... be ''n''
words A word is a basic element of language that carries meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consensus among linguists on its ...
(i.e. finite
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
s) of elements of Σ. Let \ell denote the length of the longest word, i.e. the maximum of , ''x'', , , ''y'', , , ''z'', , ... . The zip of these words is a finite sequence of ''n''-tuples of elements of , i.e. an element of ((\Sigma\cup\)^n)^*: : (x_1,y_1,\ldots)(x_2,y_2,\ldots)\ldots(x_\ell,y_\ell,\ldots), where for any index , the ''wi'' is #. The zip of ''x, y, z, ...'' is denoted zip(''x, y, z, ...'') or ''x'' ⋆ ''y'' ⋆ ''z'' ⋆ ... The inverse to zip is sometimes denoted unzip. A variation of the zip operation is defined by: : (x_1,y_1,\ldots)(x_2,y_2,\ldots)\ldots(x_,y_,\ldots) where \underline is the ''minimum'' length of the input words. It avoids the use of an adjoined element \#, but destroys information about elements of the input sequences beyond \underline.


In programming languages

Zip functions are often available in
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s, often referred to as . In
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
-dialects one can simply the desired function over the desired lists, is
variadic In computer science, an operator or function is variadic if it can take a varying number of arguments; that is, if its arity is not fixed. For specific articles, see: * Variadic function * Variadic macro in the C preprocessor * Variadic template ...
in Lisp so it can take an arbitrary number of lists as argument. An example from
Clojure Clojure (, like ''closure'') is a dynamic programming language, dynamic and functional programming, functional dialect (computing), dialect of the programming language Lisp (programming language), Lisp on the Java (software platform), Java platfo ...
: ;; `nums' contains an infinite list of numbers (0 1 2 3 ...) (def nums (range)) (def tens 0 20 30 (def firstname "Alice") ;; To zip (0 1 2 3 ...) and 0 20 30into a vector, invoke `map vector' on them; same with list (map vector nums tens) ; ⇒ ( 10 20 30 (map list nums tens) ; ⇒ ((0 10) (1 20) (2 30)) (map str nums tens) ; ⇒ ("010" "120" "230") ;; `map' truncates to the shortest sequence; note missing \c and \e from "Alice" (map vector nums tens firstname) ; ⇒ ( 10 \A 20 \l 30 \i (map str nums tens firstname) ; ⇒ ("010A" "120l" "230i") ;; To unzip, apply `map vector' or `map list' (apply map list (map vector nums tens firstname)) ;; ⇒ ((0 1 2) (10 20 30) (\A \l \i)) In
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ''ANSI INCITS 226-1994 (S2018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperli ...
: (defparameter nums '(1 2 3)) (defparameter tens '(10 20 30)) (defparameter firstname "Alice") (mapcar #'list nums tens) ;; ⇒ ((1 10) (2 20) (3 30)) (mapcar #'list nums tens (coerce firstname 'list)) ;; ⇒ ((1 10 #\A) (2 20 #\l) (3 30 #\i)) — truncates on shortest list ;; Unzips (apply #'mapcar #'list (mapcar #'list nums tens (coerce firstname 'list))) ;; ⇒ ((1 2 3) (10 20 30) (#\A #\l #\i)) Languages such as
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (prog ...
provide a function.map(function, iterable, ...)
from section Built-in Functions from Python v2.7.2 documentation
in conjunction with the operator unzips a list: >>> nums =
, 2, 3 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
>>> tens = 0, 20, 30>>> firstname = 'Alice' >>> zipped = list(zip(nums, tens)) >>> zipped 1, 10), (2, 20), (3, 30) >>> list(zip(*zipped)) # unzip 1, 2, 3), (10, 20, 30) >>> zipped2 = list(zip(nums, tens, list(firstname))) >>> zipped2 # zip, truncates on shortest 1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i') >>> list(zip(*zipped2)) # unzip 1, 2, 3), (10, 20, 30), ('A', 'l', 'i')
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
has a method of zipping sequences but requires a specific function for each
arity In logic, mathematics, and computer science, arity () is the number of arguments or operands taken by a function, operation or relation. In mathematics, arity may also be called rank, but this word can have many other meanings. In logic and ...
( for two sequences, for three etc.),zip :: ">zip :: [a
-> [b-> [(a, b)">
-> [b">">zip :: [a
-> [b-> [(a, b)/nowiki>">">">zip :: [a
-> [b-> [(a, b)">
-> [b">">zip :: [a
-> [b-> [(a, b)/nowiki>from Prelude, Basic libraries
similarly the functions and are available for unzipping: -- nums contains an infinite list of numbers [1, 2, 3, ...] nums = [1..] tens = 0, 20, 30firstname = "Alice" zip nums tens -- ⇒ [(1,10), (2,20), (3,30)] — zip, truncates infinite list unzip $ zip nums tens -- ⇒ (
,2,3 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
0,20,30 — unzip zip3 nums tens firstname -- ⇒ 1,10,'A'), (2,20,'l'), (3,30,'i')— zip, truncates unzip3 $ zip3 nums tens firstname -- ⇒ (
,2,3 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
0,20,30 "Ali") — unzip


Language comparison

List of languages by support of zip:


See also

{{Portal, Computer programming *
Map (higher-order function) In many programming languages, map is a higher-order function that applies a given function to each element of a collection, e.g. a list or set, returning the results in a collection of the same type. It is often called ''apply-to-all'' wh ...


References

Articles with example Haskell code Articles with example Lisp (programming language) code Articles with example Clojure code Articles with example Python (programming language) code Data mapping