HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, a group of parallel arrays (also known as structure of arrays or SoA) is a form of
implicit data structure In computer science, an implicit data structure or space-efficient data structure is a data structure that stores very little information other than the main or required data: a data structure that requires low overhead. They are called "implicit" ...
that uses multiple
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
s to represent a singular array of
records A record, recording or records may refer to: An item or collection of data Computing * Record (computer science), a data structure ** Record, or row (database), a set of fields in a database related to one entity ** Boot sector or boot record, ...
. It keeps a separate, homogeneous data array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory (also known as array of structures or AoS). For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.


Examples

An example in C using parallel arrays: int ages[] = ; char *names[] = ; int parent[] = ; for (i = 1; i <= 4; i++) in Perl (using a hash of arrays to hold references to each array): my %data = ( first_name => Joe', 'Bob', 'Frank', 'Hans' last_name => Smith','Seger','Sinatra','Schultze' height_in_cm => 69, 158, 201, 199 ; for $i (0..$#) Or, in
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
: first_names = Joe", "Bob", "Frank", "Hans" last_names = Smith","Seger","Sinatra","Schultze"heights_in_cm = 69, 158, 201, 199 for i in range(len(first_names)): print("Name: %s %s" % (first_names last_names ) print("Height in cm: %s" % heights_in_cm # Using zip: for first_name, last_name, height_in_cm in zip(first_names, last_names, heights_in_cm): print(f"Name: ") print(f"Height in cm: ")


Pros and cons

Parallel arrays have a number of practical advantages over the normal approach: * They can save a substantial amount of space in some cases by avoiding alignment issues. For example, some architectures work best if 4-byte integers are always stored beginning at memory locations that are multiple of 4. If the previous field was a single byte, 3 bytes might be wasted. Many modern compilers can automatically avoid such problems, though in the past some programmers would explicitly declare fields in order of decreasing alignment restrictions. * If the number of items is small, array indices can occupy significantly less space than full pointers, particularly on some architectures. * Sequentially examining a single field of each record in the array is very fast on modern machines, since this amounts to a linear traversal of a single array, exhibiting ideal
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
and cache behaviour. * They may allow efficient processing with
SIMD instruction In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s in certain
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s Several of these advantage depend strongly on the particular programming language and implementation in use. However, parallel arrays also have several strong disadvantages, which serves to explain why they are not generally preferred: * They have significantly worse
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
when visiting the records non-sequentially and examining multiple fields of each record, because the various arrays may be stored arbitrarily far apart. * They obscure the relationship between fields of a single record (e.g. no type information relates the index between them, an index may be used erroneously). * They have little direct language support (the language and its syntax typically express no relationship between the arrays in the parallel array, and cannot catch errors). * Since the bundle of fields is not a "thing", passing it around is tedious and error-prone. For example, rather than calling a function to do something to one record (or structure or object), the function must take the fields as separate arguments. When a new field is added or changed, many parameter lists must change, where passing objects as whole would avoid such changes entirely. * They are expensive to grow or shrink, since each of several arrays must be reallocated. Multi-level arrays can ameliorate this problem, but impacts performance due to the additional indirection needed to find the desired elements. * Perhaps worst of all, they greatly raise the possibility of errors. Any insertion, deletion, or move must always be applied consistently to all of the arrays, or the arrays will no longer be synchronized with each other, leading to bizarre outcomes. The bad locality of reference can be alleviated in some cases: if a structure can be divided into groups of fields that are generally accessed together, an array can be constructed for each group, and its elements are records containing only these subsets of the larger structure's fields. (see data oriented design). This is a valuable way of speeding up access to very large structures with many members, while keeping the portions of the structure tied together. An alternative to tying them together using array indexes is to use
reference Reference is a relationship between objects in which one object designates, or acts as a means by which to connect to or link to, another object. The first object in this relation is said to ''refer to'' the second object. It is called a ''name'' ...
s to tie the portions together, but this can be less efficient in time and space. Another alternative is to use a single array, where each entry is a record structure. Many language provide a way to declare actual records, and arrays of them. In other languages it may be feasible to simulate this by declaring an array of n*m size, where m is the size of all the fields together, packing the fields into what is effectively a record, even though the particular language lacks direct support for records. Some
compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
s, particularly for
vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
s, are able to perform this transformation automatically when arrays of structures are created in the program.


See also

* An example in the linked list article *
Column-oriented DBMS A column-oriented DBMS or columnar DBMS is a database management system (DBMS) that stores data tables by column rather than by row. Benefits include more efficient access to data when only querying a subset of columns (by eliminating the need to r ...


References

*
Thomas H. Cormen Thomas H. Cormen is the co-author of ''Introduction to Algorithms'', along with Charles Leiserson, Ron Rivest, and Cliff Stein. In 2013, he published a new book titled '' Algorithms Unlocked''. He is a professor of computer science at Dartmout ...
,
Charles E. Leiserson Charles Eric Leiserson is a computer scientist, specializing in the theory of parallel computing and distributed computing, and particularly practical applications thereof. As part of this effort, he developed the Cilk multithreaded language. ...
,
Ronald L. Rivest Ronald Linn Rivest (; born May 6, 1947) is a cryptographer and an Institute Professor at MIT. He is a member of MIT's Department of Electrical Engineering and Computer Science (EECS) and a member of MIT's Computer Science and Artificial Inte ...
, and
Clifford Stein Clifford Seth Stein (born December 14, 1965), a computer scientist, is a professor of industrial engineering and operations research at Columbia University in New York, NY, where he also holds an appointment in the Department of Computer Scie ...
. ''
Introduction to Algorithms ''Introduction to Algorithms'' is a book on computer programming by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The book has been widely used as the textbook for algorithms courses at many universities and is co ...
'', Second Edition. MIT Press and McGraw-Hill, 2001. . Page 209 of section 10.3: Implementing pointers and objects. * {{cite web, url=http://codeblog.jonskeet.uk/2014/06/03/anti-pattern-parallel-collections/, title=Anti-pattern: parallel collections, last=Skeet, first=Jon, date=3 June 2014, accessdate=28 October 2014 Arrays Articles with example C code Articles with example Perl code Articles with example Python (programming language) code