In computing, serialization (US spelling) or serialisation (UK spelling) is the process of translating a data structure or object state into a format that can be stored (for example, in a file or memory data buffer) or transmitted (for example, over a computer network) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked. This process of serializing an object is also called marshalling an object in some situations. The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called unserialization or unmarshalling).
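
As a minimal sketch of the round trip described above, the following Python snippet serializes the state of a small object to JSON text and reconstructs a clone from it. The `Point` class and its `norm1` method are hypothetical names chosen for illustration. Only the field values travel in the serialized form; the method is available on the clone solely because the receiving code defines the same class.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Point:
    """Hypothetical example class: two data members and one method."""
    x: int
    y: int

    def norm1(self) -> int:
        # Behaviour lives in the class definition, not in the serialized data.
        return abs(self.x) + abs(self.y)


original = Point(3, -4)

# Serialization: object state -> text that can be stored or transmitted.
encoded = json.dumps(asdict(original))

# Deserialization: text -> a new object with the same state.
clone = Point(**json.loads(encoded))

print(encoded)
print(clone == original, clone is original)
```

The clone compares equal to the original but is a distinct object, which is what "semantically identical clone" means in practice.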


Uses

Methods of:
* transferring data through the wires (messaging).
* storing data (in databases, on hard disk drives).
* remote procedure calls, e.g., as in SOAP.
* distributing objects, especially in component-based software engineering such as COM, CORBA, etc.
* detecting changes in time-varying data.

For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format means preventing the problems of byte ordering, memory layout, or simply different ways of representing data structures in different programming languages.

Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications, this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization.

Even on a single machine, primitive pointer objects are too fragile to save because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling, where direct pointer references are converted to references based on name or position. The deserialization process includes an inverse step called pointer swizzling.

Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus, 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy because differences can be detected on the fly, a technique called differential execution. This is useful in the programming of user interfaces whose contents are time-varying: graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.
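
The architecture-independence and pointer unswizzling points in this section can be sketched together in a few lines of Python. The example below is an illustration under stated assumptions, not a standard format: the `Node` class, the 8-byte record layout, and the use of -1 to mean "no reference" are all hypothetical choices. Each record is written with an explicit big-endian byte order, so the bytes do not depend on the host's endianness, and each in-memory reference is replaced by the position of its target in the output.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical node type holding a value and one reference."""
    value: int
    next: Optional["Node"] = None


# ">ii" = big-endian, two 32-bit signed ints: value, index of next node.
RECORD = struct.Struct(">ii")
NO_REFERENCE = -1  # assumed sentinel for "next is None"


def serialize(nodes: list[Node]) -> bytes:
    # Unswizzling: map each object to its position in the list, then
    # store that position instead of the in-memory reference.
    index_of = {id(node): i for i, node in enumerate(nodes)}
    out = bytearray()
    for node in nodes:
        next_index = NO_REFERENCE if node.next is None else index_of[id(node.next)]
        out += RECORD.pack(node.value, next_index)
    return bytes(out)


def deserialize(data: bytes) -> list[Node]:
    records = list(RECORD.iter_unpack(data))
    # First pass: recreate every object without its references.
    nodes = [Node(value) for value, _ in records]
    # Second pass (swizzling): turn stored positions back into references.
    for node, (_, next_index) in zip(nodes, records):
        if next_index != NO_REFERENCE:
            node.next = nodes[next_index]
    return nodes


# Three nodes forming a cycle: a -> b -> c -> a.
a, b, c = Node(1), Node(2), Node(3)
a.next, b.next, c.next = b, c, a

data = serialize([a, b, c])
restored = deserialize(data)
```

Because references are stored as positions, the cycle is preserved without infinite recursion, and the restored nodes point at each other rather than at the original objects.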


Drawbacks

Serialization breaks the opacity of an abstract data type by potentially exposing private implementation details. Trivial implementations which serialize all data members may violate encapsulation.

To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs' serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data. Yet, interoperability requires that applications be able to understand each other's serialization formats. Therefore, remote method call architectures such as CORBA define their serialization formats in detail.

Many institutions, such as archives and libraries, attempt to future-proof their backup archives, in particular database dumps, by storing them in some relatively human-readable serialized format.
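
The encapsulation concern raised at the start of this section can be illustrated with a short Python sketch. The `Money` class, its private `_cents` field, and the `to_public_dict` helper are hypothetical names invented for this example. The naive version dumps every data member, which ties the stored format to the private representation; the more careful version writes only what the public interface already exposes.

```python
import json


class Money:
    """Hypothetical abstract data type: callers see currency and amount()."""

    def __init__(self, currency: str, cents: int):
        self.currency = currency
        self._cents = cents  # private representation detail

    def amount(self) -> str:
        # Public view of the value as a decimal string.
        return f"{self._cents // 100}.{self._cents % 100:02d}"

    def to_public_dict(self) -> dict:
        # Serialize only what the public interface already exposes.
        return {"currency": self.currency, "amount": self.amount()}


price = Money("USD", 1250)

# Naive approach: dump every data member, including the private one.
naive = json.dumps(vars(price))

# More careful approach: dump the public view only.
careful = json.dumps(price.to_public_dict())

print(naive)
print(careful)
```

With the naive form, any later change to how `Money` stores its value would break previously stored data, which is one practical consequence of the loss of opacity.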


Serialization formats

The Xerox Network Systems Courier technology in the early 1980s influenced the first widely adopted standard. Sun Microsystems published the External Data Representation (XDR) in 1987. XDR is an open format, and standardized as STD 67 (RFC 4506).

In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML, an SGML subset, was used to produce a human-readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications. XML is an open format, and standardized as a W3C recommendation.

JSON is a lighter plain-text alternative to XML which is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is independent of JavaScript and supported in other programming languages as well. JSON is an open format, standardized as STD 90 and ECMA-404.

YAML is a strict JSON superset and includes additional features such as a notion of tagging data types, support for non-hierarchical data structures, the option to structure data with indentation, and multiple forms of scalar data quoting. YAML is an open format.
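
To make the comparison between XML and JSON above more concrete, the following Python sketch encodes the same small record both ways using only the standard library. The record contents and the XML element names (`person`, `name`, `year`) are hypothetical, and XML allows many other mappings for the same data, so this is one possible encoding rather than a canonical one.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used for both encodings.
record = {"name": "Ada", "year": 1815}

# JSON: keys and values map directly onto the text form.
as_json = json.dumps(record)

# XML: one child element per field; every value becomes element text.
root = ET.Element("person")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)

# Reading back: JSON keeps the number as a number, while this XML
# mapping yields a string that the reader must convert itself.
year_from_json = json.loads(as_json)["year"]
year_from_xml = ET.fromstring(as_xml).find("year").text
```

For this record the JSON text is shorter than the XML text, and the round trip through JSON preserves the distinction between numbers and strings without a separate schema.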

Property lists are used for serialization by NeXTSTEP, GNUstep, and macOS.