The term round-trip is used in document conversion, particularly involving markup languages such as XML and SGML. A successful round-trip consists of converting a document in format A (docA) to one in format B (docB) and then back again to format A (docA′). If docA and docA′ are identical, there has been no information loss and the round-trip has been successful. More generally, the term means converting from any data representation to another and back again, including from one data structure to another.
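The round-trip test described above can be sketched in a few lines. This is a minimal illustration, assuming format A is a Python dictionary and format B is JSON text; the function name `round_trips` is hypothetical and chosen only for this example.

```python
import json


def round_trips(doc_a):
    """Return True if doc_a survives conversion to JSON text and back unchanged."""
    doc_b = json.dumps(doc_a)        # format A -> format B
    doc_a_prime = json.loads(doc_b)  # format B -> format A'
    return doc_a == doc_a_prime


print(round_trips({"title": "Report", "pages": [1, 2, 3]}))  # True
print(round_trips({"point": (1, 2)}))  # False: the tuple comes back as a list
print(round_trips({1: "one"}))         # False: the integer key comes back as "1"
```

The two failing cases show that a round-trip can fail even between closely related representations, because JSON has no tuple type and only allows string keys.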


Information loss

When a document in one format is converted to another, information is likely to be lost. For example, suppose an HTML document is saved as plain text (*.txt). All the markup (structure, formatting, superscripts, …) will then be lost. Compound documents will frequently lose information on images and other embedded objects. If the text file is converted back to the original format, information will necessarily be missing.

A similar effect happens with image formats. Some formats, such as JPEG, achieve compression by discarding a small amount of information. If a lossless file, such as a BMP or PNG file, is converted to JPEG and back again, the result will differ from the original (although it may be visually very similar).

Conversely, the fact that the initial and final documents are not bitwise identical does not by itself mean that information has been lost. Some formats have undefined fields, or fields whose contents have no impact on the result.
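The HTML-to-plain-text case can be illustrated with Python's standard `html.parser` module. This is a rough sketch, not a complete converter: the class `TextExtractor` and the function `html_to_text` are names made up for this example, and the converter simply keeps character data and drops every tag.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the character data of an HTML document, discarding all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


marked_up = "<p>E = mc<sup>2</sup> is <b>famous</b>.</p>"
plain = "<p>E = mc2 is famous.</p>"

print(html_to_text(marked_up))                         # E = mc2 is famous.
print(html_to_text(marked_up) == html_to_text(plain))  # True
```

Because two different HTML documents produce the same text, no reverse conversion can tell which one was the original: the superscript and the bold formatting cannot be recovered.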


Markup languages

Markup languages such as XML can, in principle, hold any information, so the process docA → docX → docA′ could be designed to avoid information loss. It is now common to convert legacy formats to XML formats because they have greater interoperability and a wider set of available tools. Thus it is possible to convert Word documents to an XML format and reimport them. The XML document should contain the same information as the legacy format.

An important condition is that the round-trip (legacy → XML → legacy′) should result in effectively identical documents. Because some document structures allow some flexibility in content order, whitespace, case-sensitivity, etc., it is useful to have a means of canonicalizing the legacy format. The full round-trip may then be:

legacy → canonicalLegacy → XML → legacy′ → canonicalLegacy′

If canonicalLegacy = canonicalLegacy′, the round-trip has been successful.
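The canonicalized round-trip can be sketched as follows. Everything about the legacy format here is an assumption made for illustration: it is a hypothetical `key = value` text format in which key case, surrounding whitespace, blank lines, and line order carry no meaning. The function names and the `document`/`field` element names are likewise invented for this example.

```python
import xml.etree.ElementTree as ET


def canonicalize(legacy):
    """Normalize the hypothetical format: lowercase keys, trim spaces, sort lines."""
    pairs = []
    for line in legacy.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        key, _, value = line.partition("=")
        pairs.append((key.strip().lower(), value.strip()))
    return "\n".join(f"{key}={value}" for key, value in sorted(pairs))


def legacy_to_xml(canonical_legacy):
    """Convert canonical key=value text to an XML string."""
    root = ET.Element("document")
    for line in canonical_legacy.splitlines():
        key, _, value = line.partition("=")
        field = ET.SubElement(root, "field", name=key)
        field.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_legacy(xml_text):
    """Convert the XML back; this writer happens to put spaces around '='."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{field.get('name')} = {field.text or ''}" for field in root.findall("field")
    )


legacy = "Title =  Annual Report\n\nauthor=J. Smith\n"
canonical_legacy = canonicalize(legacy)
legacy_prime = xml_to_legacy(legacy_to_xml(canonical_legacy))

print(legacy_prime == legacy)                          # False: not textually identical
print(canonicalize(legacy_prime) == canonical_legacy)  # True: round-trip succeeded
```

Comparing canonical forms rather than raw text is the design choice that matters here: the raw comparison fails on spacing and ordering alone, even though no information was lost.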


Character encodings

Unicode has a principle of round-trip compatibility with older standardized legacy encodings, so that converting a document to Unicode does not lose information and the document can be converted back. To achieve this, Unicode compatibility characters have been introduced.
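As an illustration, Shift JIS encodes both a half-width and a full-width form of the katakana letter KA. The sketch below, which uses Python's built-in `shift_jis` codec and the `unicodedata` module, suggests why Unicode needs a separate compatibility character for the half-width form; the choice of Shift JIS and of this particular letter is an assumption made for the example.

```python
import unicodedata

# Half-width KA (one byte) followed by full-width KA (two bytes) in Shift JIS.
legacy = b"\xb6\x83\x4a"

text = legacy.decode("shift_jis")
print(text == "\uff76\u30ab")              # True: two distinct Unicode characters
print(text.encode("shift_jis") == legacy)  # True: the round-trip is lossless

# Folding the compatibility character into its ordinary equivalent
# destroys the distinction, and the original bytes cannot be recovered.
folded = unicodedata.normalize("NFKC", text)
print(folded == "\u30ab\u30ab")              # True
print(folded.encode("shift_jis") == legacy)  # False
```

The half-width character U+FF76 is a compatibility character: without it, both byte sequences would decode to the same code point and the conversion back to Shift JIS would be ambiguous.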


Limitation

An application can claim to round-trip while being dishonest about it. For example, it may save the original data from docA as a field in docX, so that the reverse transformation to docA′ simply extracts that field. While this may be needed in some cases, the idea of a round-trip conversion is to go through another format representation or data structure and back again. A drawback of this strategy is that after even a small change to the intermediate document, it can no longer be converted back to the original format.
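The following sketch shows one way such a shortcut might look and why it breaks down. The converter functions and the JSON layout of docX are hypothetical, invented only to illustrate the problem.

```python
import json


def to_doc_x_cheating(doc_a):
    """Convert text to a JSON docX, but also stash the original verbatim."""
    return json.dumps({"words": doc_a.split(), "original": doc_a})


def to_doc_a_cheating(doc_x):
    """Reverse conversion that ignores the converted content entirely."""
    return json.loads(doc_x)["original"]


doc_a = "round  trip\ttest"
doc_x = to_doc_x_cheating(doc_a)
print(to_doc_a_cheating(doc_x) == doc_a)  # True, but only via the stashed copy

# Edit the document while it is in format X.
data = json.loads(doc_x)
data["words"][0] = "one-way"
edited_doc_x = json.dumps(data)

# The edit is silently dropped: the stale original comes back instead.
print(to_doc_a_cheating(edited_doc_x) == doc_a)  # True
```

The apparent round-trip success says nothing about whether the `words` representation preserves the document; here it does not even record the original spacing.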


Usage

The term appears to be in common use but is not recorded in dictionaries. A typical usage occurs i

but the term is likely to have been used before this.


See also

* Lossy data conversion
* Mojibake