Xena (software)

Xena is open-source software for use in digital preservation. The name is short for XML Electronic Normalising for Archives. Xena is a Java application that was developed by the National Archives of Australia. It is available free of charge under the GNU General Public License. Version 6.1.0 was released on 31 July 2013. Source code and binaries for Linux, OS X and Windows are available from SourceForge. However, as of 2018, it is no longer maintained or supported.


Mode of operation

Xena attempts to avoid digital obsolescence by converting files into an openly specified format, such as ODF or PNG. If the file format is not supported, or the Binary Normalisation option is selected, Xena performs ASCII Base64 encoding on the binary file and wraps the output in XML metadata. The resulting .xena file is plain text, although the encoded data itself is not directly human-readable. The exact original file can be retrieved by stripping the metadata and reversing the Base64 encoding, using an internal viewer.
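The binary normalisation round trip described above can be illustrated with a minimal Python sketch. It is not Xena's own code (Xena is a Java application), and the XML element names used here (`binary-object`, `meta`, `original-name`, `size`, `content`) are hypothetical; the actual .xena schema is not described in this article.

```python
import base64
import xml.etree.ElementTree as ET


def binary_normalise(data: bytes, original_name: str) -> str:
    """Wrap raw bytes in a small XML envelope as Base64 text.

    The element names are made up for this sketch and are not
    the real Xena schema.
    """
    root = ET.Element("binary-object")
    meta = ET.SubElement(root, "meta")
    ET.SubElement(meta, "original-name").text = original_name
    ET.SubElement(meta, "size").text = str(len(data))
    content = ET.SubElement(root, "content", encoding="base64")
    # Base64 output uses only ASCII characters, so the file stays plain text.
    content.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def restore_original(xml_text: str) -> bytes:
    """Strip the XML metadata and reverse the Base64 encoding."""
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.findtext("content"))


if __name__ == "__main__":
    original = bytes(range(256))  # arbitrary binary data
    wrapped = binary_normalise(original, "sample.bin")
    # The round trip returns the exact original bytes.
    assert restore_original(wrapped) == original
```

The wrapped form is larger than the input, since Base64 represents every 3 bytes as 4 characters, but it can be stored and inspected as ordinary text.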


Features

Platforms supported by Xena are Microsoft Windows, Linux and Mac OS X. Xena uses a series of plugins to identify file formats and convert them to an appropriate openly specified format. Xena has an application programming interface which allows any reasonably skilled Java developer to develop a plugin to cover a new file type.

Xena can process individual files or whole directories. When processing a whole directory, it can preserve the original directory structure of the converted records. Xena can create plain text versions of file formats such as TIFF, Word and PDF, with the use of Tesseract.

The Xena interface or Xena Viewer can be used to view or export a Xena file (extension .xena) in its target file format. These files contain the normalised file as well as any extra information relevant to the normalisation process. The Xena Viewer supports bulk export of Xena files to target file formats. Xena can be used via its graphical user interface or the command line.

For Xena to be fully functional, it requires a local installation of the following external software:
* LibreOffice suite - to convert office documents to OpenDocument format
* Tesseract - to create plain text versions of file formats
* ImageMagick - to convert a subset of image files to PNG
* Readpst - to convert Microsoft Outlook PST files to XML. Readpst is part of the free and open-source libpst software suite
* FLAC - to convert audio files to FLAC format. This is also required to play back audio files using Xena.
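Before installing Xena, it can be useful to check which of the external programs listed above are already present. The following Python sketch looks for their usual executable names on the system PATH. The executable names are assumptions based on how these tools are commonly packaged, and how Xena itself locates the tools is not described here.

```python
import shutil

# Common executable names for each dependency (assumed, not taken from Xena).
# Note: on Windows, "convert" is also the name of an unrelated system utility.
REQUIRED_TOOLS = {
    "LibreOffice": ["soffice", "libreoffice"],
    "Tesseract": ["tesseract"],
    "ImageMagick": ["magick", "convert"],
    "Readpst": ["readpst"],
    "FLAC": ["flac"],
}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools with no candidate executable on PATH."""
    return [
        name
        for name, candidates in tools.items()
        if not any(shutil.which(exe) for exe in candidates)
    ]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"{name} not found on PATH")
```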


Supported file types

Xena will recognise and process the file types listed below, plus a few others of minor importance. Unsupported file types will automatically undergo binary normalisation.

Office file formats:
* Microsoft Office files (including MS Office XML, SYLK spreadsheets and Rich Text Format) are converted to the corresponding OpenDocument files
* Microsoft Outlook PST files are parsed for their individual messages, which are converted to XML files, and a Xena index file is created
* Microsoft Project MPP files are converted to XML
* OpenOffice.org XML files (SXC, SXI, SXW) are converted to the corresponding OpenDocument formats
* WordPerfect WPD files are converted to OpenDocument ODT
* OpenDocument documents (ODT, ODS, ODB, ODP) are preserved unchanged
* Acrobat PDF files are stored as binaries
* Mailbox files (MBX) are converted to individual XML files

Graphics:
* BMP, GIF, PSD, PCX, RAS, and the X Window System XBM and XPM bitmap files are converted to PNG; TIFF files additionally get embedded metadata stored in Xena XML. If the Tesseract OCR software is installed, text will be extracted from TIFF files.
* OpenDocument Drawings (ODG) and SVG files are wrapped in Xena XML
* JPG and PNG files are stored unchanged

Archive files:
* Files are extracted from archives (ZIP, GZIP, TAR/TAR.gz, JAR, WAR, Mac binary) and normalised into a separate Xena file. A Xena index file is created, which when opened in the internal Xena viewer will display the files in a table.

Audio files:
* MP3, WAV, AIFF, and OGG formats are converted to FLAC files.

Databases:
* SQL files are processed as plain text wrapped in XML

Other file types:
* HTML is converted to XHTML
* TXT text files are stored as plain text wrapped in XML; CSS files are stored as plain text wrapped in XML
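Part of the list above can be summarised as a lookup table with binary normalisation as the fallback. The Python sketch below is a simplification for illustration only. It covers a subset of the listed types and keys on the file extension, whereas Xena identifies formats through its plugins. The names `TARGETS`, `BY_EXTENSION` and `target_for` are hypothetical.

```python
from pathlib import Path

# Target handling for a subset of the file types listed above.
TARGETS = {
    "PNG": {"bmp", "gif", "psd", "pcx", "ras", "xbm", "xpm"},
    "FLAC": {"mp3", "wav", "aiff", "ogg"},
    "XHTML": {"html"},
    "plain text wrapped in XML": {"sql", "txt", "css"},
    "unchanged": {"odt", "ods", "odb", "odp", "jpg", "png"},
}

# Invert the table so each extension maps to its target handling.
BY_EXTENSION = {ext: target for target, exts in TARGETS.items() for ext in exts}


def target_for(filename: str) -> str:
    """Look up the handling for a file name, case-insensitively.

    Extensions missing from this partial table fall back to
    binary normalisation, as unsupported types do in Xena.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    return BY_EXTENSION.get(ext, "binary normalisation")


if __name__ == "__main__":
    for name in ["scan.GIF", "song.mp3", "notes.txt", "movie.avi"]:
        print(name, "->", target_for(name))
```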


Reviews

A review in Practical e-Records dated 22 April 2010 rated Xena at 82 out of 100 points. Xena has no target preservation format for video files.


External links


* Xena on SourceForge
* Xena wiki on SourceForge
* Xena project description at The Australian Service for Knowledge of Open Source Software
* National Archives of Australia - software