In the context of IBM mainframe computers in the S/360 line, a data set (IBM preferred) or dataset is a computer file having a record organization. Use of this term began with systems such as DOS/360 and OS/360, and it is still used by their successors, including the current z/OS. Documentation for these systems historically preferred this term to ''file''.
A data set is typically stored on a direct access storage device (DASD) or magnetic tape; however, unit record devices, such as punch card readers, card punches, line printers and page printers, can also provide input/output (I/O) for a data set (file).
Data sets are not unstructured streams of bytes, but rather are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements. Within a running program they are stored in the Data Control Block (DCB) or Access Control Block (ACB), which are data structures used to access data sets using access methods.
Records in a data set may be of fixed, variable, or “undefined” length.
Data set organization
For OS/360, the DCB's DSORG parameter specifies how the data set is organized. It may be
;CQ
:Queued Telecommunications Access Method (QTAM) in Message Control Program (MCP)
;CX
:Communications line group
;DA
:Basic Direct Access Method (BDAM)
;GS
:Graphics device for Graphics Access Method (GAM)
;IS
:Indexed Sequential Access Method (ISAM)
;MQ
:QTAM message queue in application
;PO
:Partitioned
;PS
:Physical Sequential
among others.
Data sets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated.
Programmers use various access methods (such as QSAM or VSAM) in programs for reading and writing data sets. The access method that can be used depends on the data set organization.
Record format (RECFM)
Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB RECFM parameter.
RECFM=F means that the records are of fixed length, specified via the LRECL parameter. RECFM=V specifies variable-length records. When stored on media, V records are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes and flag bits. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block on tape or DASD. FB and VB stand for fixed-blocked and variable-blocked, respectively. RECFM=U (undefined) is also variable length, but the length of the record is determined by the length of the block rather than by a control field.
The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS can also be specified; it means fixed-blocked standard, in which every block except the last must be of full BLKSIZE length. RECFM=VBS, or variable-blocked spanned, means that a logical record can be spanned across two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one.
This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.
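The length-prefix idea described above can be illustrated with a minimal Python sketch. It is not IBM code, and the byte layout is an assumption made for illustration: each descriptor is taken to be four bytes, a 2-byte big-endian length that counts the descriptor itself followed by two bytes of flags, and the block is assumed to start with a similar block descriptor. The helper names `build_vb_block` and `split_vb_block` are hypothetical. Spanned segments (nonzero flags) are deliberately not handled.

```python
import struct

# Assumed descriptor layout: 2-byte big-endian length, then 2 bytes of flags.
DESCRIPTOR = struct.Struct(">HH")


def build_vb_block(records):
    """Pack byte strings into one variable-blocked block (hypothetical helper)."""
    body = b""
    for record in records:
        # The RDW length counts the 4-byte RDW itself; flags are zero (not spanned).
        body += DESCRIPTOR.pack(len(record) + DESCRIPTOR.size, 0) + record
    # Assumed block descriptor: total block length, including these 4 bytes.
    return DESCRIPTOR.pack(len(body) + DESCRIPTOR.size, 0) + body


def split_vb_block(block):
    """Return the logical records in one block, using only the length fields."""
    block_length, _ = DESCRIPTOR.unpack_from(block, 0)
    records = []
    offset = DESCRIPTOR.size
    while offset < block_length:
        record_length, flags = DESCRIPTOR.unpack_from(block, offset)
        if flags != 0:
            raise ValueError("spanned record segments are not handled in this sketch")
        if record_length < DESCRIPTOR.size or offset + record_length > block_length:
            raise ValueError("record length is inconsistent with the block length")
        records.append(block[offset + DESCRIPTOR.size : offset + record_length])
        offset += record_length
    return records


# Records may hold any byte values, including newlines and zero bytes.
sample = [b"HELLO", b"\x00\x0a\x00\xff", b""]
block = build_vb_block(sample)
recovered = split_vb_block(block)
```

Because record boundaries come from the length fields alone, the second sample record can contain a newline byte and zero bytes without being mistaken for an end of record.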
Partitioned data set
A partitioned data set (PDS) is a data set containing multiple ''members'', each of which holds a separate sub-data set, similar to a directory in other types of file systems. This type of data set is often used to hold ''load modules'' (old format bound executable programs), source program libraries (especially Assembler macro definitions), ISPF screen definitions, and Job Control Language. A PDS may be compared to a Zip file or COM Structured Storage.
A Partitioned Data Set can only be allocated on a single volume and has a maximum size of 65,535 tracks.
Besides members, a PDS also contains a directory. Each member can be accessed indirectly via the directory structure. Once a member is located, the data stored in that member are handled in the same manner as a PS (sequential) data set.
Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform file compression.
Compression, which is done using the IEBCOPY utility, moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.) A PDS can reside only on DASD, not on magnetic tape, because the directory structure is used to access individual members. Partitioned data sets are most often used for storing multiple job control language files, utility control statements, and executable modules.
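The dead-space behaviour described above can be modelled with a toy Python class. This is a sketch under stated assumptions, not a description of the on-disk PDS format or of IEBCOPY: the class `ToyPDS`, its method names, and the use of abstract space units are all hypothetical. The directory maps member names to a start position and a length, and new or rewritten members always go at the back.

```python
class ToyPDS:
    """Toy model of member space in a PDS (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total space, in arbitrary units
        self.next_free = 0        # members are always written at the back
        self.directory = {}       # member name -> (start, length)

    def write(self, name, length):
        """Store a new member, or a new copy of an existing member, at the back."""
        if self.next_free + length > self.capacity:
            raise RuntimeError("no room at the back; compression is needed")
        self.directory[name] = (self.next_free, length)
        self.next_free += length

    def delete(self, name):
        """Remove the directory entry; the member's space is not reclaimed."""
        del self.directory[name]

    def dead_space(self):
        """Space already written but no longer referenced by the directory."""
        live = sum(length for _, length in self.directory.values())
        return self.next_free - live

    def compress(self):
        """Move all live members to the front, leaving free space at the back."""
        position = 0
        by_start = sorted(self.directory.items(), key=lambda item: item[1][0])
        for name, (_, length) in by_start:
            self.directory[name] = (position, length)
            position += length
        self.next_free = position


pds = ToyPDS(capacity=100)
pds.write("A", 30)
pds.write("B", 30)
pds.write("A", 30)   # rewriting A leaves its old copy as dead space
pds.delete("B")      # deleting B leaves more dead space
dead_before = pds.dead_space()
pds.compress()
dead_after = pds.dead_space()
```

In this model a further `write` of 20 units fails before `compress()` even though only 30 units are live, which is the situation that compression resolves.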
An improvement of this scheme is a Partitioned Data Set Extended (PDSE or PDS/E, sometimes just ''libraries''), introduced with DFSMSdfp for MVS/XA and MVS/ESA systems. A PDS/E library can store program objects or other types of members, but not both. BPAM cannot process a PDS/E containing program objects.
The PDS/E structure is similar to that of a PDS and is used to store the same types of data. However, PDS/E files have a better directory structure, which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore cannot run out of directory blocks because too few were specified). Also, a PDS/E automatically stores members in such a way that a compression operation is not needed to reclaim "dead" space.
PDS/E files can only reside on DASD in order to use the directory structure to access individual members.
Generation Data Group
A Generation Data Group (''GDG'') is a group of non-VSAM data sets that are successive generations of historically-related data stored on an IBM mainframe (running OS or DOS/VSE).
A GDG is usually cataloged. An individual member of the GDG collection is called a "''Generation Data Set''." The latter may be identified by an absolute number or by a relative number: (-1) for the previous generation, (0) for the current one, and (+1) for the next generation.
GDG JCL & features
Generation Data Groups are defined using either the BLDG statement of the IEHPROGM utility or the DEFINE GENERATIONGROUP statement of the newer IDCAMS utility, which allows setting various parameters.
* would limit the number of generations to 10.
* would retain each member, up to the generation limit, for at least 91 days.
IDCAMS can also delete (and optionally uncatalog) a GDG.
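Relative generation numbers and a generation limit can be illustrated with a small Python model. This is only a sketch: the class `ToyGDG` and its methods are hypothetical, generations are represented by plain integers rather than real data set names, and it assumes that the oldest generation is dropped once the limit is exceeded, which is one possible policy rather than a statement of how every GDG is configured.

```python
class ToyGDG:
    """Toy model of a generation data group (hypothetical, for illustration only)."""

    def __init__(self, limit):
        self.limit = limit        # maximum number of generations kept
        self.generations = []     # absolute generation numbers, oldest first
        self.last_number = 0

    def resolve(self, relative):
        """Map (0), (-1), (-2), ... to an existing absolute generation number."""
        if relative > 0:
            raise ValueError("positive relative numbers name generations not yet created")
        index = len(self.generations) - 1 + relative
        if index < 0:
            raise LookupError("no such generation")
        return self.generations[index]

    def add_generation(self):
        """Create the (+1) generation; return its number and any rolled-off ones."""
        self.last_number += 1
        self.generations.append(self.last_number)
        rolled_off = []
        # Assumed policy: drop the oldest generations once the limit is exceeded.
        while len(self.generations) > self.limit:
            rolled_off.append(self.generations.pop(0))
        return self.last_number, rolled_off


gdg = ToyGDG(limit=3)
history = [gdg.add_generation() for _ in range(4)]
current = gdg.resolve(0)
previous = gdg.resolve(-1)
```

After a new generation is added, what was (0) becomes (-1), so a relative reference always resolves against the state of the group at the time it is looked up.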
References
Introduction to the New Mainframe: z/OS Basics, Ch. 5, "Working with data sets", March 29, 2011.