In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

In the ASCII character set, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, often documented by convention as ^Z). Unicode inherits this character from ASCII, but recommends that the replacement character (�, U+FFFD) be used instead to represent un-decodable inputs, when the output encoding is compatible with it.
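The roles described above can be illustrated with a minimal Python sketch. It assumes the common terminal convention that Ctrl plus a letter sends the letter's code with only its low five bits kept. The handler name "sub" and all function names are hypothetical, chosen for this example only.

```python
import codecs

SUB = "\x1a"  # ASCII substitute character (decimal 26, hex 1A)


def control_code(letter):
    """Code conventionally sent for Ctrl+<letter>: keep the low five bits."""
    return ord(letter.upper()) & 0x1F


def _substitute(exc):
    """Error handler: emit one SUB for each character the target encoding lacks."""
    if isinstance(exc, UnicodeEncodeError):
        return SUB * (exc.end - exc.start), exc.end
    raise exc


# "sub" is a hypothetical handler name registered only for this sketch.
codecs.register_error("sub", _substitute)


def encode_ascii_with_sub(text):
    """Encode to ASCII, standing in SUB for unrepresentable characters."""
    return text.encode("ascii", errors="sub")


def decode_utf8_lossy(raw):
    """Decode UTF-8, marking undecodable bytes with U+FFFD instead of SUB."""
    return raw.decode("utf-8", errors="replace")


print(control_code("Z"))
print(encode_ascii_with_sub("na\u00efve"))
print(decode_utf8_lossy(b"caf\xff"))
```

The encoder falls back to SUB because ASCII cannot represent U+FFFD, whereas the decoder produces Unicode text and can therefore use the replacement character.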


Uses


End of file

Historically, under the PDP-6 monitor, RT-11, VMS, and TOPS-10, and in the early PC operating systems CP/M 1 and 2 (and derivatives like MP/M), it was necessary to explicitly mark the end of a file (EOF) because the native filesystem could not record the exact file size by itself; files were allocated in extents (records) of a fixed size, typically leaving some allocated but unused space at the end of each file. This extra space was filled with 1A (hex) characters under CP/M. The extended CP/M filesystems used by CP/M 3 and higher (and derivatives like Concurrent CP/M, Concurrent DOS, and DOS Plus) did support byte-granular files, so this was no longer a requirement, but it remained as a convention (especially for text files) in order to ensure backward compatibility.

In CP/M, 86-DOS, MS-DOS, PC DOS, DR-DOS, and their various derivatives, the SUB character was also used to indicate the end of a character stream, and thereby to terminate user input in an interactive command line window (and as such, it was often used to finish console input redirection, e.g. as instigated by the command COPY CON: TYPEDTXT.TXT).

While no longer technically required to indicate the end of a file, as of 2017 many text editors and programming languages still support this convention, or can be configured to insert this character at the end of a file when editing, or at least properly cope with it in text files. In such cases, it is often termed a "soft" EOF, as it does not necessarily represent the physical end of the file, but is more a marker indicating that "there is no useful data beyond this point". In reality, more data may exist beyond this character up to the actual end of the data in the file system, thus it can be used to hide file content when the file is entered at the console or opened in editors. Many file format standards (e.g. PNG or GIF) include the SUB character in their headers to perform precisely this function. Some modern text file formats (e.g. CSV-1203) still recommend a trailing EOF character to be appended as the last character in the file. However, typing Ctrl+Z does not embed an EOF character into a file in either DOS or Windows, nor do the APIs of those systems use the character to denote the actual end of a file.

Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT etc.), and alternate methods must be adopted, e.g. opening the file in binary mode or using the File System Object to progress beyond it.

Character 26 was used to mark "End of file" even though ASCII calls this character Substitute, and has other characters to indicate "End of file". Number 28, which is called "File Separator", has also been used for similar purposes.
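The padding and "soft" EOF conventions described in this section can be sketched in a few lines of Python. This is an illustration only, not a reconstruction of any CP/M or DOS routine: the 128-byte record size is taken from the CP/M user guide quotation in the references, the function names are hypothetical, and the PNG signature bytes are included as one example of a header that embeds SUB.

```python
RECORD_SIZE = 128   # CP/M record size in bytes (assumption taken from the quoted user guide)
SUB = b"\x1a"       # substitute character, used here as the end-of-file marker

# The eight-byte PNG file signature; its seventh byte is SUB.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def pad_to_record(data):
    """Pad text data with SUB up to the next record boundary."""
    remainder = len(data) % RECORD_SIZE
    if remainder == 0:
        return data  # exact multiple of the record size: no marker is added
    return data + SUB * (RECORD_SIZE - remainder)


def strip_soft_eof(data):
    """Return only the bytes before the first SUB, treating it as a soft EOF."""
    head, _sep, _tail = data.partition(SUB)
    return head


padded = pad_to_record(b"HELLO\r\n")
print(len(padded))
print(strip_soft_eof(padded))
print(strip_soft_eof(b"visible\x1ahidden"))
```

The last call shows why the marker is only "soft": the bytes after SUB are still stored in the file, and a reader that opens the file in binary mode sees them.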


Other uses

In Unix-like operating systems, this character is typically used in shells as a way for the user to suspend the currently executing interactive process. The suspended process can then be resumed in foreground (interactive) mode, or be made to resume execution in background mode, or be terminated. When a user enters the character at their computer terminal, the currently running foreground process is sent a "terminal stop" (SIGTSTP) signal, which generally causes the process to suspend its execution. The user can later continue the process execution by using the "foreground" command (fg) or the "background" command (bg).

The Unicode Security Considerations report recommends this character as a safe replacement for unmappable characters during character set conversion.

In many GUIs and applications, Ctrl+Z (⌘+Z on macOS) can be used to undo the last action. In many applications, earlier actions than the last one can also be undone by pressing Ctrl+Z multiple times. Ctrl+Z was one of a handful of keyboard sequences chosen by the program designers at Xerox PARC to control text editing.
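The "terminal stop" signal mentioned above can be observed from inside a program. The following POSIX-only Python sketch installs a temporary handler and raises SIGTSTP itself rather than waiting for a keypress. The function name is hypothetical, `signal.raise_signal` requires Python 3.8 or later, and the code is assumed to run in the main thread.

```python
import signal


def catch_terminal_stop():
    """Raise SIGTSTP against this process and report which signal the handler saw."""
    caught = []

    def handler(signum, frame):
        # With a handler installed, the process is notified instead of suspended.
        caught.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGTSTP, handler)
    try:
        signal.raise_signal(signal.SIGTSTP)
    finally:
        # Restore the earlier disposition so Ctrl+Z suspends the process again.
        signal.signal(signal.SIGTSTP, previous)
    return caught


print(catch_terminal_stop())
```

Without such a handler, the default action for SIGTSTP is to stop the process, which is what allows the shell's fg and bg commands to resume it later.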


Representation

ASCII and Unicode representation of "substitute":

* Octal code: 32
* Decimal code: 26
* Hexadecimal code: 1A, U+001A
* Mnemonic symbol: SUB
* Binary value: 11010
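The values listed above can be cross-checked with Python's standard formatting functions. In this small sketch the function name is hypothetical, and the "picture" entry assumes the Unicode Control Pictures block, which places a visible symbol for each C0 control code at U+2400 plus that code.

```python
import unicodedata

SUB_CODE = 26  # decimal code of the substitute character


def representations(code):
    """Return the common textual representations of a C0 control code (0-31)."""
    return {
        "octal": format(code, "o"),
        "decimal": str(code),
        "hex": format(code, "X"),
        "binary": format(code, "b"),
        "code_point": f"U+{code:04X}",
        # Control Pictures block: visible stand-in for the control code.
        "picture": chr(0x2400 + code),
    }


reps = representations(SUB_CODE)
for name, value in reps.items():
    print(f"{name}: {value}")
print(unicodedata.name(reps["picture"]))
```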


See also

* C0 and C1 control codes (ISO 646)
* U+FFFD (Unicode replacement character �)
* Access key
* Control-C
* Control-G
* Control-V
* Control-X
* Control-\
* Keyboard shortcut
* List of file signatures
* A symbol (sometimes called by the slang term "tofu") used to represent a missing character
** Noto fonts, a Google project to eliminate missing characters


References

* CP/M 2.0 Interface Guide (1 ed.). Pacific Grove, California, USA: Digital Research. 1979. Chapter 2, "Operating System Call Conventions", p. 5. http://bitsavers.org/pdf/digitalResearch/cpm/2.0/CPM_2_0_Interface_Guide_1979.pdf (archived 2020-02-28 at https://web.archive.org/web/20200228175812/http://bitsavers.org/pdf/digitalResearch/cpm/2.0/CPM_2_0_Interface_Guide_1979.pdf; accessed 2020-02-28). Quote: "[...] The end of an ASCII file is denoted by a control-Z character (1AH) or a real end of file, returned by the CP/M read operation. Control-Z characters embedded within machine code files (e.g., COM files) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. [...]" (56 pages)
* Hogan, Thom (1982). Osborne CP/M User Guide - For All CP/M Users (2 ed.). Berkeley, California, USA: A. Osborne/McGraw-Hill. Chapter 3, "CP/M Transient Commands", p. 74. ISBN 0-931988-82-9. https://archive.org/details/osborne-cpm-users-guide_2nd-ed (accessed 2020-02-28; PDF at https://archive.org/download/osborne-cpm-users-guide_2nd-ed/OsborneCpmUsersGuideSecondEdition.pdf). Quote: "[...] CP/M marks the end of an ASCII file by placing a CONTROL-Z character in the file after the last data character. If the file contains an exact multiple of 128 characters, in which case adding the CONTROL-Z would waste 127 characters, CP/M does not do so. Use of the CONTROL-Z character as the end-of-file marker is possible because CONTROL-Z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. [...]"
* PDP-6 Multiprogramming System Manual. Maynard, Massachusetts, USA: Digital Equipment Corporation (DEC). 1965. "Table of IO Device Characteristics - Console or Teletypewriters", p. 43. DEC-6-0-EX-SYS-UM-IP-PRE00. http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf (archived 2014-07-14 at https://web.archive.org/web/20140714140253/http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf; accessed 2014-07-10). (1+84+10 pages)
* PDP-10 Reference Handbook: Communicating with the Monitor - Time-Sharing Monitors. Vol. 3. Digital Equipment Corporation (DEC). 1969. Section 5.1.1.1, "Device Dependent Functions - Data Modes - Full-Duplex Software A(ASCII) and AL(ASCII Line)", pp. 5-3 – 5-6 [5-5 (431)]. http://bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf (archived 2011-11-15 at https://web.archive.org/web/20111115083418/http://www.bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf; accessed 2014-07-10). (207 pages)
* "Keyboard shortcuts for Windows". Microsoft Support. Microsoft. http://support.microsoft.com/kb/126449 (accessed 2012-06-02).
* Elliott, John C. (1998). "CP/M 1.4 disc formats". http://www.seasip.info/Cpm/format14.html (archived 2020-11-14 at https://web.archive.org/web/20201114231913/http://www.seasip.info/Cpm/format14.html; accessed 2021-11-18).
* Elliott, John C. (1998). "CP/M 2.2 disc formats". http://www.seasip.info/Cpm/format22.html (archived 2020-11-05 at https://web.archive.org/web/20201105204828/http://www.seasip.info/Cpm/format22.html; accessed 2021-11-18).
* Elliott, John C. (1998). "CP/M 3.1 disc formats". http://www.seasip.info/Cpm/format31.html (archived 2021-10-26 at https://web.archive.org/web/20211026154048/https://www.seasip.info/Cpm/format31.html; accessed 2021-11-18).
* Elliott, John C. (1998). "CP/M 4.1 disc formats". http://www.seasip.info/Cpm/format41.html (archived 2020-11-05 at https://web.archive.org/web/20201105174304/http://www.seasip.info/Cpm/format41.html; accessed 2021-11-18).
* "Quick Reference: Unix Commands". IT Connect. University of Washington. http://www.washington.edu/computing/unix/unixqr.html (accessed 2012-06-02).
* CSV-1203 format specification. Archived 2016-05-16 at http://arquivo.pt/wayback/20160516100434/http://www.mastpoint.com/csv-1203
* Unicode Security Considerations report.


Further reading

* Federal Standard 1037C