ZX Spectrum character set

The ZX Spectrum character set is the variant of ASCII used in the ZX Spectrum family of computers. It is based on ASCII-1967, but the characters ^, ` and DEL are replaced with ↑, £ and ©. It also differs from ASCII in its use of the C0 control codes other than the common BS and CR, and it makes use of the 128 high-bit characters beyond the ASCII range (ZX Spectrum manual, Appendix A, the character set). The ZX Spectrum's main set of printable characters and system font are also used by the Jupiter Ace computer.
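As a rough illustration of these substitutions, the Python sketch below decodes Spectrum bytes in the printable ASCII range, plus the 2×2 block graphics at 0x80–0x8F, into Unicode text. The code points for ↑, £ and © are the ones given later in this article. The bit layout assumed for the block graphics (bit 0 top right, bit 1 top left, bit 2 bottom right, bit 3 bottom left), the use of an ordinary space for the empty block at 0x80, and the names `decode_char` and `decode` are assumptions made for this example. Control codes, UDGs and keyword tokens are rejected rather than decoded.

```python
"""Sketch: decode Spectrum printable codes 0x20-0x8F to Unicode text."""

# The three ASCII-1967 code points that the Spectrum redefines.
SUBSTITUTIONS = {
    0x5E: "\u2191",  # up arrow instead of caret
    0x60: "\u00A3",  # pound sign instead of grave accent
    0x7F: "\u00A9",  # copyright sign instead of DEL
}

# 2x2 block graphics 0x80-0x8F, indexed by (code - 0x80).
# Assumed bit layout: bit 0 = top right, bit 1 = top left,
# bit 2 = bottom right, bit 3 = bottom left. 0x80 (empty) maps to a space.
BLOCK_GRAPHICS = (
    " \u259D\u2598\u2580\u2597\u2590\u259A\u259C"
    "\u2596\u259E\u258C\u259B\u2584\u259F\u2599\u2588"
)


def decode_char(code: int) -> str:
    """Map one Spectrum code point in 0x20-0x8F to a Unicode character."""
    if code in SUBSTITUTIONS:
        return SUBSTITUTIONS[code]
    if 0x20 <= code <= 0x7E:
        return chr(code)  # unchanged US-ASCII
    if 0x80 <= code <= 0x8F:
        return BLOCK_GRAPHICS[code - 0x80]
    raise ValueError(f"0x{code:02X} is a control code, UDG or keyword token")


def decode(data: bytes) -> str:
    """Decode a run of printable Spectrum bytes."""
    return "".join(decode_char(b) for b in data)


print(decode(b"`5 #1 2^3"))  # £5 #1 2↑3
```

Because 0x60 and 0x23 are distinct, the pound sign and the number sign survive side by side in the decoded string.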


Printable characters

Standard US-ASCII, 0x20–0x7F, is included in the Spectrum character set except that code point 0x5E is an up-arrow (↑) instead of a caret (^), 0x60 is the pound sign (£) instead of the grave accent (`), and 0x7F is the copyright sign (©) instead of the control character DEL. The use of 0x5E as ↑ was also the case in the older 1963 version of ASCII. The £ sign was not mapped to 0x23 as in the British variant of ASCII (ISO-646-GB), which allows the pound sign and the number sign (#) to be used simultaneously. The ↑ character is the exponentiation operator in Spectrum BASIC, just as the ^ that it replaces is used for exponentiation in many other dialects of BASIC and in other programming languages.

Beyond 0x7F, the Spectrum character set uses the high-bit range 0x80–0xFF for special purposes. 0x80–0x8F contain the same 2×2 block graphics characters that the ZX80 character set and the ZX81 character set have (at other locations), which are also available in the Unicode Block Elements block. However, the ZX Spectrum's standard character set does not include the ZX80/81 50% dithered 1×2 block graphics characters. Code points 0x90–0xA4 originally contained the 21 User-Defined Graphics (UDG) characters, and 0xA5–0xFF contain BASIC keywords tokenized as single code points. In the 128 BASIC mode introduced later, this was changed to 19 UDG characters ending at 0xA2, followed by the two new tokens SPECTRUM and PLAY. Code points 0xC7–0xC9 are the two-character operators <=, >= and <>, similarly tokenized into single code points. These tokens allow a BASIC command such as PRINT to be entered with a single keypress at the beginning of a line (i.e. in command mode), which generates 0xF5. The keyword is displayed in full as PRINT on screen, but only the single-byte token is stored, so only that byte needs to be parsed by the interpreter or saved to and loaded from external storage such as tape.

All non-UDG Spectrum characters can be mapped to Unicode. The three characters not in ASCII-1967, ↑, £ and ©, are at U+2191, U+00A3 and U+00A9. The 2×2 block graphics characters are in the Block Elements block at U+2580–U+259F, although font support for that block is not universal.

The shapes of the UDG characters are stored in a RAM area and are initialized to copies of the characters A–U, but they can be redefined arbitrarily, for example with the BASIC command POKE. Like all characters in the system font, they use an 8×8 pixel grid stored in 8 bytes. Redefining them changes their appearance in subsequent PRINT statements, but does not change any UDG characters already drawn on the screen. The location of a UDG character's definition can be found with the BASIC function USR, with the character as the argument, e.g. USR "A" for the first one. By default this points to the last 168 (21×8) bytes of RAM, at memory addresses 65368 (0xFF58) to 65535 (0xFFFF) on a 48K Spectrum. The location is held in the system variable UDG at memory address 23675/6 (0x5C7B/C), which can be changed.

The TK90X, a Brazilian clone of the ZX Spectrum, included an in-ROM application to edit these UDG characters graphically, along with functionality to preload them with accented letters used in Portuguese. (For this, the TK90X defined two extra BASIC commands at codes 0 and 1, respectively "trace" and "udg".)

The definitions of the main system font, 32 (space) to 127 (copyright), are referenced by the system variable CHARS at memory address 23606/7 (0x5C36/7). CHARS is set to 256 bytes below the first byte of the space character, which simplifies the formula for locating a character to CHARS + 8 × code point. The CHARS value defaults to 15360 (0x3C00), with the system font at the end of the Spectrum's ROM at addresses 15616 (0x3D00) to 16383 (0x3FFF). Entire alternative fonts can be loaded into RAM and CHARS re-pointed accordingly (ZX Spectrum manual, Chapter 25, the system variables).


Control codes

In the control codes area (the C0 range), the Spectrum mostly uses proprietary controls, such as INK and PAPER to set the foreground and background colour. However, the common BS and CR code points are the same as in ASCII. Cursor-down (0x0A, ASCII Line Feed) can be simulated by printing 32 spaces with OVER 1 (transparent overprint), and cursor-up (0x0B, ASCII Vertical Tabulation) can be simulated with 32 backspaces. A fault in the system ROM prevents cursor-right at 0x09 (cf. ASCII Horizontal Tabulation) from working.

Control code 0x0E indicates that a floating-point number follows, so that the interpreter does not need to convert decimal text at run time. In a Sinclair BASIC program, numeric constants are stored as ASCII digits followed by a 0x0E byte and a 5-byte binary floating-point representation. When a program is listed only the ASCII part is used, but at run time only the binary representation is used. Some Spectrum programs exploited this to obfuscate numbers, while others did so to save memory. For example, a BASIC line displayed as GO TO 10 could contain the ASCII characters for the digits 1 and 0 followed by a 0x0E byte and the floating-point representation of 100 instead of 10. Anyone listing that program saw the number 10, but when executed the program jumped to line 100.
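The split between listed digits and hidden binary value can be sketched in Python by walking a tokenized line body and returning both what LIST would show and the numbers the interpreter would use. Several details here go beyond the text above and are assumptions: the 5-byte layouts (a small-integer form of 0x00, a sign byte, a little-endian 16-bit value and 0x00, and a general form with an exponent byte biased by 128 followed by a 32-bit mantissa whose top bit holds the sign), the token value 0xEC for GO TO, and the hypothetical names `TOKENS`, `decode_number` and `split_line`. The token table is deliberately partial, and real keyword spacing in listings is more involved.

```python
"""Sketch: separate listed text from hidden numbers in a Sinclair BASIC line body."""

NUMBER_MARKER = 0x0E  # precedes the 5-byte binary form of a constant
LINE_END = 0x0D       # ENTER terminates a stored line

# Hypothetical, deliberately partial token table (the full range is 0xA5-0xFF).
TOKENS = {0xEC: "GO TO", 0xF5: "PRINT"}


def decode_number(five: bytes) -> float:
    """Decode a 5-byte Spectrum number (layout assumed, see lead-in)."""
    if len(five) != 5:
        raise ValueError("expected exactly 5 bytes")
    if five[0] == 0x00:
        # Small-integer form: 00, sign byte, low byte, high byte, 00.
        value = five[2] | (five[3] << 8)
        return float(value - 0x10000) if five[1] == 0xFF else float(value)
    # General form: exponent biased by 128, mantissa in [0.5, 1).
    mantissa = int.from_bytes(five[1:5], "big")
    negative = bool(mantissa & 0x80000000)  # top bit stores the sign
    mantissa |= 0x80000000                  # restore the implied leading 1 bit
    value = mantissa / 2**32 * 2.0 ** (five[0] - 128)
    return -value if negative else value


def split_line(body: bytes) -> tuple[str, list[float]]:
    """Return (text shown by LIST, numbers used at run time)."""
    listed: list[str] = []
    numbers: list[float] = []
    i = 0
    while i < len(body) and body[i] != LINE_END:
        b = body[i]
        if b == NUMBER_MARKER:
            numbers.append(decode_number(body[i + 1:i + 6]))
            i += 6  # skip the marker and the hidden 5 bytes
            continue
        if b in TOKENS:
            listed.append(TOKENS[b] + " ")  # simplified keyword spacing
        else:
            listed.append(chr(b))  # plain ASCII only; no £/↑/© substitution here
        i += 1
    return "".join(listed).strip(), numbers


# The GO TO 10 / 100 trick: visible digits "10", hidden binary value 100.
line = bytes([0xEC, 0x31, 0x30, NUMBER_MARKER, 0x00, 0x00, 0x64, 0x00, 0x00, LINE_END])
print(split_line(line))  # ('GO TO 10', [100.0])
```

A listing tool that only reads the ASCII digits, as LIST does, never sees the 100; only code that decodes the bytes after 0x0E does.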


Undefined codes

Ranges 0x00–0x05, 0x07, 0x0A–0x0C, 0x0F and 0x18–0x1F are undefined. In most cases, they produce a question mark if printed to the display. However, they may be used to represent their literal numeric values in conjunction with certain control codes: for example, 0x10 followed by 0x07 sets the ink (foreground text) colour to colour number 7 (white).
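How a parameter byte escapes the "undefined" treatment can be illustrated with a small Python parser that walks a print stream and consumes the byte after a control code as a number, so 0x07 after 0x10 is read as colour 7 instead of being printed as "?". Only 0x10 (INK) is named in the text; the names for 0x11–0x15, the two-byte AT (0x16) and TAB (0x17) controls, and the function name `parse_stream` are assumptions. Because 0x17 is treated as TAB here, this sketch's undefined set runs from 0x18 to 0x1F. Other defined controls such as BS and CR are passed through as `CTRL` events without being interpreted.

```python
"""Sketch: split a Spectrum print stream into characters and control events."""

# One-parameter attribute controls (0x10 INK per the text; 0x11-0x15 assumed).
ONE_PARAM = {0x10: "INK", 0x11: "PAPER", 0x12: "FLASH",
             0x13: "BRIGHT", 0x14: "INVERSE", 0x15: "OVER"}
# Two-parameter controls (assumed): AT line, column and TAB column (low, high).
TWO_PARAM = {0x16: "AT", 0x17: "TAB"}
# Codes treated as undefined here; printing one shows a question mark.
UNDEFINED = set(range(0x00, 0x06)) | {0x07, 0x0A, 0x0B, 0x0C, 0x0F} | set(range(0x18, 0x20))
QUESTION_MARK = 0x3F


def parse_stream(data: bytes) -> list[tuple]:
    """Return events; a missing parameter byte at the end raises IndexError."""
    events: list[tuple] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in ONE_PARAM:
            # The next byte is a literal value, even if it is an undefined code.
            events.append((ONE_PARAM[b], data[i + 1]))
            i += 2
        elif b in TWO_PARAM:
            events.append((TWO_PARAM[b], data[i + 1], data[i + 2]))
            i += 3
        elif b in UNDEFINED:
            events.append(("CHAR", QUESTION_MARK))  # shown as "?"
            i += 1
        elif b < 0x20:
            events.append(("CTRL", b))  # e.g. BS 0x08, CR 0x0D, number marker 0x0E
            i += 1
        else:
            events.append(("CHAR", b))  # printable code, left undecoded here
            i += 1
    return events


# INK 7, the letter A, a stray undefined 0x01, then CR.
print(parse_stream(bytes([0x10, 0x07, 0x41, 0x01, 0x0D])))
```

The same byte value 0x07 appears twice in spirit: as a parameter it means "white", while on its own it would print as a question mark.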


Character set


See also

* ZX80 character set
* ZX81 character set
* PETSCII
* ATASCII
* Atari ST character set
* Extended ASCII


Notes


References


Sinclair Basic Manual, Steven Vickers; Robin Bradbeer (ed.). Sinclair Research Limited. Online copy at World of Spectrum.


External links



From Michael Zaretski's website
Mapping table from Sinclair Spectrum+ 48K Character Set to Unicode
From the same site
