Universal Character Set

	Universal Character Set The Universal Coded Character Set (UCS, Unicode) is a standard set of characters defined by the international standard ISO/ IEC 10646, ''Information technology — Universal Coded Character Set (UCS)'' (plus amendments to that standard), which is the basis of many character encodings, improving as characters from previously unrepresented writing systems are added. The UCS has over 1.1 million possible code points available for use/allocation, but only the first 65,536, which is the Basic Multilingual Plane (BMP), had entered into common use before 2000. This situation began changing when the People's Republic of China (PRC) ruled in 2006 that all software sold in its jurisdiction would have to support GB 18030. This required software intended for sale in the PRC to move beyond the BMP. The system deliberately leaves many code points not assigned to characters, even in the BMP. It does this to allow for future expansion or to minimise conflicts with other encoding forms. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Character (computing), characters and 168 script (Unicode), scripts used in various ordinary, literary, academic, and technical contexts. Unicode has largely supplanted the previous environment of a myriad of incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, were merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters. The Unicode character repertoire is synchronized with Univers ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	People's Republic Of China China, officially the People's Republic of China (PRC), is a country in East Asia. With population of China, a population exceeding 1.4 billion, it is the list of countries by population (United Nations), second-most populous country after India, representing 17.4% of the world population. China spans the equivalent of five time zones and Borders of China, borders fourteen countries by land across an area of nearly , making it the list of countries and dependencies by area, third-largest country by land area. The country is divided into 33 Province-level divisions of China, province-level divisions: 22 provinces of China, provinces, 5 autonomous regions of China, autonomous regions, 4 direct-administered municipalities of China, municipalities, and 2 semi-autonomous special administrative regions. Beijing is the country's capital, while Shanghai is List of cities in China by population, its most populous city by urban area and largest financial center. Considered one of six ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Collation Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books. Collation differs from ''classification'' in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class. Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order). A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character s ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Character Repertoire Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a character encoding are known as code points and collectively comprise a code space or a code page. Early character encodings that originated with optical or electrical telegraphy and in early computers could only represent a subset of the characters used in written languages, sometimes restricted to upper case letters, numerals and some punctuation only. Over time, character encodings capable of representing more characters were created, such as ASCII, the ISO/IEC 8859 encodings, various computer vendor encodings, and Unicode encodings such as UTF-8 and UTF-16. The most popular character encoding on the World Wide Web is UTF-8, which is used in 98.2% of surveyed web sites, as of May 2024. In application programs and operating system t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control characters a total of 128 code points. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers; for example, the first 128 code points of Unicode are the same as ASCII. ASCII encodes each code-point as a value from 0 to 127 storable as a seven-bit integer. Ninety-five code-points are printable, including digits ''0'' to ''9'', lowercase letters ''a'' to ''z'', uppercase letters ''A'' to ''Z'', and commonly used punctuation symbols. For example, the letter is represented as 105 (decimal). Also, ASCII specifies 33 non-printing control codes which originated with ; most of which are now obsolete. The control cha ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Plan 9 From Bell Labs Plan 9 from Bell Labs is a distributed operating system which originated from the Computing Science Research Center (CSRC) at Bell Labs in the mid-1980s and built on UNIX concepts first developed there in the late 1960s. Since 2000, Plan 9 has been free and open-source. The final official release was in early 2015. Under Plan 9, UNIX's '' everything is a file'' metaphor is extended via a pervasive network-centric filesystem, and the cursor-addressed, terminal-based I/O at the heart of UNIX-like operating systems is replaced by a windowing system and graphical user interface without cursor addressing, although rc, the Plan 9 shell, is text-based. The name ''Plan 9 from Bell Labs'' is a reference to the Ed Wood 1957 cult science fiction Z-movie '' Plan 9 from Outer Space''. The system continues to be used and developed by operating system researchers and hobbyists. History Plan 9 from Bell Labs was originally developed, starting in the late 1980s, by members of the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Ken Thompson (computer Programmer) Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B programming language, the direct predecessor to the C language, and was one of the creators and early developers of the Plan 9 operating system. Since 2006, Thompson has worked at Google, where he co-developed the Go language. A recipient of the Turing award, he is considered one of the greatest computer programmers of all time. Other notable contributions included his work on regular expressions and early computer text editors QED and ed, the definition of the UTF-8 encoding, and his work on computer chess that included the creation of endgame tablebases and the chess machine Belle. He won the Turing Award in 1983 with his long-term colleague Dennis Ritchie. Early life and education Thompson was born in New Orleans, Louisiana. When asked ho ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Rob Pike Robert Pike (born 1956) is a Canadian programmer and author. He is best known for his work on the Go programming language while working at Google and the Plan 9 operating system while working at Bell Labs, where he was a member of the Unix team. Pike wrote the first window system for Unix in 1981. He is the sole inventor named in the US patent for overlapping windows on a computer display. With Brian Kernighan, he is the co-author of '' The Practice of Programming'' and '' The Unix Programming Environment''. With Ken Thompson, he is the co-creator of UTF-8 character encoding. Additional works While at Bell Labs, Pike was also involved in the creation of the Blit graphical terminal for Unix, the Inferno operating system, and the Limbo programming language. Pike also developed lesser systems such as the Newsqueak Newsqueak is a concurrent programming language for writing application software with interactive graphical user interfaces. Newsqueak's syntax and se ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Unicode Plane In the Unicode standard, a plane is a contiguous group of 65,536 (216) code points. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position hexadecimal format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version , five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 220 code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 231 (2,147,483,648) code points (32,768 planes), and would still be able to encode 221 (2,097,152) code points (32 planes) even under the current limit of 4 bytes. The 17 planes can accommodate 1,114,1 ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	UTF-1 UTF-1 is an obsolete method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance Unix filenames cannot contain the byte value used for forward slash). UTF-1 is also slow to encode or decode due to its use of division and multiplication by a number which is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8. Design Similar to UTF-8, UTF-1 is a variable-width encoding that is backwards-compatible with ASCII. Every Unicode code point is represented by either a single byte, or a sequence of two, three, or ''five'' bytes. All ASCII code points are a single byte (the code points through are also single bytes). UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte en ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Hexadecimal Hexadecimal (also known as base-16 or simply hex) is a Numeral system#Positional systems in detail, positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbols, hexadecimal uses sixteen distinct symbols, most often the symbols "0"–"9" to represent values 0 to 9 and "A"–"F" to represent values from ten to fifteen. Software developers and system designers widely use hexadecimal numbers because they provide a convenient representation of binary code, binary-coded values. Each hexadecimal digit represents four bits (binary digits), also known as a nibble (or nybble). For example, an 8-bit byte is two hexadecimal digits and its value can be written as to in hexadecimal. In mathematics, a subscript is typically used to specify the base. For example, the decimal value would be expressed in hexadecimal as . In programming, several notations denote hexadecimal numbers, usually involving a prefi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	C0 And C1 Control Codes The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received. C0 codes are the range 00 HEX–1FHEX and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 80HEX–9FHEX and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used. C0 controls ASCII defines 32 control characters, plus the DEL character. This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electr ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]