Nameprep
Nameprep is the process of case-folding a string to lowercase and removing certain generally invisible code points so that the string is suitable for representing a domain name or another such canonical name. It is used by the Internationalizing Domain Names in Applications (IDNA) standard and relies on Unicode NFKC normalization. Nameprep is defined in RFC 3491, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", as a profile of stringprep, which is described in RFC 3454, "Preparation of Internationalized Strings ('stringprep')". It neither maps lookalike characters to a single character nor prohibits their use. There are good reasons for this, such as the fact that the same sets of characters may be lookalikes in some fonts but not in others, and the fact that any decision about which character to map to would inevitably favor users of one script; however, it also has potentially grave implications for security if not considered by th ...
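The mapping, normalization, and "no lookalike folding" behavior described above can be observed with `encodings.idna.nameprep`, an RFC 3491 implementation that ships in CPython's standard library and backs its IDNA 2003 codec (per Python's documentation it treats unassigned code points as allowed). This is a minimal sketch assuming Python 3; the sample labels are made up for illustration.

```python
from encodings.idna import nameprep

# Case folding (stringprep table B.2): uppercase letters are mapped to lowercase.
print(nameprep("ExAmPle"))        # example

# B.2 applies full case folding, so German sharp s (U+00DF) becomes "ss".
print(nameprep("Stra\u00dfe"))    # strasse

# Table B.1 ("commonly mapped to nothing"): SOFT HYPHEN (U+00AD) is deleted.
print(nameprep("ex\u00adample"))  # example

# NFKC folds compatibility forms such as FULLWIDTH LATIN SMALL LETTER E (U+FF45).
print(nameprep("\uff45xample"))   # example

# No lookalike mapping: CYRILLIC SMALL LETTER A (U+0430) survives unchanged,
# so the result is NOT equal to the all-Latin "example".
spoof = nameprep("ex\u0430mple")
print(spoof == "example")         # False
```

The last case is the security gap the paragraph warns about: two labels that may render identically stay distinct after Nameprep.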

Internationalizing Domain Names In Applications
An internationalized domain name (IDN) is an Internet domain name that contains at least one label displayed in software applications, in whole or in part, in a non-Latin script or alphabet, such as Arabic, Bengali, Chinese (Mandarin, simplified or traditional), Cyrillic (including Bulgarian, Russian, Serbian and Ukrainian), Devanagari, Greek, Hebrew, Hindi, Tamil or Thai, or in Latin-alphabet-based characters with diacritics or ligatures, as used in French, German, Italian, Polish, Portuguese or Spanish. Computers encode these writing systems in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription. The DNS, which provides a lookup service that translates mostly user-friendly names into network addresses for locating Internet resources, is restricted in practice to ASCII characters, a practical limitation that initially set the standard for acceptable domain names. The internat ...
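The Punycode transcription mentioned above can be tried with Python's built-in `idna` codec. Note the assumption: this codec implements the older IDNA 2003 rules (Nameprep followed by Punycode with the `xn--` prefix), not the current IDNA 2008 rules, and the name `bücher.example` is only a sample.

```python
# Python's "idna" codec runs Nameprep and then Punycode on each non-ASCII label.
ascii_name = "b\u00fccher.example".encode("idna")
print(ascii_name)                         # b'xn--bcher-kva.example'

# The bare Punycode transform, without Nameprep or the "xn--" ACE prefix.
print("b\u00fccher".encode("punycode"))   # b'bcher-kva'

# Because Nameprep runs first, a case variant maps to the same ASCII form.
print("B\u00dcCHER.example".encode("idna"))  # b'xn--bcher-kva.example'

# Decoding reverses the process for display to the user.
print(ascii_name.decode("idna"))          # bücher.example
```

Only the ASCII form (`xn--bcher-kva.example`) ever travels through the DNS; applications convert to and from it at the edges.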


String (computer Science)
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. "String" may also denote more general arrays or other sequence (or list) data types and structures. Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements. When a string appears literally in source code, it is known as a string literal or an anonymous string. In formal languages, which are used in mathematical ...
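The gap between a string's characters and the bytes that store them under "some character encoding" is easy to see in Python 3, where `str` holds code points and `bytes` holds encoded octets. A small sketch; the word `mañana` is an arbitrary sample.

```python
text = "ma\u00f1ana"                # "mañana": 6 code points
utf8 = text.encode("utf-8")         # the same string stored as UTF-8 bytes
utf16 = text.encode("utf-16-le")    # ...and as UTF-16 (little-endian) bytes

print(len(text))    # 6: characters (code points)
print(len(utf8))    # 7: "ñ" takes two bytes in UTF-8
print(len(utf16))   # 12: two bytes per code point for these characters

# Python strings are immutable (the "fixed after creation" case above):
# upper() returns a new string and leaves the original untouched.
shout = text.upper()
print(text, shout)  # mañana MAÑANA
```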

Domain Name
A domain name is a string that identifies a realm of administrative autonomy, authority or control within the Internet. Domain names are often used to identify services provided through the Internet, such as websites, email services and more. As of 2017, 330.6 million domain names had been registered. Domain names are used in various networking contexts and for application-specific naming and addressing purposes. In general, a domain name identifies a network domain or an Internet Protocol (IP) resource, such as a personal computer used to access the Internet, or a server computer. Domain names are formed by the rules and procedures of the Domain Name System (DNS). Any name registered in the DNS is a domain name. Domain names are organized in subordinate levels (subdomains) of the DNS root domain, which is nameless. The first level of domain names consists of the top-level domains (TLDs), including the generic top-level domains (gTLDs), such as the prominent domains com, info, net ...
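The hierarchy of subordinate levels under the nameless root can be illustrated by splitting a name into its dot-separated labels and listing each parent domain up to the TLD. `domain_ancestors` is a hypothetical helper written for this sketch (Python 3.9+); it deliberately ignores registry policy such as where registrable domains begin.

```python
def domain_ancestors(name: str) -> list[str]:
    """Return the name followed by each parent domain, ending with the TLD.

    A trailing dot (the fully qualified form, whose empty last label is
    the nameless root) is stripped before splitting.
    """
    labels = name.rstrip(".").split(".")
    # Join progressively shorter suffixes of the label list:
    # www.example.com -> www.example.com, example.com, com
    return [".".join(labels[i:]) for i in range(len(labels))]


print(domain_ancestors("www.example.com."))
# ['www.example.com', 'example.com', 'com']
```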

Unicode
Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines, as of version 15.0, 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji (including in color), and non-visual control and formatting codes. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, and most modern programming languages. The Unicode character repertoire is synchronized with ISO/IEC 10646, the Universal Coded Character Set, each being code-for-code id ...
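Each character in the Unicode repertoire has a numeric code point and a formal name, which Python exposes through the standard `unicodedata` module. A short sketch; the helper name `describe` is made up for illustration, and the bundled database version depends on the Python build (it need not be 15.0).

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# One character each from the Latin, Greek, and Cyrillic scripts.
for line in describe("A\u00f1\u03a9\u0416"):
    print(line)
# U+0041 LATIN CAPITAL LETTER A
# U+00F1 LATIN SMALL LETTER N WITH TILDE
# U+03A9 GREEK CAPITAL LETTER OMEGA
# U+0416 CYRILLIC CAPITAL LETTER ZHE

# Version of the Unicode Character Database shipped with this Python.
print(unicodedata.unidata_version)
```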

Unicode Normalization
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters. Unicode provides two such notions, canonical equivalence and compatibility equivalence. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other. Sim ...
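The "n" plus combining tilde example above can be checked with Python's `unicodedata.normalize`. The sketch also contrasts canonical forms (NFC/NFD) with the looser compatibility form NFKC, the normalization Nameprep relies on; the "fi" ligature is just a convenient sample of a compatibility character.

```python
import unicodedata

decomposed = "n\u0303"    # U+006E followed by U+0303 COMBINING TILDE
precomposed = "\u00f1"    # U+00F1 LATIN SMALL LETTER N WITH TILDE

# Canonically equivalent, but different code point sequences.
print(decomposed == precomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Compatibility equivalence is looser: NFKC folds LATIN SMALL LIGATURE FI
# (U+FB01) into "f" + "i", which canonical normalization leaves alone.
print(unicodedata.normalize("NFC", "\ufb01") == "\ufb01")       # True
print(unicodedata.normalize("NFKC", "\ufb01"))                  # fi
```

Comparing strings only after normalizing both to the same form is what makes the two spellings of "ñ" match in searching or sorting.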

Homoglyph
In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar. The designation is also applied to sequences of characters sharing these properties. Synoglyphs are glyphs that look different but mean the same thing; they are also known informally as "display variants". The term homograph is sometimes used synonymously with homoglyph, but in the usual linguistic sense, homographs are words that are spelled the same but have different meanings, a property of words, not characters. In 2008, the Unicode Consortium published its Technical Report #36 on a range of issues arising from the visual similarity of characters, both within a single script and between different scripts. A historical example of homoglyphic confusion is the use of a 'y' to represent a 'þ' when setting older English texts in typefaces that do not contain the latter character. ...
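One common way to detect homoglyphs is to map every character to a representative "prototype" and compare the results. The sketch below is loosely modeled on the skeleton idea from Unicode's security work (UTS #39), but the real algorithm and its confusables data differ in detail; `CONFUSABLES`, `skeleton`, and `looks_alike` are hypothetical names, and the table is a tiny hand-picked subset.

```python
import unicodedata

# Hypothetical, hand-picked lookalike table (a real one has thousands of entries).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "0": "o",       # DIGIT ZERO, a lookalike even within ASCII
}

def skeleton(s: str) -> str:
    """Normalize, case-fold, then replace each character by its prototype."""
    s = unicodedata.normalize("NFKC", s).casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in s)

def looks_alike(a: str, b: str) -> bool:
    """True if the two strings share a skeleton under the toy table."""
    return skeleton(a) == skeleton(b)

print(looks_alike("ex\u0430mpl\u0435", "example"))  # True: Cyrillic а and е
print(looks_alike("f00d", "food"))                  # True
print(looks_alike("exempla", "example"))            # False
```

Because the table is so small, this catches only the listed substitutions; it shows the shape of the technique rather than a usable filter.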

Internationalization
In economics, internationalization or internationalisation is the process of increasing the involvement of enterprises in international markets, although there is no agreed definition of internationalization. Internationalization is a crucial strategy not only for companies that seek horizontal integration globally but also for countries addressing the sustainability of their development in manufacturing as well as service sectors, especially higher education, an important context in which internationalization is needed to bridge the gap between different cultures and countries. There are several internationalization theories that try to explain why international activities exist. Entrepreneurs and enterprises: Those entrepreneurs who are interested in the internationalization of business need the ability to think globally and an understanding of international cultures. By appreciating and understanding different beliefs, values, behavio ...

International Components For Unicode
International Components for Unicode (ICU) is an open-source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++, and Java software. The ICU project is a technical committee of the Unicode Consortium and is sponsored, supported, and used by IBM and many other companies. ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, uppercase and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Locale Data Repository (CLDR); multiple calendars and time zones; and rule-based formatting and parsing of dates ...

IDN Homograph Attack
The internationalized domain name (IDN) homograph attack is a way a malicious party may deceive computer users about what remote system they are communicating with, by exploiting the fact that many different characters look alike (i.e., they are homographs, hence the term for the attack, although technically homoglyph is the more accurate term for different characters that look alike). For example, a regular user of example.com may be lured to click a link where the Latin character "a" is replaced with the Cyrillic character "а". This kind of spoofing attack is also known as script spoofing. Unicode incorporates numerous writing systems, and, for a number of reasons, similar-looking characters such as Greek Ο, Latin O, and Cyrillic О were not assigned the same code point. Their incorrect or malicious use opens the door to security attacks.
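A simple defensive heuristic is to flag labels that mix scripts, such as the example.com spoof above. Python's `unicodedata` does not expose the Unicode Script property, so this sketch assumes a crude stand-in: the first word of each character's name. `KNOWN_SCRIPTS`, `scripts_in`, and `is_mixed_script` are hypothetical names, and a production check would use real script data (for instance via ICU).

```python
import unicodedata

# Crude stand-in for the Script property: first word of the character name.
KNOWN_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "ARABIC", "HEBREW"}

def scripts_in(label: str) -> set[str]:
    """Collect the (approximate) scripts used in a label; digits etc. are ignored."""
    found = set()
    for ch in label:
        first_word = unicodedata.name(ch, "").split(" ")[0]
        if first_word in KNOWN_SCRIPTS:
            found.add(first_word)
    return found

def is_mixed_script(label: str) -> bool:
    return len(scripts_in(label)) > 1

spoofed = "ex\u0430mple"           # Cyrillic "а" in place of Latin "a"
print(sorted(scripts_in(spoofed))) # ['CYRILLIC', 'LATIN']
print(is_mixed_script(spoofed))    # True
print(is_mixed_script("example"))  # False

# In the DNS the spoofed label travels in ASCII (Punycode) form, which
# differs from "example" even though the two may render identically.
print(spoofed.encode("idna"))      # an ACE label starting with b'xn--exmple-'
```

A label written entirely in one script with lookalike letters passes this check, so mixed-script detection is only one layer of defense.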