Mark Edward Davis (born September 13, 1952) is an American specialist in the internationalization and localization of software and the co-founder and president of the Unicode Consortium.
He is one of the key technical contributors to the Unicode specifications, being the primary author or co-author of bidirectional text algorithms (used worldwide to display Arabic and Hebrew text), collation (used by sorting algorithms and search algorithms), Unicode normalization, Unicode scripts, text segmentation, identifiers, regular expressions, data compression, character encoding and security.
Education
Davis was educated at Stanford University, where he was awarded a PhD in Philosophy in 1979.
Career and research
Davis has specialized in internationalization and localization of software for many years. After his PhD, he worked in Zurich, Switzerland for several years, then returned to California to join Apple, where he co-authored the Macintosh KanjiTalk and Script Manager, and authored the Macintosh Arabic and Hebrew systems. He also worked on parts of the Mac OS, including contributions to the design of TrueType. Later, he was the manager and architect for the Taligent international frameworks and was then the architect for a large part of the Java international libraries.
At IBM, he was the Chief Software Globalization Architect. He is the author of a number of patents, primarily in internationalization and localization. At various times he has also managed groups or departments covering text, internationalization, operating system services, porting and technical communications.
Davis founded and was responsible for the overall architecture of International Components for Unicode (ICU: a major Unicode software internationalization library) and designed the core of the Java internationalization classes. He is also the vice-chair of the Unicode Common Locale Data Repository (CLDR) project, and is a co-author of the Best Current Practice (BCP) 47 IETF language tag Requests for Comments (RFC 4646 and RFC 5646), which are used for identifying languages in XML and HTML documents.
Since the start of 2006, Davis has been working on software internationalization at Google, focusing on effective and secure use of Unicode (especially in the index and search pipeline), overall improvement and adoption of the software internationalization libraries (including ICU) and the introduction and maintenance of stable identifiers for languages, scripts, regions, time zones and currencies.
Publications
''The Unicode Standard, Version 5.0''
Personal life
Davis is married to Anne Gundelfinger.
He has two daughters from a previous marriage.