PlainTalk

PlainTalk is the collective name for several speech synthesis (MacinTalk) and speech recognition technologies developed by Apple Inc. In 1990, Apple invested heavily in speech recognition technology, hiring many researchers in the field. The result was "PlainTalk", released with the AV models in the Macintosh Quadra series from 1993. It was made a standard system component in System 7.1.2, and has since been shipped on all PowerPC and some 68k Macintoshes.


Software


Speech synthesis


Technology

Apple's text-to-speech uses diphones. Compared to other methods of synthesizing speech, this approach is not very resource-intensive, but it limits how natural the synthesized speech can be. American English and Spanish versions have been available, but since the advent of Mac OS X, Apple has shipped only American English voices, relying on third-party suppliers such as Acapela Group to supply voices for other languages (in OS X 10.7, Apple licensed many third-party voices and made them available for download within the Speech control panel).

An application programming interface known as the Speech Manager enables third-party developers to use speech synthesis in their applications. There are various control sequences that can be used to fine-tune the intonation and rhythm. The volume, pitch and rate of the speech can be configured as well, allowing for singing. Input to the synthesizer can be controlled explicitly using a special phoneme alphabet.
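
As an illustration of the control sequences mentioned above, the following minimal Python sketch builds text with embedded speech commands. It assumes the `[[rate]]`, `[[pbas]]`, `[[volm]]` and `[[slnc]]` commands and the `[[ ]]` delimiters as documented for Apple's Speech Synthesis Manager; the helper names `embed`, `with_prosody` and `with_pause` are hypothetical and only assemble strings, they do not speak anything.

```python
def embed(command: str, value) -> str:
    """Wrap a command selector and its argument in the [[ ]] delimiters."""
    return f"[[{command} {value}]]"


def with_prosody(text: str, rate=None, pitch=None, volume=None) -> str:
    """Prefix text with rate, pitch and volume commands (only those given)."""
    parts = []
    if rate is not None:
        parts.append(embed("rate", rate))    # assumed: speaking rate in words per minute
    if pitch is not None:
        parts.append(embed("pbas", pitch))   # assumed: baseline pitch
    if volume is not None:
        parts.append(embed("volm", volume))  # assumed: volume from 0.0 to 1.0
    parts.append(text)
    return " ".join(parts)


def with_pause(first: str, second: str, milliseconds: int) -> str:
    """Insert a silence command between two phrases."""
    return f"{first} {embed('slnc', milliseconds)} {second}"
```

On a Mac, a string built this way could then be handed to a speech synthesis API or to the `say` command-line tool; whether a given voice honors each command depends on the synthesizer.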


Original MacinTalk

The initial Macintosh text-to-speech engine, MacinTalk (named by Denise Chandler), was used by Apple in the 1984 introduction of the Macintosh, in which the computer announced itself to the world (and poked fun at the weight of an IBM computer). While it was incorporated into the Macintosh's operating system, it was not officially supported by Apple (though programming information was made available through an Apple Technical Note). MacinTalk was developed by Joseph Katz and Mark Barton, who later founded SoftVoice, Inc., which currently markets TTS engines for Windows, Linux and embedded platforms. MacinTalk used direct access to the original Macintosh sound hardware, and all attempts by Apple to license the source code to update it for newer Macs failed.


MacinTalk 2

Eventually, Apple released a supported speech synthesis system, called MacinTalk 2. It supported any Macintosh running System Software 6.0.7 or later, and it remained the recommended version for slower machines even after the release of MacinTalk 3 and Pro.


MacinTalk 3, Pro

MacinTalk 3 introduced a great variety of voices. Apart from the standard adult voices "Ralph", "Fred" and "Kathy", and children's voices like "Princess" and "Junior", various novelty voices were included, like "Whisper", "Zarvox" (a robotic voice with melodic background sounds, with a similar voice called "Trinoids" also included), "Cellos" (a voice that sang its text to an Edvard Grieg tune, otherwise known as "In the Hall of the Mountain King" and the HotDiggedyDemon outro, with similarly singing voices like "Good News", "Bad News" and "Pipe Organ"), "Albert" (a hoarse-sounding voice), "Bells", "Boing", "Bubbles", and others.

Each of these voices came with its own example text, which would be spoken when one hit the "Test" button in the Speech control panel. Some would just say their name, language and the version of MacinTalk they were introduced with. Others would say funny things, like "I sure like being inside this fancy computer", "I have a frog in my throat... No, I mean a real frog!", "We must rejoice in this morbid voice" (a parody of Western church hymnody with organ music), or "The light you see at the end of the tunnel is the headlamp of a fast approaching train". These voices, as well as their test texts, are still in Mac OS X today.

With the increase in computing power that the AV Macs and PowerPC-based Macintoshes provided, Apple could afford to increase the quality of the synthesis. MacinTalk 3 required a 33 MHz 68030 processor, and MacinTalk Pro required a 68040 or better and at least 1 MB of RAM. Each synthesizer supported a different set of voices.


Text-to-speech in Mac OS X

Text-to-speech has been a part of every Mac OS X (later macOS) version. The Victoria voice was enhanced significantly in Mac OS X v10.3 and added as Vicki (Victoria was not removed). Its size was almost 20 times greater because of the higher-quality diphone samples used. A new, much more natural-sounding voice called "Alex" was added to the Mac text-to-speech roster with the release of Mac OS X 10.5 Leopard. With Mac OS X 10.7 Lion, voices are available in additional U.S. English and other English accents, as well as 21 other languages.

The ''Speak selected text when key is pressed'' feature allows selected text from any application to be read via a key combination. From Mac OS X 10.1 to Mac OS X 10.6, the feature would copy the selected text to the clipboard and read it from there. From Mac OS X 10.7 to Mac OS X 10.10, a new implementation of the feature required software developers to implement a speech synthesis API in their applications. This prevented the clipboard from being overwritten, but also meant that, for applications that did not use the API, the feature would not function as expected, reading the title bar rather than the selected text.

In macOS Sierra 10.12, Siri was introduced for the Mac; however, its voice was not available as a System Voice, which meant that the Siri voices could be used only in Siri. Siri was made available as a System Voice in macOS Catalina 10.15, so that it would work for any text. The Siri voices work in a completely different way and the command remains unable to use Siri. In the macOS Big Sur 11.3 update, gender references to all voices were removed, coinciding with the change in Siri voices on iOS 14.5 and macOS 11.3 and later, as part of Apple's efforts to promote gender inclusivity.


Speech recognition

Apple hired many speech recognition researchers in 1990. After about a year, they demonstrated a technology codenamed Casper. It was released as part of the PlainTalk package in 1993. Although available for all PowerPC Macintoshes and AV 68k machines (it was one of the few applications that made use of the DSP in the Centris 660AV and Quadra 840AV), it was not part of the default system install prior to Mac OS X, requiring the user to perform a custom OS installation to get speech recognition capabilities.

In Mac OS X 10.7 Lion and earlier, Apple's speech recognition was voice-command oriented only, i.e. not intended for dictation. It can be configured to listen for commands when a hot key is pressed, after being addressed with an activation phrase such as "Computer" or "Macintosh", or without prompt. A graphical status monitor, often in the form of an animated character, provides visual and textual feedback about listening status, available commands and actions. It can also communicate back with the user using speech synthesis.

Early versions of the speech recognition provided full access to the menus. This support was later removed, since it required too many resources and made recognition less reliable, only to be re-added in Mac OS X 10.3 as a "universal access technology" called spoken user interface.

The user can launch items located in a special folder, called "Speakable Items", simply by speaking their name (while the system is in ''listening'' mode). Apple shipped a number of AppleScripts in this folder, but aliases, documents and folders can be opened in the same way. Additional functionality is provided by individual applications. An application programming interface lets programs define and modify an available vocabulary. For example, the Finder provides a vocabulary for manipulating files and windows.

In OS X 10.8 Mountain Lion, Apple introduced "Dictation", intended for general text. Originally, it required the sending of audio data to Apple servers for processing. In OS X 10.9 Mavericks, Apple added the option to download support for dictation without an Internet connection. As of OS X 10.9.3, eight languages (19 dialects) are supported.
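
The command-oriented recognition described above, in which an activation phrase is followed by the name of an item in the "Speakable Items" folder, can be pictured with a minimal Python sketch. This is an illustration of the dispatch idea only, not Apple's implementation: the function names, the item names in the usage example and the assumption that the recognizer has already turned audio into text are all hypothetical.

```python
from pathlib import PurePosixPath


def build_vocabulary(item_paths):
    """Map each item's lowercased name (without extension) to its path."""
    return {PurePosixPath(p).stem.lower(): p for p in item_paths}


def match_command(utterance, vocabulary, activation="computer"):
    """Return the path of the item named after the activation phrase, or None."""
    words = utterance.lower().replace(",", " ").split()
    if not words or words[0] != activation.lower():
        return None  # the utterance was not addressed to the recognizer
    spoken_name = " ".join(words[1:])
    return vocabulary.get(spoken_name)


# Hypothetical folder contents, for illustration only.
vocab = build_vocabulary([
    "Speakable Items/Open Browser.scpt",
    "Speakable Items/Empty Trash",
])
print(match_command("Computer, open browser", vocab))
```

Restricting matches to a small, known vocabulary is what distinguishes this style of recognition from dictation, where arbitrary text must be transcribed.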


Hardware

Apple produced two microphones under the product name "Apple PlainTalk Microphone". The first shipped with Macintosh LC and early Performa models and was circular in appearance. It was designed to sit in a holder attached to the side of a CRT display, and to be lifted out and held near the mouth when talking. The second model was introduced alongside the AV models in the Macintosh Quadra series in 1993 but was also sold separately. It was designed to be positioned on top of the screen and to be sensitive to sound from the front. Both models had a longer connector, the tip of which was used to provide the microphone with bias voltage.


External links


* Folklore.org: The Original Macintosh, about the Macintosh introduction
* API Documentation:
** 10.14+ frameworks: Speech (Recognition), Speech Synthesis (Part of AVFoundation)
** Cocoa API: NSSpeechSynthesizer and NSSpeechRecognizer
** Carbon API (ApplicationServices): Speech Synthesis Manager (the old diphone-based system with pitch control used by )
