
15.ai is a non-commercial freeware artificial intelligence web application that generates natural, emotive, high-fidelity text-to-speech voices for an assortment of fictional characters from a variety of media sources. Developed by an anonymous MIT researcher under the eponymous pseudonym 15, the project uses a combination of audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate and serve emotive character voices faster than real time, even for voices with very little training data. Launched in early 2020, 15.ai began as a proof of concept of the democratization of voice acting and dubbing through technology. Users have lauded its gratis and non-commercial nature (its only stipulation is that the project be properly credited when used), its ease of use, and its substantial improvements over existing text-to-speech implementations; however, some critics and voice actors have questioned the legality and ethics of leaving such technology publicly available and readily accessible. Credited as the impetus behind the popularization of AI vocal reconstruction technology in content creation, 15.ai has had a significant impact on multiple Internet fandoms, most notably the ''My Little Pony: Friendship Is Magic'', ''Team Fortress 2'', and ''SpongeBob SquarePants'' fandoms. Several commercial alternatives have emerged alongside the rising popularity of 15.ai, leading to cases of misattribution and theft. In January 2022, it was discovered that Voiceverse NFT, a company with which voice actor Troy Baker had announced a partnership, had plagiarized 15.ai's work as part of its platform.


Features

Available characters include GLaDOS and Wheatley from ''Portal'', characters from ''Team Fortress 2'', Twilight Sparkle and a number of main, secondary, and supporting characters from ''My Little Pony: Friendship Is Magic'', SpongeBob from ''SpongeBob SquarePants'', Daria Morgendorffer and Jane Lane from ''Daria'', the Tenth Doctor from ''Doctor Who'', HAL 9000 from ''2001: A Space Odyssey'', the Narrator from ''The Stanley Parable'', the Wii U/3DS/Switch ''Super Smash Bros.'' Announcer (formerly), Carl Brutananadilewski from ''Aqua Teen Hunger Force'', Steven Universe and the Crystal Gems from ''Steven Universe'', Dan from ''Dan Vs.'', Sans from ''Undertale'', the Griffin family from ''Family Guy'', and characters from ''Rick and Morty'' and ''DC Super Hero Girls'' (2019).

The deep learning model used by the application is nondeterministic: each time speech is generated from the same string of text, the intonation of the speech will be slightly different.

The application also supports manually altering the emotion of a generated line using ''emotional contextualizers'' (a term coined by this project): a sentence or phrase that conveys the emotion of the take and serves as a guide for the model during inference. Emotional contextualizers are representations of a sentence's emotional content, deduced from transfer-learned emoji embeddings produced by DeepMoji, a deep neural network sentiment analysis algorithm developed by the MIT Media Lab in 2017. DeepMoji was trained on 1.2 billion emoji occurrences in Twitter data from 2013 to 2017, and has been found to outperform human subjects in correctly identifying sarcasm in tweets and other online modes of communication.

15.ai uses a ''multi-speaker model'': hundreds of voices are trained concurrently rather than sequentially, which decreases the required training time and enables the model to learn and generalize shared emotional context, even for voices with no exposure to such emotional context. Consequently, the entire lineup of characters in the application is powered by a single trained model, as opposed to multiple single-speaker models trained on different datasets.

The lexicon used by 15.ai has been scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words are automatically deduced using phonological rules learned by the deep learning model.

The application supports a simplified version of ARPABET, a set of English phonetic transcription codes, to correct mispronunciations or to account for heteronyms: words that are spelled the same but pronounced differently (such as the word ''read'', which is pronounced differently in the present and past tenses). While the original ARPABET, developed in the 1970s by the Advanced Research Projects Agency, supports 50 unique symbols to designate and differentiate between English phonemes, the CMU Pronouncing Dictionary's ARPABET convention (the set of transcription codes followed by 15.ai) reduces the symbol set to 39 phonemes by combining allophonic phonetic realizations into a single standard (e.g. AXR/ER; UX/UW) and by using multiple common symbols together to replace syllabic consonants (e.g. EN/AH0 N). ARPABET strings can be invoked in the application by wrapping the string of phonemes in curly braces within the input box (e.g. wrapping the phonemes of the word ''ARPABET'' in braces to specify its pronunciation). The following is a table of phonemes used by 15.ai and the CMU Pronouncing Dictionary:
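The brace syntax and the symbol reductions described above can be illustrated with a minimal sketch. This is not 15.ai's parser: the names `split_input` and `reduce_phonemes` are hypothetical, only the three reductions named in the text (AXR to ER, UX to UW, EN to AH0 N) are included, and braced symbols are assumed to be space-separated and optionally to carry a CMU stress digit (0, 1, or 2). The 39-symbol set is the CMU Pronouncing Dictionary's phoneme inventory.

```python
import re

# CMU Pronouncing Dictionary inventory: 39 phonemes (stress digits omitted).
CMU_PHONEMES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY", "IH", "IY",
    "OW", "OY", "UH", "UW",
    "B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N", "NG",
    "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}

# Only the reductions named in the text; they apply to bare symbols (no stress digit).
REDUCTIONS = {
    "AXR": ["ER"],       # allophonic realization merged into ER
    "UX": ["UW"],        # allophonic realization merged into UW
    "EN": ["AH0", "N"],  # syllabic consonant rewritten with two common symbols
}

# A braced span such as "{R EH1 D}"; nested braces are not supported.
BRACED = re.compile(r"\{([^{}]*)\}")


def reduce_phonemes(phonemes):
    """Rewrite extended ARPABET symbols into the reduced CMU convention."""
    out = []
    for p in phonemes:
        out.extend(REDUCTIONS.get(p, [p]))
    return out


def split_input(text):
    """Split input into ("text", str) and ("arpabet", [symbols]) segments."""
    segments = []
    pos = 0
    for m in BRACED.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        phonemes = reduce_phonemes(m.group(1).upper().split())
        for p in phonemes:
            # Strip a trailing stress digit before checking the inventory.
            if p.rstrip("012") not in CMU_PHONEMES:
                raise ValueError(f"unknown ARPABET symbol: {p}")
        segments.append(("arpabet", phonemes))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


# Forcing the past-tense pronunciation of the heteronym "read":
print(split_input("I {R EH1 D} it yesterday"))
# [('text', 'I '), ('arpabet', ['R', 'EH1', 'D']), ('text', ' it yesterday')]
```

In this sketch, plain-text segments would still be left to the model's learned pronunciation rules, while braced segments are passed through as explicit phonemes; rejecting unknown symbols up front, rather than handing them to the model, is a design choice of the sketch.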

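The multi-speaker model, emotional contextualizers, and nondeterministic output described in this section can be pictured with a toy sketch: one shared model whose voices differ only by a per-speaker embedding row, conditioned additionally on an emotion vector derived from a contextualizer sentence. Everything here is an assumption for illustration, not 15.ai's implementation: `toy_emotion_embedding` and its keyword lexicon are a hypothetical stand-in for DeepMoji, `MultiSpeakerConditioner` and the concatenation scheme are invented names, and inference-time dropout is only one known way to obtain varied output (Tacotron 2 keeps dropout active in its decoder pre-net at inference), not necessarily the mechanism 15.ai uses.

```python
import random

# Hypothetical toy lexicon standing in for DeepMoji; a real system would use
# the emoji-prediction network's learned representation instead.
EMOTION_KEYWORDS = {
    "happy": {"great", "love", "yay", "wonderful"},
    "sad": {"sorry", "miss", "cry", "lonely"},
    "angry": {"hate", "stop", "furious", "never"},
}


def toy_emotion_embedding(contextualizer):
    """Map a contextualizer sentence to a distribution over emotion labels."""
    words = [w.strip(".,!?'\"") for w in contextualizer.lower().split()]
    # Add-one smoothing so a neutral sentence yields a uniform distribution.
    counts = [1 + sum(w in kws for w in words) for kws in EMOTION_KEYWORDS.values()]
    total = sum(counts)
    return [c / total for c in counts]


class MultiSpeakerConditioner:
    """One shared model for every voice; speakers differ only by an embedding row."""

    def __init__(self, speakers, speaker_dim=4, seed=0):
        rng = random.Random(seed)
        # A single table with one row per character (random here, learned in practice).
        self.table = {
            name: [rng.gauss(0.0, 1.0) for _ in range(speaker_dim)] for name in speakers
        }

    def condition(self, speaker, emotion, dropout=0.5, rng=None):
        """Build the conditioning vector for one synthesis request."""
        rng = rng if rng is not None else random.Random()
        vec = self.table[speaker] + list(emotion)
        keep = 1.0 - dropout
        # Dropout left on at inference: each request masks a different subset,
        # so identical inputs give slightly different conditioning (and prosody).
        return [x / keep if rng.random() < keep else 0.0 for x in vec]


model = MultiSpeakerConditioner(["GLaDOS", "Twilight Sparkle"])
emotion = toy_emotion_embedding("I love this, it's wonderful!")
first = model.condition("GLaDOS", emotion)
second = model.condition("GLaDOS", emotion)  # usually differs from `first`
```

Because every character shares the same weights and only the embedding row changes, anything the shared layers learn about emotional context is, in this picture, available to every speaker, which is the intuition behind training many voices in one model.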

Background


Speech synthesis

In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating human-like speech. Tacotron 2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and required tens of hours of audio data to produce intelligible speech; when trained on 2 hours of speech, the model produced intelligible speech of mediocre quality, and when trained on 36 minutes of speech, it was unable to produce intelligible speech at all. For years, reducing the amount of data required to train a realistic, high-quality text-to-speech model has been a primary goal of researchers in deep learning speech synthesis. The developer of 15.ai claims that as little as 15 seconds of data is sufficient to clone a voice to human standards, a significant reduction in the amount of data required.
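The size of that reduction is easier to see as a ratio. The following is plain arithmetic on the durations quoted in the paragraph above, not a benchmark; the 15-second figure is the developer's own claim, and the constant names are illustrative.

```python
# Durations quoted above, in seconds.
CLAIMED_15AI_S = 15            # developer's claimed minimum for cloning a voice
TACOTRON2_36_MIN_S = 36 * 60   # Tacotron 2 could not produce intelligible speech
TACOTRON2_2_H_S = 2 * 60 * 60  # Tacotron 2 produced intelligible but mediocre speech


def reduction_factor(baseline_s, claimed_s):
    """How many times less audio the claimed amount is than the baseline."""
    return baseline_s / claimed_s


for label, baseline_s in [("36 minutes", TACOTRON2_36_MIN_S), ("2 hours", TACOTRON2_2_H_S)]:
    factor = reduction_factor(baseline_s, CLAIMED_15AI_S)
    print(f"15 s is {factor:.0f}x less audio than {label}")
# 15 s is 144x less audio than 36 minutes
# 15 s is 480x less audio than 2 hours
```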


Copyrighted material in deep learning

A landmark 2013 ruling in a case between Google and the Authors Guild held that Google Books, a service that searches the full text of printed copyrighted books, was transformative, thus meeting all requirements for fair use. This case set an important legal precedent for the field of deep learning and artificial intelligence: using copyrighted material to train a discriminative model or a ''non-commercial'' generative model was deemed legal. The legality of ''commercial'' generative models trained on copyrighted material is still under debate; due to the black-box nature of machine learning models, any allegation of copyright infringement via direct competition would be difficult to prove.


Development

15.ai was designed and created by an anonymous research scientist affiliated with the Massachusetts Institute of Technology, known by the alias ''15''. The project began development while the developer was an undergraduate. The developer has stated that they are capable of paying the high cost of running the site out of pocket: according to their posts on Hacker News, 15.ai costs several thousand dollars per month to operate, which they are able to support thanks to a successful startup exit. The developer has also stated that during their undergraduate years at MIT, they were paid the minimum hourly rate (approximately $14 an hour in Massachusetts) to work on a related project that eventually evolved into 15.ai. They also stated that the democratization of voice cloning technology is not the only function of the website; in response to a user asking whether the research could be conducted without a public website, the developer wrote:

The algorithm used by the project to clone voices from minimal viable data has been dubbed DeepThroat, a double entendre referring both to speech synthesis using deep neural networks and to the sexual act of deep-throating. The project and algorithm, initially conceived as part of MIT's Undergraduate Research Opportunities Program, had been in development for years before the first release of the application.

The developer has also worked closely with the Pony Preservation Project from /mlp/, the ''My Little Pony'' board of 4chan. The Pony Preservation Project, which began in 2019, is a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence. The ''Friendship Is Magic'' voices on 15.ai were trained on a large dataset crowdsourced by the Pony Preservation Project: audio and dialogue from the show and related media (including all nine seasons of ''Friendship Is Magic'', the 2017 movie, spinoffs, leaks, and various other content voiced by the same voice actors) were parsed, hand-transcribed, and processed to remove background noise. According to the developer, the collective efforts and constructive criticism of the Pony Preservation Project have been integral to the development of 15.ai. In addition, the developer has stated that the logo of 15.ai, which features a robotic Twilight Sparkle, is an homage to the fact that her voice (as originally portrayed by Tara Strong) was indispensable to the implementation of emotional contextualizers.


Reception

15.ai has been met with largely positive reviews. Liana Ruppert of ''Game Informer'' described 15.ai as "simplistically brilliant." Lauren Morton of ''Rock, Paper, Shotgun'' and Natalia Clayton of ''PC Gamer'' called it "fascinating," and José Villalobos of ''LaPS4'' wrote that it "works as easy as it looks." Users praised the ability to easily create audio of popular characters that sounds believable to those unaware that the voices had been synthesized by artificial intelligence: Zack Zwiezen of ''Kotaku'' reported that "[his] girlfriend was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain," while Rionaldi Chandraseta of ''Towards Data Science'' wrote that, upon watching a YouTube video featuring popular character voices generated by 15.ai, "[his] first thought was the video creator used cameo.com to pay for new dialogues from the original voice actors" and stated that "the quality of voices done by 15.ai is miles ahead of [its] competitors." Computer scientist and technology entrepreneur Andrew Ng commented in his newsletter ''The Batch'' that the technology behind 15.ai could be "enormously productive" and could "revolutionize the use of virtual actors"; however, he also noted that "synthesizing a human actor's voice without consent is arguably unethical and possibly illegal" and that it could open the door to impersonation and fraud. In his blog ''Marginal Revolution'', economist Tyler Cowen deemed the developer, 15, one of the "most underrated talents in AI and machine learning."


Impact


Fandom content creation

15.ai has been frequently used for content creation in various fandoms, including those of ''My Little Pony: Friendship Is Magic'', ''Team Fortress 2'', ''Portal'', and ''SpongeBob SquarePants''. Numerous videos and projects containing speech from 15.ai have gone viral. However, some viral videos and projects contain speech generated by other software, and many of them do not properly credit the source(s) of their synthetic speech. As a consequence, many videos and projects made with other speech synthesis software have been mistaken as being made with 15.ai, and vice versa. Because of this misattribution and lack of proper credit, 15.ai's terms of service include a rule forbidding the use of 15.ai-generated and non-15.ai-generated speech in the same video or project.

The ''My Little Pony: Friendship Is Magic'' fandom has seen a resurgence in video and musical content creation as a direct result of 15.ai, inspiring a new genre of fan-created content assisted by artificial intelligence. Some works of fanfiction have been adapted into fully voiced "episodes": ''The Tax Breaks'' is a 17-minute animated video rendition of a fan-written story published in 2014 that uses voices generated by 15.ai together with sound effects and audio editing, emulating the episodic style of the early seasons of ''Friendship Is Magic''.

Viral videos from the ''Team Fortress 2'' fandom that feature voices from 15.ai include ''Spy is a Furry'' (which has gained over 3 million total views on YouTube across multiple videos) and ''The RED Bread Bank'', both of which have inspired Source Filmmaker animated video renditions.

Other fandoms have also used voices from 15.ai to produce viral videos. The viral video ''Among Us Struggles'' (which uses voices from ''Friendship Is Magic'') has over 5.5 million views on YouTube. YouTubers, TikTokers, and Twitch streamers have also used 15.ai in their videos, such as FitMC's video on the history of 2b2t (one of the oldest running ''Minecraft'' servers) and datpon3's TikTok video featuring the main characters of ''Friendship Is Magic'', which have 1.4 million and 510,000 views, respectively.

Some users have created AI virtual assistants using 15.ai together with external voice control software. One Twitter user created a personal GLaDOS desktop assistant with the voice control software VoiceAttack; the assistant can launch applications, speak corresponding lines of dialogue chosen at random, and thank the user in response to actions.


Troy Baker / Voiceverse NFT plagiarism scandal

In December 2021, the developer of 15.ai posted on Twitter that they had no interest in incorporating non-fungible tokens (NFTs) into their work. On January 14, 2022, it was discovered that Voiceverse NFT, a company with which video game and anime dub voice actor Troy Baker had announced a partnership, had plagiarized voice lines generated with 15.ai as part of its marketing campaign. Log files showed that Voiceverse had generated audio of Twilight Sparkle and Rainbow Dash from the show ''My Little Pony: Friendship Is Magic'' using 15.ai, pitched the audio up so that it was unrecognizable as the original voices, and used it without proper credit to falsely market its own platform, in violation of 15.ai's terms of service.

A week before the partnership with Baker was announced, Voiceverse had made a (now-deleted) Twitter post replying directly to a (now-deleted) video posted by Chubbiverse, an NFT platform with which Voiceverse had partnered. The video showcased an AI-generated voice, and Voiceverse's post claimed that the voice had been generated using Voiceverse's platform, remarking ''"I wonder who created the voice for this? ;)"'' A few hours after news of the partnership broke, another Twitter user asked the developer of 15.ai for their opinion on it. The developer speculated that it "sounds like a scam" and then posted screenshots of log files proving that a user of the website (with their IP address redacted) had submitted inputs of the exact words spoken by the AI voice in the Chubbiverse video, before responding to Voiceverse's claim directly by tweeting "Certainly not you :)". Following that tweet, Voiceverse admitted to passing off voices from 15.ai as the output of its own platform, claiming that its marketing team had used the project without giving proper credit and that the "Chubbiverse team [had] no knowledge of this." In response to the admission, 15 tweeted "Go fuck yourself." This tweet went viral, accruing over 75,000 total likes and 13,000 total retweets across multiple reposts.

The initial partnership between Baker and Voiceverse was met with severe backlash and a universally negative reception. Critics highlighted the environmental impact of NFT sales and their potential for exit scams. Commentators also pointed out the irony of Baker's initial tweet announcing the partnership, which ended with "You can hate. Or you can create. What'll it be?", given that hours later it was publicly revealed that the company had resorted to theft instead of creating its own product. Baker responded that he appreciated people sharing their thoughts and that their responses were "giving [him] a lot to think about." He also acknowledged that the "hate/create" part of his initial tweet might have been "a bit antagonistic," and asked fans on social media to forgive him. Two weeks later, on January 31, Baker announced that he would discontinue his partnership with Voiceverse.


Resistance from voice actors

Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about impersonation and fraud, the unauthorized use of an actor's voice in pornography, and the potential for AI to make voice actors obsolete.


List of voices

All characters that are or have been available on 15.ai are listed in the table below.


See also

* Audio deepfake
* Character.ai
* ChatGPT
* DALL-E
* Deepfakes
* Midjourney
* Stable Diffusion
* Synthetic media
* WaveNet




External links

* ''The Tax Breaks (Twilight) (15.ai)''