Synthetic media (also known as AI-generated media, generative AI, personalized media, and colloquially as deepfakes) is a catch-all term for the artificial production, manipulation, and modification of data and media by automated means, especially through the use of artificial intelligence algorithms, such as for the purpose of misleading people or changing an original meaning. Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, speech synthesis, and more. Though experts use the term "synthetic media," the press often refers to individual methods such as deepfakes and text synthesis by their own names, and frequently uses "deepfakes" as a euphemism (e.g. "deepfakes for text" for natural-language generation, "deepfakes for voices" for neural voice cloning). The field drew significant attention starting in 2017, when ''Motherboard'' reported on the emergence of AI-altered pornographic videos that inserted the faces of famous actresses. Potential hazards of synthetic media include the spread of misinformation and a resulting distrust of reality among viewers, the mass automation of creative and journalistic jobs, and a complete retreat into AI-generated fantasy worlds. Synthetic media is an applied form of artificial imagination.


History


Pre-1950s

Synthetic media as a process of automated art dates back to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria designed machines capable of writing text, generating sounds, and playing music. The tradition of automaton-based entertainment flourished throughout history, with mechanical beings' seemingly magical ability to mimic human creativity often drawing crowds throughout Europe, China, India, and elsewhere. Other automated novelties, such as Johann Philipp Kirnberger's "Musikalisches Würfelspiel" (Musical Dice Game) of 1757, also amused audiences (Nierhaus, Gerhard (2009). ''Algorithmic Composition: Paradigms of Automated Music Generation'', pp. 36 & 38n7). Despite the technical capabilities of these machines, however, none was capable of generating original content; all were entirely dependent upon their mechanical designs.
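The idea behind a musical dice game can be illustrated with a minimal Python sketch. It assumes two dice are rolled for each bar and that every bar has eleven pre-composed alternatives, one per possible total; the table contents, the bar count, and all names here are hypothetical placeholders, not Kirnberger's actual tables.

```python
import random

# Hypothetical table: for each bar position, eleven pre-composed measures,
# one per possible two-dice total (2..12). Labels stand in for real notation.
N_BARS = 8
TABLE = [[f"bar{bar + 1}-option{total}" for total in range(2, 13)]
         for bar in range(N_BARS)]


def roll_two_dice(rng):
    """Return the sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def compose(rng):
    """Pick one pre-composed measure per bar according to the dice."""
    piece = []
    for bar in range(N_BARS):
        total = roll_two_dice(rng)
        piece.append(TABLE[bar][total - 2])  # totals 2..12 map to indices 0..10
    return piece


# A seeded generator makes the "performance" repeatable.
piece = compose(random.Random(1757))
```

Every possible output is a recombination of measures written in advance, which matches the point made above: the procedure selects but does not invent material.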


Rise of artificial intelligence

The field of AI research was born at a workshop at Dartmouth College in 1956, which gave rise to digital computing as a medium of art as well as to generative art. Initial experiments in AI-generated art included the ''Illiac Suite'', a 1957 composition for string quartet which is generally agreed to be the first score composed by an electronic computer (Denis L. Baggi, "The Role of Computer Technology in Music and Musicology", ''lim.dico.unimi.it'', December 9, 1998). Lejaren Hiller, in collaboration with Leonard Isaacson, programmed the ILLIAC I computer at the University of Illinois at Urbana–Champaign (where both composers were professors) to generate compositional material for his String Quartet No. 4. In 1960, Russian researcher R. Kh. Zaripov published the world's first paper on algorithmic music composition, using the "Ural-1" computer.

In 1965, inventor Ray Kurzweil premiered a piano piece created by a computer that was capable of pattern recognition in various compositions. The computer was then able to analyze and use these patterns to create novel melodies. The computer debuted on Steve Allen's ''I've Got a Secret'' program, and stumped the hosts until film star Harry Morgan guessed Kurzweil's secret.

Before 1989, artificial neural networks had been used to model certain aspects of creativity. Peter Todd (1989) first trained a neural network to reproduce musical melodies from a training set of musical pieces. He then used a change algorithm to modify the network's input parameters, which allowed the network to randomly generate new music in a highly uncontrolled manner.

In 2014,
Ian Goodfellow and his colleagues developed a new class of machine learning systems: generative adversarial networks (GANs). Two neural networks contest with each other in a game (in the sense of game theory, often but not always in the form of a zero-sum game). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. In a 2016 seminar, Yann LeCun described GANs as "the coolest idea in machine learning in the last twenty years".

In 2017,
Google unveiled transformers, a new type of neural network architecture specialized for language modeling that enabled rapid advancements in natural language processing. Transformers proved capable of high levels of generalization, allowing networks such as GPT-3 and Jukebox from OpenAI to synthesize text and music, respectively, at a level approaching humanlike ability. There have been some attempts to use GPT-3 and GPT-2 for screenplay writing, resulting in both dramatic narratives (the Italian short film ''Frammenti di Anime Meccaniche'', written by GPT-2) and comedic narratives (the short film ''Solicitors'' by YouTube creator ''Calamity AI'', written by GPT-3).
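The two-network contest described above for GANs can be made concrete with a small Python sketch. It estimates the value function of the zero-sum formulation, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to raise and the generator tries to lower. The function name and the probability values are hypothetical; this only evaluates the objective on made-up discriminator outputs and is not a training loop.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator probabilities of "real" on genuine samples.
    d_fake: discriminator probabilities of "real" on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical outputs: a discriminator that separates real from fake well...
confident = gan_value(d_real=[0.9, 0.8, 0.95], d_fake=[0.1, 0.2, 0.05])
# ...and one that the generator has fooled completely (always answers 0.5).
fooled = gan_value(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

# In the zero-sum form the generator's payoff is simply the negative.
generator_payoff = -fooled
```

The value is higher when the discriminator tells real from generated data apart, and it falls to -2 log 2 when the discriminator can do no better than guessing, which is the outcome the generator is pushing towards.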


Branches of synthetic media


Deepfakes

Deepfakes (a portmanteau of "deep learning" and "fake") are the most prominent form of synthetic media. They are media that take a person in an existing image or video and replace them with someone else's likeness using artificial neural networks. They often combine and superimpose existing media onto source media using machine learning techniques known as autoencoders and generative adversarial networks (GANs). Deepfakes have garnered widespread attention for their uses in celebrity pornographic videos, revenge porn, fake news, hoaxes, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.

The term deepfakes originated around the end of 2017 from a
Reddit user named "deepfakes". He, as well as others in the Reddit community r/deepfakes, shared deepfakes they created; many videos involved celebrities' faces swapped onto the bodies of actresses in pornographic videos, while non-pornographic content included many videos with actor Nicolas Cage's face swapped into various movies. In December 2017, Samantha Cole published an article about r/deepfakes in ''Vice'' that drew the first mainstream attention to deepfakes being shared in online communities. Six weeks later, Cole wrote in a follow-up article about the large increase in AI-assisted fake pornography. In February 2018, r/deepfakes was banned by Reddit for sharing involuntary pornography. Other websites have also banned the use of deepfakes for involuntary pornography, including the social media platform Twitter and the pornography site Pornhub. However, some websites have not yet banned deepfake content, including 4chan and 8chan.

Non-pornographic deepfake content continues to grow in popularity with videos from YouTube creators such as Ctrl Shift Face and Shamook. A mobile application, Impressions, was launched for iOS in March 2020. The app provides a platform for users to deepfake celebrity faces into videos in a matter of minutes.


Image synthesis

Image synthesis is the artificial production of visual media, especially through algorithmic means. In the emerging world of synthetic media, the work of digital-image creation, once the domain of highly skilled programmers and Hollywood special-effects artists, could be automated by expert systems capable of producing realism on a vast scale. One subfield is human image synthesis, which is the use of neural networks to make believable and even photorealistic renditions of human likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer-generated imagery have featured synthetic images of human-like characters digitally composited onto real or other simulated film material. Towards the end of the 2010s, deep learning artificial intelligence was applied to synthesize images and video that look like humans, without need for human assistance once the training phase has been completed, whereas the old-school 7D route required massive amounts of human work. The website This Person Does Not Exist showcases fully automated human image synthesis by endlessly generating images that look like portraits of human faces.


Audio synthesis

Beyond deepfakes and image synthesis, audio is another area where AI is used to create synthetic media. In principle, synthesized audio can produce any sound that is achievable through audio waveform manipulation, which could be used to generate stock sound effects or to simulate the sound of things that are currently imaginary.


AI art


Music generation

The capacity to generate music through autonomous, non-programmable means has been sought since antiquity, and with developments in artificial intelligence, two particular domains have arisen:

1. The robotic creation of music, whether through machines playing instruments or the sorting of virtual instrument notes (such as through MIDI files).
2. The direct generation of waveforms that perfectly recreate instrumentation and the human voice without the need for instruments, MIDI, or organizing premade notes.


Speech synthesis

Speech synthesis has been identified as a popular branch of synthetic media and is defined as the artificial production of human
speech Speech is a human vocal communication using language. Each language uses Phonetics, phonetic combinations of vowel and consonant sounds that form the sound of its words (that is, all English words sound different from all French words, even if ...
. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in
software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists ...
or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render
symbolic linguistic representation A symbolic linguistic representation is a representation of an utterance that uses symbols to represent linguistic information about the utterance, such as information about phonetics, phonology, morphology, syntax, or semantics. Symbolic linguist ...
s like
phonetic transcription Phonetic transcription (also known as phonetic script or phonetic notation) is the visual representation of speech sounds (or ''phones'') by means of symbols. The most common type of phonetic transcription uses a phonetic alphabet, such as the ...
s into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. Virtual assistants such as Siri and Alexa can turn text into audio and synthesize speech. In 2016, Google DeepMind unveiled WaveNet, a deep generative model of raw audio waveforms that could learn which waveforms best resembled human speech as well as musical instrumentation. Some projects offer real-time generation of synthetic speech using deep learning, such as 15.ai, a web application text-to-speech tool developed by an MIT research scientist.
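
The idea of storing speech in small units can be illustrated with a minimal Python sketch. It only shows how a phone sequence maps to the diphones (adjacent phone pairs) a concatenative system would need to look up; it does not produce audio. The names `SILENCE`, `to_diphones`, and `max_diphone_inventory`, as well as the phone symbols, are assumptions made for this illustration and do not come from any particular synthesizer.

```python
from typing import List, Tuple

# Hypothetical marker for the silence before and after an utterance.
SILENCE = "_"


def to_diphones(phones: List[str]) -> List[Tuple[str, str]]:
    """Return the adjacent phone pairs (diphones) that cover an utterance."""
    padded = [SILENCE, *phones, SILENCE]
    # Pair every phone with its successor: (p0, p1), (p1, p2), ...
    return list(zip(padded, padded[1:]))


def max_diphone_inventory(num_phones: int) -> int:
    """Upper bound on distinct diphones for a phone set, counting silence as a phone."""
    return (num_phones + 1) ** 2


if __name__ == "__main__":
    # Illustrative phone sequence for the word "cat".
    print(to_diphones(["k", "ae", "t"]))
    print(max_diphone_inventory(40))
```

The second function hints at the trade-off: the inventory grows roughly with the square of the number of phones when diphones are stored, and far faster still when whole words or sentences are stored.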


Natural-language generation

Natural-language generation (NLG, sometimes synonymous with text synthesis) is a software process that transforms structured data into natural language. It can be used to produce long-form content for organizations to automate custom reports, as well as custom content for a web or mobile application. It can also be used to generate short blurbs of text in interactive conversations (a chatbot), which might even be read out by a text-to-speech system. Interest in natural-language generation increased in 2019 after OpenAI unveiled GPT-2, an AI system that generates text matching its input in subject and tone. GPT-2 is a transformer, a deep machine learning model introduced in 2017 and used primarily in the field of natural language processing (NLP).
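
As a rough illustration of turning structured data into natural language, the following Python sketch renders a small weather record as a sentence. It is a hand-written, template-based approach, which is much simpler than (and not representative of) a neural model such as GPT-2. The function name `describe_weather` and the record fields are hypothetical.

```python
from typing import Mapping, Union


def describe_weather(record: Mapping[str, Union[str, int]]) -> str:
    """Render one structured weather record as an English sentence."""
    city = record["city"]
    high = int(record["high_c"])
    delta = high - int(record["yesterday_high_c"])

    # Choose the wording from the data rather than hard-coding it.
    unit = "degree" if abs(delta) == 1 else "degrees"
    if delta > 0:
        trend = f"{delta} {unit} warmer than yesterday"
    elif delta < 0:
        trend = f"{-delta} {unit} cooler than yesterday"
    else:
        trend = "the same as yesterday"

    return f"{city} will reach a high of {high} degrees Celsius, {trend}."


if __name__ == "__main__":
    print(describe_weather({"city": "Oslo", "high_c": 21, "yesterday_high_c": 18}))
```

Report generators of this kind scale by adding more templates and selection rules, whereas a learned model replaces the templates with text sampled from a trained network.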


Interactive media synthesis

AI-generated media can be used to develop a hybrid graphics system that could be used in video games, movies, and virtual reality, as well as text-based games such as AI Dungeon 2, which uses either GPT-2 or GPT-3 to allow for near-infinite possibilities that are otherwise impossible to create through traditional game development methods. Computer hardware company Nvidia has also developed AI-generated video game demos, such as a model that can generate an interactive game based on non-interactive videos. Through procedural generation, synthetic media techniques may eventually be used to "help designers and developers create art assets, design levels, and even build entire games from the ground up."
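
For readers unfamiliar with procedural generation, the sketch below shows the classic, non-learned form of the idea in Python: a level layout is produced by an algorithm driven by seeded pseudo-randomness instead of being drawn by hand. It is not how any of the systems named above work; `generate_level` and its parameters are hypothetical names chosen for this example.

```python
import random
from typing import List


def generate_level(width: int, height: int, wall_chance: float, seed: int) -> List[str]:
    """Build a bordered grid of walls ('#') and floor ('.') from a seed."""
    rng = random.Random(seed)  # the same seed always reproduces the same level
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            border = x in (0, width - 1) or y in (0, height - 1)
            # Border cells are always walls; interior cells are walls at random.
            row += "#" if border or rng.random() < wall_chance else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in generate_level(width=12, height=6, wall_chance=0.2, seed=42):
        print(line)
```

Because the output depends only on the parameters and the seed, a game can store a seed instead of a full level. Generative models extend the same principle by replacing the hand-written rule with a learned one.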


Concerns and controversies

Deepfakes have been used to misrepresent well-known politicians in videos. In separate videos, the face of the Argentine President Mauricio Macri has been replaced by the face of Adolf Hitler, and Angela Merkel's face has been replaced with Donald Trump's.

In June 2019, a downloadable Windows and Linux application called DeepNude was released which used neural networks, specifically generative adversarial networks, to remove clothing from images of women. The app had both a paid and an unpaid version, the paid version costing $50. On June 27, the creators removed the application and refunded consumers.

The US Congress held a hearing on the widespread impacts of synthetic media, including deepfakes, describing it as having the "potential to be used to undermine national security, erode public trust in our democracy and other nefarious reasons."

In 2019, voice cloning technology was used to successfully impersonate a chief executive's voice and demand a fraudulent transfer of €220,000. The case raised concerns about the lack of encryption methods over telephones as well as the unconditional trust often given to voice and to media in general.

Starting in November 2019, multiple social media networks began banning synthetic media used for purposes of manipulation in the lead-up to the 2020 United States presidential election.


Potential uses and impacts

Synthetic media techniques involve generating, manipulating, and altering data to emulate creative processes on a much faster and more accurate scale. As a result, the potential uses are as wide as human creativity itself, ranging from revolutionizing the entertainment industry to accelerating the research and production of academia. The initial application has been to synchronise lip movements to increase the engagement of conventional dubbing, which is growing fast with the rise of OTT (over-the-top) streaming services. News organizations have explored ways to use video synthesis and other synthetic media technologies to become more efficient and engaging. Potential future hazards include the use of a combination of different subfields to generate fake news, natural-language bot swarms generating trends and memes, false evidence being generated, and potentially addiction to personalized content and a retreat into AI-generated fantasy worlds within virtual reality.

Advanced text-generating bots could potentially be used to manipulate social media platforms through tactics such as astroturfing.
Deep reinforcement learning-based natural-language generators could potentially be used to create advanced chatbots that could imitate natural human speech. One use case for natural-language generation is to generate or assist with writing novels and short stories, while other potential developments include stylistic editors that emulate professional writers.

Image synthesis tools may be able to streamline or even completely automate the creation of certain aspects of visual illustrations, such as animated cartoons, comic books, and political cartoons. Because the automation process takes away the need for teams of designers, artists, and others involved in the making of entertainment, costs could plunge to virtually nothing and allow for the creation of "bedroom multimedia franchises" where single individuals can generate results indistinguishable from the highest-budget productions for little more than the cost of running their computer. Character and scene creation tools will no longer be based on premade assets, thematic limitations, or personal skill but instead on tweaking certain parameters and giving enough input.

A combination of speech synthesis and deepfakes has been used to automatically redub an actor's speech into multiple languages without the need for reshoots or language classes. It can also be used by companies for employee onboarding, eLearning, and explainer and how-to videos.

An increase in cyberattacks has also been feared due to methods of phishing, catfishing, and social hacking being more easily automated by new technological methods. Natural-language generation bots mixed with image synthesis networks may theoretically be used to clog search results, filling search engines with trillions of otherwise useless but legitimate-seeming blogs, websites, and marketing spam.

There has been speculation about deepfakes being used for creating digital actors for future films. Digitally constructed/altered humans have already been used in films before, and deepfakes could contribute new developments in the near future. Amateur deepfake technology has already been used to insert faces into existing films, such as the insertion of Harrison Ford's young face onto Han Solo's face in ''Solo: A Star Wars Story'', and techniques similar to those used by deepfakes were used for the acting of Princess Leia in ''Rogue One''.

GANs can be used to create photos of imaginary fashion models, with no need to hire a model, photographer, or makeup artist, or to pay for a studio and transportation. GANs can be used to create fashion advertising campaigns including more diverse groups of models, which may increase intent to buy among people resembling the models or family members. GANs can also be used to create portraits, landscapes, and album covers. The ability of GANs to generate photorealistic human bodies presents a challenge to industries such as fashion modeling, which may be at heightened risk of being automated.

In 2019, Dadabots unveiled an AI-generated stream of death metal which remains ongoing with no pauses. Musical artists and their respective brands may also conceivably be generated from scratch, including AI-generated music, videos, interviews, and promotional material. Conversely, existing music can be completely altered at will, such as changing lyrics, singers, instrumentation, and composition. In 2018, using a WaveNet-based process for musical timbre transfer, researchers were able to shift entire genres from one to another. Through the use of artificial intelligence, old bands and artists may be "revived" to release new material without pause, which may even include "live" concerts and promotional images.

Neural network-powered photo manipulation has the potential to abet the behaviors of totalitarian and absolutist regimes. A sufficiently paranoid totalitarian government or community may engage in a total wipe-out of history using all manner of synthetic technologies, fabricating history and personalities as well as any evidence of their existence at all times. Even in otherwise rational and democratic societies, certain social and political groups may utilize synthetic media to craft cultural, political, and scientific cocoons that greatly reduce or even altogether destroy the ability of the public to agree on basic objective facts. Conversely, the existence of synthetic media may be used to discredit factual news sources and scientific facts as "potentially fabricated."


See also

* 15.ai
* Algorithmic art
* Artificial imagination
* Automated journalism
* Computational creativity
* Computer music
* DALL-E
* Deepfakes
* Generative art
* Generative adversarial network
* GPT-3
* Human image synthesis
* Transformer (machine learning model)
* WaveNet
