Adobe Voco is an unreleased software prototype by Adobe that enables novel editing and generation of audio. Dubbed "Photoshop-for-voice", it was first previewed at the Adobe MAX event in November 2016. The technology shown at Adobe MAX was a preview that could potentially be incorporated into Adobe Creative Cloud. It was later revealed that Voco was never meant to be released and was intended only as a research prototype.

Technical details

As the demo showed, the software takes approximately 20 minutes of the desired target's speech and generates a sound-alike voice, including phonemes that were not present in the target example material. Adobe stated Voco would lower the cost of audio production.

Concerns

Ethical and security concerns were raised over the ability to alter an audio recording to include words and phrases the original speaker never spoke, and over the potential risk to voiceprint biometrics. Concerns were also raised that it may be used in conjunction with:

* Human image synthesis, which has reached such levels of likeness since the early 2000s that distinguishing between a human recorded with a camera and a simulation of a human is very difficult.
* Video manipulation of a person's facial expressions in near real-time using an existing 2D RGB video of them.

Alternatives

Adobe's lack of publicized progress opened opportunities for other projects to build alternative products to Voco, such as Resemble AI and 15.ai, a real-time text-to-speech tool using artificial intelligence.

WaveNet is a similar but open-source research project at London-based artificial intelligence firm DeepMind, developed independently around the same time as Adobe Voco.

