Text-to-Video is a state of the art
artificial intelligence
Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech re ...
technology which needs only text as input for the output as video. The inspiration came from
text-to-image model
A text-to-image model is a machine learning model which takes as input a natural language description and produces an image matching that description. Such models began to be developed in the mid-2010s, as a result of advances in deep neural netwo ...
s which deliver images as output from text as input.
Video prediction on making objects realistic in a stable background is performed by using
recurrent neural network
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes. This allows it to exhibit temporal dynamic ...
for a sequence to sequence model with a connector
convolutional neural network
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network (ANN), most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Netwo ...
encoding and decoding each frame pixel by pixel, creating video using
deep learning
Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
De ...
.
Methodology
* Data collection and data set preparation using clear video from kinetic human action video.
* Training the
convolutional neural network
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network (ANN), most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Netwo ...
for making video.
* Keywords extraction from text using
natural-language programming
Natural-language programming (NLP) is an ontology-assisted way of programming in terms of natural-language sentences, e.g. English. A structured document with Content, sections and subsections for explanations of sentences forms a NLP documen ...
.
* Testing of Data set in conditional generative model for existing static and dynamic information from text by
variational autoencoder
In machine learning, a variational autoencoder (VAE), is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling, belonging to the families of probabilistic graphical models and variational Bayesian methods.
...
and
generative adversarial network
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is a ...
.
Models
There are different models including
open source
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
models. CogVideo presented their code in
GitHub
GitHub, Inc. () is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous ...
.
Meta Platforms
Meta Platforms, Inc., (file no. 3835815) trade name, doing business as Meta and formerly named Facebook, Inc., and TheFacebook, Inc., is an American multinational technology conglomerate based in Menlo Park, California. The company owns Facebo ...
uses text-to-video wit
makeavideo.studioGoogle
Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
used Imagen Video for converting text-to-video.
Antonia Antonova presented another model.
References
{{Reflist
Artificial intelligence
Algorithms
Language
Computers