Imagen is a series of text-to-image models developed by

Google DeepMind DeepMind Technologies Limited, trading as Google DeepMind or simply DeepMind, is a British–American artificial intelligence research laboratory which serves as a subsidiary of Alphabet Inc. Founded in the UK in 2010, it was acquired by Goo ...

. They were developed by

Google Brain Google Brain was a deep learning artificial intelligence research team that served as the sole AI branch of Google before being incorporated under the newer umbrella of Google AI, a research division at Google dedicated to artificial intelligence ...

until the company's merger with DeepMind in April 2023. Imagen is primarily used to generate images from text prompts, similar to

Stability AI Stability AI Ltd is a UK-based artificial intelligence company, best known for its text-to-image model Stable Diffusion. History and founding Stability AI was founded in 2019 by Emad Mostaque and by Cyrus Hodes. In August 2022 Stability AI r ...

Stable Diffusion Stable Diffusion is a deep learning, text-to-image model released in 2022 based on Diffusion model, diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of ...

OpenAI OpenAI, Inc. is an American artificial intelligence (AI) organization founded in December 2015 and headquartered in San Francisco, California. It aims to develop "safe and beneficial" artificial general intelligence (AGI), which it defines ...

's DALL-E, or Midjourney. The original version of the model was first discussed in a paper from May 2022. The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI.

History

Imagen's original version was first presented in a paper published in May 2022. It featured the ability to generate high-fidelity images from natural language. The second version, Imagen 2 was released in December 2023. The standout feature was text and logo generation. Imagen 3 was released in August 2024. Google claims that the newest version provides better detail and lighting on generated images. On 20 May 2025 at

Google I/O Google I/O, or simply I/O, is an annual developer conference held by Google in Mountain View, California. The name "I/O" is taken from the number googol, with the "I" representing the first digit "1" in a googol and the "O" representing the s ...

2025 the company released an improved model, Imagen 4.

Technology

Imagen uses two key technologies. The first is the use of

transformer In electrical engineering, a transformer is a passive component that transfers electrical energy from one electrical circuit to another circuit, or multiple Electrical network, circuits. A varying current in any coil of the transformer produces ...

-based

large language model A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are g ...

s, notably T5, to understand text and subsequently encode text for image synthesis. The second is the use of cascaded diffusion models providing high-fidelity image generation. It generates image in three stages, starting from a base of 64x64, then upsampled to 256x256 and 1024x1024.

Capabilities

Imagen can generate photorealistic images from text prompts. It can also create various styles, such as cinematic, 35mm film, illustration, and surreal. The model can generate images in five aspect ratios, namely 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine already generated images by editing existing text prompts.

References

External links

Imagen website
{{Artificial intelligence navbox Text-to-image generation Artificial intelligence art Deep learning software applications 2022 software Google DeepMind

History

Technology

Capabilities

See also

References

External links