Google Imagen

Text generation

Google's Imagen generates a relevant description or caption for a given image.

Intended Use

The model performs image captioning: given an image, it generates a relevant description. You can use this capability for a variety of use cases:

  • Generate captions for images that creators upload

  • Generate captions to describe products

  • Integrate Imagen captioning with an app using the API to create new experiences

Imagen currently supports five languages: English, German, French, Spanish and Italian.
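
As a rough illustration of the API integration and language support described above, the following minimal Python sketch builds a captioning request body and rejects unsupported languages. The field names (instances, bytesBase64Encoded, sampleCount, language), the two-letter language codes, and the helper name build_caption_request are assumptions made for this example and are not taken from this page; check the Google image captioning documentation for the actual request schema before relying on them.

```python
import base64
import json

# Two-letter codes assumed for the five languages listed above:
# English, German, French, Spanish and Italian.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "es", "it"}


def build_caption_request(image_bytes: bytes, language: str = "en",
                          sample_count: int = 1) -> dict:
    """Build a JSON-serializable captioning request body.

    Hypothetical helper: the field names below are assumptions and
    should be verified against the official API reference.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")

    # Binary image data is base64-encoded so it can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [{"image": {"bytesBase64Encoded": encoded}}],
        "parameters": {"sampleCount": sample_count, "language": language},
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    body = build_caption_request(b"placeholder image bytes", language="fr")
    print(json.dumps(body, indent=2))
```

Validating the language on the client side, before any request is sent, keeps an unsupported choice from turning into a failed API call. Sending the body to the service additionally requires an endpoint URL and credentials, which this sketch deliberately leaves out.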


The Imagen model is reported to achieve high accuracy; however, it may have limitations in generating captions for complex or abstract images. The model may also generate captions that reflect biases present in the training data.


Google Image captioning documentation

Google visual question answering documentation

