Google Imagen
Google's Imagen generates a relevant description or caption for a given image.
Intended Use
The model performs image captioning: given an image, it generates a relevant description. You can use these captions in a variety of ways:
Generate captions for images that creators upload
Generate captions that describe products
Integrate Imagen captioning into an app through the API to create new experiences
Imagen currently supports five languages: English, German, French, Spanish and Italian.
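As a rough illustration of the API integration use case, the sketch below builds a request body for a captioning call and rejects languages outside the five listed above. It is a minimal sketch under stated assumptions, not the official client: the helper name build_caption_request is hypothetical, and the field names ("instances", "bytesBase64Encoded", "sampleCount", "language") and two-letter language codes are assumptions that should be checked against the current Google image captioning documentation. It makes no network call and does not handle authentication.

```python
import base64
import json

# Two-letter codes for the five supported languages (the code format is an assumption).
SUPPORTED_LANGUAGES = {"en", "de", "fr", "es", "it"}


def build_caption_request(image_bytes: bytes, language: str = "en", sample_count: int = 1) -> dict:
    """Build a JSON-serializable request body for a captioning call.

    Hypothetical helper: the field names used here are assumptions and
    should be confirmed against the current API documentation.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    # The image is sent inline as base64 text rather than raw bytes.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [{"image": {"bytesBase64Encoded": encoded}}],
        "parameters": {"sampleCount": sample_count, "language": language},
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    body = build_caption_request(b"placeholder image bytes", language="fr")
    print(json.dumps(body, indent=2))
```

Validating the language on the client side before sending the request lets an app surface an unsupported-language error to the user without spending an API call.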
Performance
The Imagen model is reported to achieve high accuracy; however, it may struggle to generate captions for complex or abstract images. The model may also generate captions that reflect biases present in its training data.
Citations
Google image captioning documentation
Google visual question answering documentation