BLIP2 (Flan-T5 XL COCO)

Text generation

BLIP-2 is a vision-language model (VLM) that can perform multimodal tasks such as image captioning and visual question answering.


Intended Use

This model is intended for multimodal tasks that take an image as input and produce text, such as image captioning and visual question answering.
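
The two tasks named here can be sketched with the Hugging Face transformers library. This is a minimal sketch, not the hosted model's actual serving code. It assumes the checkpoint name `Salesforce/blip2-flan-t5-xl-coco` corresponds to this card's title, and that `torch`, `transformers`, and `Pillow` are installed. The helper names `build_prompt` and `run_blip2` are hypothetical, as is the "Question: ... Answer:" prompt template, which is one common convention for visual question answering rather than a requirement of the model.

```python
from typing import Optional

# Assumption: the Hugging Face checkpoint that matches this card's title.
MODEL_ID = "Salesforce/blip2-flan-t5-xl-coco"


def build_prompt(question: Optional[str] = None) -> Optional[str]:
    """Return a VQA-style prompt, or None when only a caption is wanted."""
    if question is None:
        return None
    return f"Question: {question.strip()} Answer:"


def run_blip2(image_path: str, question: Optional[str] = None) -> str:
    """Caption an image, or answer a question about it when one is given."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision only on GPU; CPU inference stays in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = Blip2Processor.from_pretrained(MODEL_ID)
    model = Blip2ForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=dtype
    ).to(device)

    image = Image.open(image_path).convert("RGB")
    prompt = build_prompt(question)
    if prompt is None:
        # No text prompt: the model generates a caption from the image alone.
        inputs = processor(images=image, return_tensors="pt")
    else:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    inputs = inputs.to(device, dtype)

    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Calling `run_blip2("example.jpg")` would request a caption, and `run_blip2("example.jpg", "How many cats are there?")` would request an answer to the question; `example.jpg` is a placeholder path. The first call downloads several gigabytes of model weights.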


Performance

BLIP-2 (ViT-g, OPT 2.7B) scores 52.3 on the VQAv2 dataset in the zero-shot setting (Li et al., 2023).


Citation

Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.
