
BLIP2 (OPT 6.7B)

Text generation

BLIP-2 is a vision-language model (VLM) that can perform multi-modal tasks such as image captioning and visual question answering. This card describes the BLIP-2 variant that uses OPT-6.7B as its frozen language model.


Intended Use

The model is intended for multi-modal tasks that combine an image with text, such as image captioning and visual question answering.
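The sketch below shows how the two intended tasks map onto a prompt: no text prompt for captioning, and a "Question: ... Answer:" prompt for visual question answering. It assumes the public Hugging Face checkpoint "Salesforce/blip2-opt-6.7b" and the transformers classes Blip2Processor and Blip2ForConditionalGeneration; the Labelbox-hosted model may be served differently. The helper names (build_vqa_prompt, strip_prompt_echo, run_blip2) are hypothetical, and this is an illustration rather than the deployment's actual inference code.

```python
from typing import Optional


def build_vqa_prompt(question: str) -> str:
    # Hypothetical helper: wraps a question in the "Question: ... Answer:"
    # template used in the Hugging Face BLIP-2 (OPT) examples.
    return f"Question: {question.strip()} Answer:"


def strip_prompt_echo(text: str, prompt: Optional[str]) -> str:
    # Hypothetical helper: some transformers versions return the prompt tokens
    # together with the generated ones, so drop the prompt if it is echoed.
    text = text.strip()
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


def run_blip2(image_path: str, question: Optional[str] = None,
              model_id: str = "Salesforce/blip2-opt-6.7b",
              max_new_tokens: int = 30) -> str:
    # Captioning when question is None, visual question answering otherwise.
    # Heavy imports are deferred so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=dtype
    ).to(device)

    image = Image.open(image_path).convert("RGB")
    prompt = build_vqa_prompt(question) if question else None
    if prompt is None:
        inputs = processor(images=image, return_tensors="pt")
    else:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    # BatchFeature.to moves everything to the device but casts only
    # floating-point tensors (the pixel values) to dtype.
    inputs = inputs.to(device, dtype)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    return strip_prompt_echo(text, prompt)
```

Usage, with a hypothetical local file: run_blip2("photo.jpg") returns a caption, and run_blip2("photo.jpg", "how many dogs are there?") returns a short answer. Loading a 6.7B-parameter model needs a large GPU (or a lot of RAM on CPU in float32), so the model is loaded inside the function here only to keep the sketch self-contained; in practice you would load it once and reuse it.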


Limitations

BLIP-2 is fine-tuned on image-text datasets (e.g. LAION) collected from the internet. As a result, the model may generate inappropriate content similar to that found in its training data, or replicate biases inherent in that data.


Citation

Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.

Labelbox