BLIP2 (OPT 6.7B)
Text generation
BLIP2 is a visual language model (VLM) that can perform multi-modal tasks such as image captioning and visual question answering. This model is the BLIP-2, OPT 6.7B variant.
Intended Use
The model is intended for multi-modal tasks such as image captioning and visual question answering: it generates text conditioned on an image and an optional text prompt.
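A minimal usage sketch, assuming the checkpoint is loaded through the Hugging Face transformers library under the Salesforce/blip2-opt-6.7b model ID and a CUDA GPU with enough memory for the OPT 6.7B decoder; omitting the text prompt produces a caption, while a "Question: ... Answer:" prompt produces a VQA-style answer.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Assumed model ID for the BLIP-2, OPT 6.7B variant; half precision to reduce GPU memory use.
model_id = "Salesforce/blip2-opt-6.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Example image (hypothetical URL used only for illustration).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Image captioning: no text prompt, the model generates a caption for the image.
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True).strip())

# Visual question answering: condition generation on an image plus a question prompt.
prompt = "Question: how many animals are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```

Loading in float16 (or 8-bit via bitsandbytes) is a practical choice here, since the OPT 6.7B decoder is too large for many GPUs at full precision.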
Performance
BLIP2 is fine-tuned on image-text datasets (e.g., LAION) collected from the internet. As a result, the model may generate inappropriate content or replicate biases inherent in the underlying data.
Citation
Li, J., Li, D., Savarese, S., & Hoi, S. (2023). Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.