The LLaVA Visual Instruct 150K dataset is a collection of multimodal instruction-following samples. It was generated by prompting GPT-4 to produce responses grounded in images from the COCO dataset, and it contains three types of instruction data: conversations, detailed descriptions, and complex reasoning. Its primary use is training and evaluating multimodal language models, such as LLaVA, that can respond to both text and images.
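
The samples are distributed as JSON. Below is a minimal sketch of loading and inspecting one record; it assumes a combined file named `llava_instruct_150k.json` and records carrying `id`, `image`, and `conversations` fields with `from`/`value` turns, which reflect a common release layout rather than anything guaranteed by the description above.

```python
import json

# Minimal inspection sketch. The file name and field names (`image`,
# `conversations`, `from`, `value`) are assumptions about the release layout.
with open("llava_instruct_150k.json", "r", encoding="utf-8") as f:
    samples = json.load(f)  # a list of dicts, one per instruction sample

print(f"total samples: {len(samples)}")

example = samples[0]
print("COCO image file:", example["image"])         # image the prompt refers to
for turn in example["conversations"]:               # alternating human / gpt turns
    print(f'{turn["from"]}: {turn["value"][:80]}')  # truncate long responses
```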