COYO-700M: Image-Text Pair Dataset

Published on: 2022-11-14

Contributors: Minwoo Byeon, Beomhee Park, Haecheon Kim, Sungjun Lee, Woonhyuk Baek, Saehoon Kim, Kakao Brain Large-Scale AI Studio

Datarows: ~700M image-text pairs

Tags: images, image-text-pairs

COYO-700M is a large-scale dataset of 747M image-text pairs, accompanied by additional meta-attributes that make it easier to use for training a range of models. Like earlier vision-and-language datasets, it was built by collecting informative pairs of alt-text and the associated image from HTML documents.
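To make the collection strategy concrete, here is a minimal sketch of pulling alt-text and image pairs out of an HTML document using only the Python standard library. It is an illustration of the general idea, not the pipeline used to build COYO-700M: the class name `AltTextPairCollector`, the helper `extract_pairs`, the sample HTML, and the rule of keeping only images with non-empty alt-text are all assumptions made for this example.

```python
from html.parser import HTMLParser


class AltTextPairCollector(HTMLParser):
    """Hypothetical helper: collects (image URL, alt-text) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        # A missing or valueless alt attribute is treated as empty text.
        alt = (attr_map.get("alt") or "").strip()
        # Assumed filter: keep only images with both a URL and non-empty alt-text.
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return a list of (src, alt) tuples found in the given HTML string."""
    collector = AltTextPairCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


# Illustrative input only; these URLs are placeholders.
sample_html = """
<html><body>
  <img src="https://example.com/cat.jpg" alt="A cat sleeping on a sofa">
  <img src="https://example.com/spacer.gif" alt="">
  <img src="https://example.com/logo.png">
</body></html>
"""

pairs = extract_pairs(sample_html)
for src, alt in pairs:
    print(src, "|", alt)
```

A real large-scale pipeline would typically add further filtering on top of this, such as deduplication and checks on image size or text quality; those steps are outside the scope of this sketch.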

License
CC BY 4.0
