COYO-700M: Image-Text Pair Dataset

Published on: 2022-11-14
Contributors: Minwoo Byeon, Beomhee Park, Haecheon Kim, Sungjun Lee, Woonhyuk Baek, Saehoon Kim, Kakao Brain Large-Scale AI Studio
Datarows: ~700M image-text pairs

COYO-700M is a large-scale dataset of 747M image-text pairs, together with many meta-attributes that make it more useful for training a variety of models. Like earlier vision-and-language datasets, it was built by collecting informative pairs of alt-text and the associated image from HTML documents.
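The core collection idea, pairing each `<img>` in a web page with its `alt` text, can be illustrated with Python's standard `html.parser`. This is a minimal sketch under stated assumptions, not COYO-700M's actual pipeline: the class and function names are hypothetical, and the only filtering rule shown (drop images with a missing `src` or empty `alt`) is an illustrative assumption. The real dataset applies its own filtering and attaches additional meta-attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AltTextPairParser(HTMLParser):
    """Hypothetical helper: collect (image URL, alt-text) pairs from <img> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pairs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via the default handle_startendtag.
        if tag != "img":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        src = attr_map.get("src")
        alt = (attr_map.get("alt") or "").strip()
        # Assumed filter: an image without a source or alt-text yields no usable pair.
        if src and alt:
            # Resolve relative image paths against the page URL and
            # collapse runs of whitespace in the alt-text.
            self.pairs.append((urljoin(self.base_url, src), " ".join(alt.split())))


def extract_alt_text_pairs(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return (absolute image URL, normalized alt-text) pairs found in one HTML document."""
    parser = AltTextPairParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pairs
```

For example, on a page at `https://example.com/posts/1` containing `<img src="/img/cat.jpg" alt="A  cat on a sofa">` and `<img src="logo.png" alt="">`, only the first image produces a pair: `("https://example.com/img/cat.jpg", "A cat on a sofa")`.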

License: CC BY 4.0