Aya Dataset

Contributors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran and more

Datarows: 202,363 Datarows

Text

The Aya Dataset is a comprehensive multilingual instruction fine-tuning dataset developed by the Aya Open Science Initiative, facilitated through the Aya Annotation Platform by Cohere For AI. The dataset contains over 204,000 human-annotated prompt-completion pairs across 101 languages. This dataset is designed for training, fine-tuning, and evaluating multilingual Large Language Models (LLMs).

License
Apache 2.0

Try Labelbox today

Get started for free or see how Labelbox can fit your specific needs by requesting a demo

Start for free