Aya Dataset

Contributors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran and more
Datarows: 202,363

The Aya Dataset is a multilingual instruction fine-tuning dataset developed by the Aya Open Science Initiative and collected through the Aya Annotation Platform from Cohere For AI. It contains over 204,000 human-annotated prompt-completion pairs across 101 languages and is intended for training, fine-tuning, and evaluating multilingual Large Language Models (LLMs).
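
As a rough illustration of what working with prompt-completion pairs might look like, the sketch below joins each pair into a single training string and counts rows per language. The sample rows and the field names `prompt`, `completion`, and `language` are hypothetical, chosen for this example only; the actual column names in the released dataset may differ.

```python
from collections import Counter

# Hypothetical sample rows. The field names below are assumptions made for
# this sketch and are not taken from the dataset's published schema.
SAMPLE_ROWS = [
    {"prompt": "What is the capital of France?", "completion": "Paris.", "language": "English"},
    {"prompt": "¿Cuál es la capital de Perú?", "completion": "Lima.", "language": "Spanish"},
    {"prompt": "Name a primary color.", "completion": "Red.", "language": "English"},
]


def to_training_text(row, separator="\n\n"):
    """Join one prompt-completion pair into a single fine-tuning string."""
    return f"{row['prompt']}{separator}{row['completion']}"


def count_by_language(rows):
    """Count how many rows exist for each language label."""
    return Counter(row["language"] for row in rows)


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(repr(to_training_text(row)))
    print(count_by_language(SAMPLE_ROWS))
```

A per-language count like this is one simple way to check how evenly a multilingual sample is distributed before fine-tuning on it.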

License: Apache 2.0