The Aya Dataset is a comprehensive multilingual instruction fine-tuning dataset developed by the Aya Open Science Initiative, facilitated through the Aya Annotation Platform by Cohere For AI. The dataset contains over 204,000 human-annotated prompt-completion pairs across 101 languages. This dataset is designed for training, fine-tuning, and evaluating multilingual Large Language Models (LLMs).