Stanford Alpaca 7B Training Dataset

Published on: 2023-03-13
Contributors: Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
Data rows: 52,000
Large Language Models
Generative AI

Alpaca 7B is a model fine-tuned from Meta's LLaMA 7B model on 52K instruction-following demonstrations. In a preliminary evaluation of single-turn instruction following, Alpaca behaves qualitatively similarly to OpenAI's text-davinci-003, while being smaller and easier to reproduce. The model was created and published by a team of Stanford researchers.

This dataset contains the 52K instruction-following samples used to train the Alpaca 7B model, generated with text-davinci-003 in the style of self-instruct.
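Each record in the dataset is a JSON object with three fields: an `instruction`, an optional `input` providing context, and the target `output`. A minimal sketch below shows the record shape and how such a record can be rendered into a training prompt in the style used by the Alpaca project (the sample record and the `format_prompt` helper are illustrative, not part of the released code):

```python
import json

# Illustrative record in the alpaca_data.json format
# (fields: "instruction", "input", "output"); not an actual dataset entry.
sample = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well.",
}

def format_prompt(example):
    """Render one record into an instruction-following prompt.

    Records with a non-empty "input" get a prompt variant that includes
    an "### Input:" section; records without one omit it.
    """
    if example["input"]:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    )

print(format_prompt(sample))
```

During fine-tuning, the model's target is the record's `output` appended after the `### Response:` marker.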

Apache License 2.0