The RL data engine for AI teams
From environments to custom evaluations, we partner with over 90% of leading AI labs in the US and the innovators defining the next frontier of AI.
For frontier AI
The data, environments, and evaluation infrastructure the world's frontier AI labs build on.
For enterprises
Build it. Evaluate it. Deploy it.
Agent Studio is the platform for building, evaluating, and continuously improving specialist AI agents on real enterprise workflows at a fraction of frontier AI token costs.
Hundreds of AI teams build with Labelbox

How Meta built GIM with Labelbox data to evaluate frontier AI reasoning
Problem
Meta needed a benchmark that remained discriminative as existing LLM evaluations saturated. The team wanted tasks grounded in practical reasoning rather than obscure knowledge or synthetic puzzles, with enough rubric detail to capture partial credit and enough quality control to support a public-private contamination diagnostic.
Solution
Labelbox produced the data foundation for GIM: 820 expert-authored problems across seven cognitive categories, including 229 multimodal items and 528 rubric-graded prompts. The work covered original prompt creation, structured scoring criteria, review, annotation, and quality assurance, enabling Meta to calibrate a two-parameter logistic (2PL) item response theory (IRT) model over more than 200,000 prompt-response pairs.
Result
Meta released GIM-615, calibrated item parameters, and an evaluation framework that benchmarked 22 models across 47 reporting configurations. The paper found GIM remains far from saturated, with roughly 20% of items above frontier ability, giving researchers a durable way to compare model capability, thinking budgets, and future systems.
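For readers unfamiliar with the 2PL IRT model named in this case study, the following is a minimal sketch of the standard two-parameter logistic item response function, not the GIM implementation. In the textbook form, each item has a discrimination `a` and a difficulty `b`, and each model has an ability `theta`. The item values, the `frontier_theta` value, and the helper `share_above_ability` are hypothetical, and reading "items above frontier ability" as "difficulty greater than the strongest model's ability" is an assumption made here for illustration.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """Standard 2PL item response function.

    Returns the probability that a model with ability `theta` answers
    an item with discrimination `a` and difficulty `b` correctly.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def share_above_ability(difficulties, theta_max):
    """Fraction of items whose difficulty exceeds `theta_max`.

    Assumption: this is one plausible reading of "items above
    frontier ability"; the source does not define the term.
    """
    hard = sum(1 for b in difficulties if b > theta_max)
    return hard / len(difficulties)


# Hypothetical (a, b) item parameters, not taken from GIM.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 2.5), (0.9, 1.0)]
frontier_theta = 2.0  # hypothetical ability of the strongest model

share = share_above_ability([b for _, b in items], frontier_theta)
```

Under this formulation, a model whose ability equals an item's difficulty has a 50% chance of answering it correctly, and an item with difficulty above the strongest model's ability is one that even that model is more likely to miss than to solve.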

Human preference signal for evaluating LLMs inside Vertex AI


Tracking surgical instruments in video to advance robotic surgery

Higher-quality training signal for personalized shopping AI
Latest work from Labelbox Research
Labelbox's world-class applied research team pioneers frontier AI data generation and evaluation methods. Through scientific precision and co-innovation, we help customers achieve real-time AGI breakthroughs.