Blog

Insights on AI research, reinforcement learning, evaluations, and enterprise AI systems

Benchmarking agentic search

Enterprises need search-augmented LLMs that deliver fast, trustworthy, and up-to-date answers—not just polished language. Since public benchmarks rarely test for this, the Labelbox research team conducted its own study across three frontier models: Gemini 2.5 Pro, GPT-4.1, and Claude 4.0 Opus.

Labelbox•June 13, 2025
