Introducing Recursion: the RL platform for enterprise specialist agents

Blog

Insights on AI research, reinforcement learning, evaluations, and enterprise AI systems


Introducing Recursion: The RL platform for enterprise specialist agents

Recursion is a unified reinforcement learning platform for developing, evaluating, and deploying specialist AI models that improve from real enterprise execution.

Labelbox•June 24, 2026

Do AI models want to be watched? Measuring monitorability disposition in large reasoning models

Models rarely flag their own misbehavior, and when they do, they pick the most lenient monitor available. We introduce monitorability disposition: a model's willingness to stay monitored, a property that is measurable, undertrained, and missing from alignment evaluation.

Shahriar Golchin•June 30, 2026

Where models change their minds: Identifying branchpoints for NLA training

We explore whether NLAs can surface internal patterns behind shortcut behavior in LLMs using branchpoint analysis. Our findings: signals are weak and distributed, with feedback and surface features strongly shaping behavior, suggesting useful directions for future interpretability work.

Almas Abdibayev•June 15, 2026

When benchmarks saturate, what comes next? Meta’s GIM pushes AI evaluation toward integrated reasoning

Meta Superintelligence Labs introduces GIM (Grounded Integration Measure), a benchmark shifting from isolated recall to integrated reasoning. It evaluates how models coordinate constraints, ambiguity, spatial logic, and epistemic judgment within a single problem.

Labelbox•May 20, 2026

Engineering trust in an autonomous world

Security is not a checklist; it is constantly evolving. As supply chain attacks grow, one compromised dependency can ripple across ecosystems. We are sharing best practices to contain risk, respond quickly, and design systems that limit blast radius by default.

Labelbox•April 10, 2026

Introducing EchoChain: An audio benchmark for reasoning under pressure in full-duplex dialogue

We introduce EchoChain to advance audio evaluation by testing Dual-Stream Reasoning in scenario-driven conversations with mid-speech interruptions, constraint updates, and shifting objectives. The benchmark measures whether models sustain coherent, adaptive intelligence in real time.

Smit Nautambhai Modi•March 4, 2026

The AI safety illusion: why current safety datasets fool us on model safety

AI safety is often judged by refusal rates, but our study of datasets like AdvBench and HarmBench shows these scores rely on obvious trigger words, not real adversarial intent. Remove the cues and the supposed safety collapses, revealing a stark gap between benchmarks and real-world risk.

Shahriar Golchin•February 20, 2026

Welcoming Upcraft to the Labelbox team

We've acquired Upcraft to bring AI agent technology into Alignerr, scaling how elite domain experts train, evaluate, and improve the world’s most advanced AI models.

Labelbox•February 10, 2026

From dormant codebases to lasting value

When startups shut down, their production code doesn’t lose value; it loses a home. Labelbox helps founders safely convert that dormant, real-world engineering work into value through a one-time, secure, founder-friendly transaction with no ongoing risk or obligations.

Labelbox•January 20, 2026

Reflections on NeurIPS 2025: Advancing evaluation and continual learning in AI

Takeaways on the themes and research directions likely to shape the year ahead. We focus on two core areas: how to rigorously measure AI capabilities and how to build interactive systems that learn through experience over time.

Labelbox•December 16, 2025

Implicit Intelligence and Agent-as-a-World: Evaluating agents on what users don’t say

Most real-world tasks are underspecified. We introduce Implicit Intelligence to test whether agents can infer hidden constraints, and Agent-as-a-World, a simple YAML framework for simulating environments without brittle, hard-coded worlds.

Ved Sirdeshmukh•November 21, 2025

