Labelbox•April 7, 2025

Q1 spotlight: Accelerating AI development with new products and services

What we’ve been up to

2025 is flying by as we provide more and more AI teams with high-quality data—through a combination of our growing Alignerr services and innovations to our Labelbox Platform. We introduced powerful new platform capabilities that support the next generation of frontier AI models, launched easier ways to connect with expert talent, and expanded our benchmark leaderboards to offer unparalleled insights into the rapidly evolving AI landscape.

Based on the latest needs from leading AI labs, we delivered high-quality training data focused on complex reasoning, multilingual translation, advanced coding tasks, multimodal mathematical reasoning, and more.

Read on for a recap of all that’s new from Labelbox in the past quarter. We’ll break down three major areas of focus for us from Q1:

Expanded Labelbox Leaderboards: Discover our Multimodal Reasoning leaderboard and explore refreshed image, video and audio rankings from the latest models
Launched Alignerr Connect: Browse and connect directly with expert AI trainers across advanced domains to support your model training workflows
Supported emerging AI use cases: Learn how the Labelbox Platform expanded to support the hottest focus areas including coding, multimodal reasoning, chain-of-thought (CoT) responses, and agent trajectories

Labelbox Leaderboards expanded: Introducing multimodal reasoning and the latest models

Understanding how different foundation models perform on real-world tasks is crucial for AI development. This quarter, we significantly expanded the Labelbox Leaderboards, reinforcing our commitment to providing transparent, human-preference-driven benchmarks.

The most exciting addition is the launch of our new Multimodal Reasoning Leaderboard. This leaderboard evaluates cutting-edge models on their ability to mimic human-like understanding across combined modalities, assessing capabilities like logical storytelling, visual difference detection, image captioning, and spatial reasoning.

Explore the new Multimodal Reasoning Leaderboard to see how models compare on storytelling, image captioning, and more

Alongside the new leaderboard, we've refreshed our existing leaderboards for image, speech, and video. These updates incorporate the latest models from leading AI labs, including Amazon Nova Pro, Claude 3.7, Imagen 3, OpenAI o3-mini, and Whisper.

We adopted a refined Elo comparison methodology based on direct pairwise evaluations to deliver better ratings of each model. This helps our rankings reflect real-world capabilities and human preferences, providing a reliable benchmark for the AI community.
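For readers unfamiliar with Elo ratings, here is a minimal sketch of how pairwise human preferences can be turned into a ranking. It uses the textbook Elo update, not our refined methodology, whose details are not covered in this post. The starting rating of 1000, the K-factor of 32, and the model names are illustrative assumptions.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise result: the preferred model gains what the other loses."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_models(comparisons, initial: float = 1000.0, k: float = 32.0):
    """Process (winner, loser) pairs in order and return models sorted by rating."""
    # Assumption: every model starts at the same initial rating.
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
for name, rating in rank_models(votes):
    print(f"{name}: {rating:.1f}")
```

Because each update is zero-sum, the total rating across models stays constant, and an upset win against a higher-rated model moves ratings more than an expected win does.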

Stay tuned as we’ll soon be further expanding our leaderboards with a new, complex reasoning leaderboard!

Alignerr Connect launches: Browse and hire proven AI experts

AI labs with a proven, internal process based on custom software and tooling often need to complement their full-time teams with additional experts or unique skills on a project-by-project basis. However, finding highly skilled talent with specific domain expertise is often a bottleneck in AI development. To address this, we greatly expanded our Labelbox Alignerr Connect offering this quarter.

Use advanced filters to browse and request specific Alignerrs to join your most critical in-house projects

Alignerr Connect provides companies with direct access to Alignerrs—our rigorously vetted community of professionals specializing in AI model evaluation, data labeling, and data generation across diverse domains. This offering complements our existing platform and managed Labeling Services, providing maximum flexibility to meet your team’s needs.

Whether you need to rapidly expand your team, access niche frontier intelligence, or maintain direct control over specialized projects, Alignerr Connect allows you to discover and hire proven AI trainers directly. This expanded offering positions Labelbox as the only vendor offering a comprehensive suite of solutions to build, operate, or staff your AI data factory according to your specific needs.

Advance AI to new frontiers: Innovative platform enhancements

Over the past three months, we’ve released key features that tackle the complexities of training and evaluating sophisticated AI models. These new features focus on generating high-quality training data in areas like audio, complex reasoning, multimodal reasoning, agent trajectories, and coding.

Here’s a quick recap of our most popular updates from Q1:

Advanced LLM reasoning: To help teams build more accurate and trustworthy LLMs, we've introduced advanced fact-checking and prompt rating tools within our multimodal chat editor. These features allow for granular quality control, enabling raters to assess the veracity of multi-step reasoning responses and evaluate the quality of prompts themselves.

Users review each step of the model’s response, marking each one as “Accurate”, “Inaccurate”, or “Disputed.” Additional details are entered if the step is not accurate

AI agent training and evaluation: We've added agent-specific capabilities to our Multimodal Chat Editor to streamline the development of AI agents. Users can now create, edit, annotate, and evaluate agent trajectories (sequences of reasoning steps, tool calls, and observations), facilitating both efficient training data creation and robust evaluation workflows.

Use message-level classifications such as “planning error” and “tool call error” to improve the agent trajectories
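To make the idea of an annotated agent trajectory concrete, here is a small illustrative sketch of how a sequence of reasoning steps, tool calls, and observations with message-level classifications might be represented. The class names, fields, and the `get_weather` tool are hypothetical and are not the Labelbox data schema or SDK.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One message in a trajectory (hypothetical structure, not a Labelbox type)."""
    kind: str          # "reasoning", "tool_call", or "observation"
    content: str
    labels: List[str] = field(default_factory=list)  # message-level classifications


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def label_counts(self) -> Counter:
        """Count how often each classification was applied across all steps."""
        return Counter(label for step in self.steps for label in step.labels)


# Hypothetical trajectory in which a reviewer flagged a faulty tool call.
trajectory = Trajectory(steps=[
    Step("reasoning", "Look up the weather before answering."),
    Step("tool_call", "get_weather(city='Paris')", labels=["tool call error"]),
    Step("observation", "Error: unknown argument 'city'"),
])

print(trajectory.label_counts())
```

Aggregating classifications like this across many trajectories is one way to see which failure type, planning or tool use, dominates for a given agent.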

Integrated VS Code IDE environment: Recognizing the critical need for high-quality coding data, we released an integrated Visual Studio Code (VS Code) Web IDE environment within our multimodal chat editor. This provides a familiar, desktop-class coding experience, complete with debugging and extension support, directly within Labelbox, accelerating the generation of sophisticated coding data.

The new integrated VS Code environment offers full access to extensions, GitHub Copilot, multi-file programs, and more

Enhanced data modalities: Our January platform update also included powerful new tools like AI-powered audio transcription using Whisper and a ChatGPT-powered OCR engine for efficient text extraction from documents, further streamlining data preparation across modalities.
Industry-specific data generation: We continue to enhance our platform to support the creation of targeted, industry-specific datasets essential for building AI with deep domain understanding in fields like finance, law, and medicine, often leveraging expert Alignerrs sourced via our services or Alignerr Connect. Read through our ultimate guide to understanding and creating industry-specific data.

Customer story highlights: Real-world impact

These advancements translate directly into customer success. This quarter, we highlighted just a few ways AI labs are combining the power of Labelbox software with expert human insight:

AI Lab improves multimodal STEM reasoning with expert data: Labelbox assembled a team of STEM PhDs and Masters who generated unique, complex multimodal training data, enabling a top AI lab to pinpoint weaknesses and enhance their LLM's domain-specific reasoning capabilities.
AI audio startup achieves realistic models with expert labeling: Facing difficulties in labeling subjective audio nuances like emotion and speech style, a generative AI audio startup utilized Labelbox's specialized trainers (with fine arts backgrounds) and advanced audio editor. This partnership delivered high-quality, precisely segmented data with descriptive commands, significantly improving the realism and adoption of their cutting-edge audio models.
Legal AI startup enhances agent with expert data: A generative AI startup partnered with Labelbox Services to access legal experts and high-quality training data, significantly improving their AI agent's legal document processing capabilities and accelerating development.

These stories underscore how Labelbox serves as a critical partner in developing sophisticated AI applications across diverse industries and use cases.

Engineering insights

Labelbox engineers regularly share what they have learned from their own work on the Labelbox blog. These posts offer technical insights and describe techniques our team has used to improve code efficiency and performance.

Conclusion: Building the future of AI, together

The first quarter of 2025 brought significant leaps forward in Labelbox's capabilities, driven by our commitment to helping improve the world’s most advanced AI models and applications. From enhanced platform features supporting the latest AI frontiers to new ways of accessing expert talent and deeper insights through our Leaderboards, we are dedicated to providing the tools and resources you need to succeed.

We're excited about the progress made and look forward to continuing this momentum. Stay tuned for more innovations in the coming months!

Ready to explore these new features or discuss how Labelbox can accelerate your AI initiatives? Contact our sales team or explore our documentation to learn more.
