- What we’ve been up to
- Labelbox Leaderboards expanded: Introducing multimodal reasoning and the latest models
- Alignerr Connect launches: Browse and hire proven AI experts
- Advance AI to new frontiers: Innovative platform enhancements
- Customer story highlights: Real-world impact
- Engineering insights
- Conclusion: Building the future of AI, together
Michael Haag•April 7, 2025
Q1 spotlight: Accelerating AI development with new products and services

What we’ve been up to
2025 is flying by as we provide more and more AI teams with high-quality data—through a combination of our growing Alignerr services and innovations to our Labelbox Platform. We introduced powerful new platform capabilities that support the next generation of frontier AI models, launched easier ways to connect with expert talent, and expanded our benchmark leaderboards to offer unparalleled insights into the rapidly evolving AI landscape.
Based on the latest needs from leading AI labs, we delivered high-quality training data focused on complex reasoning, multilingual translation, advanced coding tasks, multimodal mathematical reasoning, and more.
Read on for a recap of all that’s new from Labelbox in the past quarter. We’ll break down three major areas of focus for us from Q1:
- Expanded Labelbox Leaderboards: Discover our Multimodal Reasoning leaderboard and explore refreshed image, video and audio rankings from the latest models
- Launched Alignerr Connect: Browse and connect directly with expert AI trainers across advanced domains to support your model training workflows
- Supported emerging AI use cases: Learn how the Labelbox Platform expanded to support the hottest focus areas including coding, multimodal reasoning, chain-of-thought (CoT) responses, and agent trajectories
Labelbox Leaderboards expanded: Introducing multimodal reasoning and the latest models
Understanding how different foundation models perform on real-world tasks is crucial for AI development. This quarter, we significantly expanded the Labelbox Leaderboards, reinforcing our commitment to providing transparent, human-preference-driven benchmarks.
The most exciting addition is the launch of our new Multimodal Reasoning Leaderboard. This leaderboard evaluates cutting-edge models on their ability to mimic human-like understanding across combined modalities, assessing capabilities like logical storytelling, visual difference detection, image captioning, and spatial reasoning.

Explore the new Multimodal Reasoning Leaderboard to see how models compare on storytelling, image captioning, and more
Alongside the new leaderboard, we've refreshed our existing leaderboards for image, speech, and video. These updates incorporate the latest models from leading AI labs, including Amazon Nova Pro, Claude 3.7, Imagen 3, OpenAI o3-mini, and Whisper.
We adopted a refined Elo comparison methodology based on direct pairwise evaluations to deliver better ratings of each model. This ensures our rankings accurately reflect real-world capabilities and human preferences, providing a reliable benchmark for the AI community.
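For readers unfamiliar with Elo, here is a minimal sketch of the textbook Elo update applied to pairwise human preferences. It is not Labelbox's refined methodology, whose details are not described in this post. The function names, the starting rating of 1000, and the K-factor of 32 are assumptions chosen for illustration.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings after one pairwise comparison (k is an assumed K-factor)."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_models(comparisons, initial: float = 1000.0, k: float = 32.0):
    """Replay (winner, loser) pairs in order and return models sorted by rating."""
    ratings = defaultdict(lambda: initial)  # every model starts at the same rating
    for winner, loser in comparisons:
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical preference data: each tuple is (preferred model, other model).
comparisons = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
for name, rating in rank_models(comparisons):
    print(f"{name}: {rating:.1f}")
```

An upset win over a higher-rated model moves ratings more than an expected win, which is why a stream of pairwise judgments can converge toward a stable ranking.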
Stay tuned: we’ll soon expand our leaderboards further with a new complex reasoning leaderboard!
Alignerr Connect launches: Browse and hire proven AI experts
AI labs with a proven, internal process based on custom software and tooling often need to complement their full-time teams with additional experts or unique skills on a project-by-project basis. However, finding highly skilled talent with specific domain expertise is often a bottleneck in AI development. To address this, we greatly expanded our Labelbox Alignerr Connect offering this quarter.

Alignerr Connect provides companies with direct access to Alignerrs—our rigorously vetted community of professionals specializing in AI model evaluation, data labeling, and data generation across diverse domains. This offering complements our existing platform and managed Labeling Services, providing maximum flexibility to meet your team’s needs.
Whether you need to rapidly expand your team, access niche frontier intelligence, or maintain direct control over specialized projects, Alignerr Connect allows you to discover and hire proven AI trainers directly. This expanded offering positions Labelbox as the only vendor offering a comprehensive suite of solutions to build, operate, or staff your AI data factory according to your specific needs.
Advance AI to new frontiers: Innovative platform enhancements
Over the past three months, we’ve released key features that tackle the complexities of training and evaluating sophisticated AI models. These new features focus on generating high-quality training data in areas like audio, complex reasoning, multimodal reasoning, agent trajectories, and coding.
Here’s a quick recap of our most popular updates from Q1:
- Advanced LLM reasoning: To help teams build more accurate and trustworthy LLMs, we've introduced advanced fact-checking and prompt rating tools within our multimodal chat editor. These features allow for granular quality control, enabling raters to assess the veracity of multi-step reasoning responses and evaluate the quality of prompts themselves.

- AI agent training and evaluation: We've added agent-specific capabilities to our Multimodal Chat Editor to streamline the development of AI agents. Users can now create, edit, annotate, and evaluate agent trajectories (sequences of reasoning steps, tool calls, and observations), facilitating both efficient training data creation and robust evaluation workflows.

- Integrated VS Code IDE environment: Recognizing the critical need for high-quality coding data, we released an integrated Visual Studio Code (VS Code) Web IDE environment within our multimodal chat editor. This provides a familiar, desktop-class coding experience, complete with debugging and extension support, directly within Labelbox, accelerating the generation of sophisticated coding data.

The new integrated VS Code environment offers full access to extensions, GitHub Copilot, multi-file programs, and more
- Enhanced data modalities: Our January platform update also included powerful new tools like AI-powered audio transcription using Whisper and a ChatGPT-powered OCR engine for efficient text extraction from documents, further streamlining data preparation across modalities.
- Industry-specific data generation: We continue to enhance our platform to support the creation of targeted, industry-specific datasets essential for building AI with deep domain understanding in fields like finance, law, and medicine, often leveraging expert Alignerrs sourced via our services or Alignerr Connect. Read through our ultimate guide to understanding and creating industry-specific data.
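To make the idea of an agent trajectory from the list above more concrete, the sketch below models one as an ordered sequence of reasoning steps, tool calls, and observations, each of which a reviewer can rate. This is an illustration only and not Labelbox's data schema or API. The `Step` and `Trajectory` classes, their fields, and the example task are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three step types named in the post; the string labels are assumptions.
STEP_KINDS = ("reasoning", "tool_call", "observation")


@dataclass
class Step:
    kind: str
    content: str
    rating: Optional[int] = None  # hypothetical reviewer score, None until rated

    def __post_init__(self):
        if self.kind not in STEP_KINDS:
            raise ValueError(f"unknown step kind: {self.kind!r}")


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, kind: str, content: str) -> Step:
        """Append a new step and return it so it can be annotated later."""
        step = Step(kind, content)
        self.steps.append(step)
        return step

    def unrated(self) -> List[int]:
        """Indices of steps that still need a reviewer rating."""
        return [i for i, step in enumerate(self.steps) if step.rating is None]


trajectory = Trajectory(task="Find the capital of France")
trajectory.add("reasoning", "I should look this up with a search tool.")
trajectory.add("tool_call", 'search(query="capital of France")')
trajectory.add("observation", "Paris is the capital of France.")
trajectory.steps[0].rating = 5  # a reviewer scores the first step
print(trajectory.unrated())  # indices of steps still awaiting review
```

Keeping each step as its own record is one way to support both uses the post mentions: steps can be edited to create training data, or rated individually for evaluation.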
Customer story highlights: Real-world impact
These advancements translate directly into customer success. This quarter, we highlighted just a few ways AI labs are combining the power of Labelbox software with expert human insight:
- AI lab improves multimodal STEM reasoning with expert data: Labelbox assembled a team of STEM experts with PhDs and master's degrees who generated unique, complex multimodal training data, enabling a top AI lab to pinpoint weaknesses and enhance its LLM's domain-specific reasoning capabilities.
- AI audio startup achieves realistic models with expert labeling: Facing difficulties in labeling subjective audio nuances like emotion and speech style, a generative AI audio startup utilized Labelbox's specialized trainers (with fine arts backgrounds) and advanced audio editor. This partnership delivered high-quality, precisely segmented data with descriptive commands, significantly improving the realism and adoption of their cutting-edge audio models.
- Legal AI startup enhances agent with expert data: A generative AI startup partnered with Labelbox Services to access legal experts and high-quality training data, significantly improving their AI agent's legal document processing capabilities and accelerating development.
These stories underscore how Labelbox serves as a critical partner in developing sophisticated AI applications across diverse industries and use cases.
Engineering insights
Labelbox engineers regularly share what they have learned on the Labelbox blog. These posts offer technical insights and describe the techniques our team used to improve code efficiency and performance.
- Code Runner: Secure, scalable code execution for model evaluation
- Bringing AI to the browser: SAM2 for interactive image segmentation
- Inside the matrix: A look into the math behind AI
Conclusion: Building the future of AI, together
The first quarter of 2025 brought significant leaps forward in Labelbox's capabilities, driven by our commitment to helping improve the world’s most advanced AI models and applications. From enhanced platform features supporting the latest AI frontiers to new ways of accessing expert talent and deeper insights through our Leaderboards, we are dedicated to providing the tools and resources you need to succeed.
We're excited about the progress made and look forward to continuing this momentum. Stay tuned for more innovations in the coming months!
Ready to explore these new features or discuss how Labelbox can accelerate your AI initiatives? Contact our sales team or explore our documentation to learn more.