Labelbox, March 26, 2025

New AI models in Labelbox: Nova Pro, Gemini 2.0, Claude 3.7, Whisper, & more

In today’s fast-evolving AI landscape, the development of advanced models is transforming industries, solving complex problems, and enhancing productivity. We recently added support for six of the latest frontier models to Labelbox. The newly supported models, which play a key role in enriching your datasets and automating critical data tasks, include OpenAI Whisper, Google Gemini 2.0 Pro, Google Gemini 2.0 Flash, Claude 3.7 Sonnet, Amazon Nova Pro, and OpenAI o3-mini.

Enhance your development with built-in AI capabilities

The new models are part of the Model Foundry capabilities in the Labelbox Platform. They cover a number of common tasks and give users the chance to evaluate and test new AI models and builds against the state-of-the-art models on the market.

These models are available to help with the following:

  • Predict (infer) labels from your data
  • Compare the performance of different foundational models through our live, multi-turn chat arena capabilities
  • Prototype, diagnose, and refine a machine learning app to solve specific business needs
  • Generate preference data to fine-tune large language models for RLHF
  • Evaluate multimodal LLMs
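To make the preference-data and model-comparison items above more concrete, here is a minimal sketch of how pairwise judgments from a side-by-side comparison could be turned into chosen/rejected records and per-model win rates. The `Comparison` class, its field names, and the model names are hypothetical illustrations; this is not the Labelbox SDK or its export format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One human judgment between two model responses (hypothetical schema)."""
    prompt: str
    response_a: str
    response_b: str
    model_a: str
    model_b: str
    winner: str  # "a" or "b"


def to_preference_pair(c: Comparison) -> dict:
    """Convert a judgment into a prompt/chosen/rejected record."""
    if c.winner not in ("a", "b"):
        raise ValueError(f"winner must be 'a' or 'b', got {c.winner!r}")
    if c.winner == "a":
        chosen, rejected = c.response_a, c.response_b
    else:
        chosen, rejected = c.response_b, c.response_a
    return {"prompt": c.prompt, "chosen": chosen, "rejected": rejected}


def win_rates(comparisons: list[Comparison]) -> dict[str, float]:
    """Fraction of comparisons each model won, out of those it appeared in."""
    wins, seen = Counter(), Counter()
    for c in comparisons:
        seen[c.model_a] += 1
        seen[c.model_b] += 1
        wins[c.model_a if c.winner == "a" else c.model_b] += 1
    return {model: wins[model] / seen[model] for model in seen}


# Made-up judgments, for illustration only.
votes = [
    Comparison("Summarize X", "short", "long", "model-1", "model-2", "a"),
    Comparison("Explain Y", "vague", "clear", "model-1", "model-2", "b"),
    Comparison("Translate Z", "good", "bad", "model-1", "model-2", "a"),
]
pairs = [to_preference_pair(v) for v in votes]
rates = win_rates(votes)
```

The prompt/chosen/rejected shape is a common convention for preference tuning, but the exact schema a given training pipeline expects may differ.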

Meet the six newly added frontier models

1. Amazon Nova Pro

Amazon Nova Pro is a multimodal model built to handle a wide range of tasks, from video summarization to Q&A and complex mathematical reasoning. It excels at processing text, images, documents, and video, offering fast inference for real-time applications like customer service and content creation.

Amazon Nova Pro is known for its speed and flexibility, particularly in multimodal tasks and real-time applications. It performs exceptionally well in tasks requiring deep reasoning and multi-step problem-solving, and in powering AI agents that execute workflows. The model is designed for high accuracy across diverse multimedia and complex tasks.

While highly capable, Amazon Nova Pro is resource-intensive and might not always specialize in niche or highly technical areas. Fine-tuning for specific tasks may also require significant effort and expertise.

2. Claude 3.7 Sonnet

Claude 3.7 Sonnet, developed by Anthropic, is a powerful model designed for a blend of quick responses and deep reasoning. It excels in coding, web development, problem-solving, and large-scale refactoring. The model offers advanced features for complex tasks, such as bug fixing and feature development.

Claude 3.7 Sonnet provides a mix of fast, near-instantaneous responses and slower, more thoughtful responses for deep analysis. It shows remarkable strength in coding tasks, particularly with full-stack updates and production-ready code. The introduction of Claude Code, an agentic tool for managing code repositories and running tests, significantly reduces development time.

However, Claude 3.7 Sonnet is not without limitations. It struggles with maintaining context in extended conversations and can reflect biases inherent in its training data. Additionally, while its coding capabilities are strong, its creativity in content generation may not always meet highly specific or novel standards.

3. Google Gemini 2.0 Flash

Google Gemini 2.0 Flash is optimized for high-volume, high-frequency tasks and excels at multimodal reasoning, with a context window of 1 million tokens. The model outperforms Gemini 1.5 Pro, a model from the previous generation, while delivering results at twice the speed. It offers enhanced multimodal capabilities and excels at tasks like coding, function calling, and complex instruction following. This makes it ideal for real-time applications in industries such as customer service, video generation, and multimedia content creation.

While fast and capable, Gemini 2.0 Flash shares limitations with its sibling model, Gemini 2.0 Pro. It may not always maintain context in lengthy interactions and could reflect biases in its responses. Like Gemini 2.0 Pro, it also struggles with highly specialized technical content.

4. Google Gemini 2.0 Pro

Google Gemini 2.0 Pro is a multimodal model designed for a variety of tasks, with its most notable strengths in coding and handling complex prompts. Gemini 2.0 Pro supports multimodal input, can call external tools like Google Search, and can execute code and count tokens. Its 2-million-token context window makes it highly capable of managing large amounts of information.

Google’s Gemini 2.0 Pro is a leader in coding tasks and world knowledge, offering an unrivaled ability to understand and reason through complex instructions. The large context window allows it to process long documents or code, making it perfect for developers and knowledge analysts. Moreover, it supports controlled generation, which ensures that output can be tailored to specific needs.

While Gemini 2.0 Pro has many advanced capabilities, it may struggle with maintaining context over extended interactions, leading to inconsistencies. Additionally, as with any model trained on large internet data, it can reflect inherent biases. The model might not fully understand or accurately interpret highly technical or domain-specific content, especially if it involves recent developments post-training data cutoff.

5. OpenAI o3-mini

OpenAI o3-mini is built for fast and efficient reasoning in technical domains like science, math, and coding. It’s optimized for STEM problem-solving, offering precise answers with improved speed compared to its predecessor, o1-mini. The model is perfect for tasks that require both speed and accuracy, such as logical problem-solving and complex technical inquiries.

OpenAI o3-mini delivers exceptional performance in STEM tasks, especially in science, math, and coding. It responds faster than o1-mini, with an average response time that is 24% shorter (7.7 seconds vs. 10.16 seconds). It also reduces major errors by 39%, making it a top choice for technical queries. Expert testers preferred its responses to o1-mini's 56% of the time.
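The 24% figure follows from the two average response times quoted above. As a quick check, the short sketch below computes the relative reduction; the function name `percent_reduction` is ours, chosen for illustration.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100


o1_mini_seconds = 10.16  # average response time quoted in the text
o3_mini_seconds = 7.7    # average response time quoted in the text

reduction = percent_reduction(o1_mini_seconds, o3_mini_seconds)
print(f"{reduction:.1f}% less time per response")  # prints "24.2% less time per response"
```

Note that the reduction is measured relative to the slower o1-mini time; dividing by the o3-mini time instead would give a different (larger) percentage.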

While o3-mini excels in STEM tasks, it has no vision capabilities, making it unsuitable for image-related tasks. Additionally, for highly complex reasoning tasks, it may still lag behind larger models. While it performs well in technical domains, it may not match specialized models in very niche areas. Fine-tuning may also be required for optimal results in more specific use cases.

6. OpenAI Whisper

Whisper is OpenAI’s automatic speech recognition (ASR) system, built to transcribe speech in multiple languages and to translate speech from those languages into English. It was trained on a massive dataset of 680,000 hours of multilingual audio, making it highly effective at handling accents, noise, and technical language.

Whisper offers near state-of-the-art accuracy in speech recognition and translation across about 10 languages. Its robust design allows it to perform exceptionally well for English speech recognition and multilingual transcription. However, its performance tends to drop in low-resource or low-discoverability languages and across different accents.

Despite its impressive capabilities, Whisper does have a few limitations. Because it was trained using weak supervision, it occasionally hallucinates, producing text that was not spoken in the audio. This can be more pronounced in languages or dialects with less data available. Additionally, the sequence-to-sequence architecture of the model sometimes results in repetitive text.
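One way to catch the repetitive-text issue in a transcript is a simple post-processing pass. The sketch below is not part of Whisper; `collapse_repeats` is a hypothetical helper that collapses word sequences repeated back to back, under the assumption that transcripts are plain whitespace-separated text.

```python
def collapse_repeats(text: str, max_phrase_len: int = 8) -> str:
    """Collapse immediately repeated word sequences, e.g. 'a b a b' -> 'a b'.

    Caveat: this also removes legitimate repetitions such as 'very very',
    so it is better used to flag transcripts for review than to edit blindly.
    """
    out: list[str] = []
    for word in text.split():
        out.append(word)
        # Try the longest candidate phrase first; if the trailing n words
        # duplicate the n words right before them, drop the duplicate.
        for n in range(min(max_phrase_len, len(out) // 2), 0, -1):
            if out[-n:] == out[-2 * n:-n]:
                del out[-n:]
                break
    return " ".join(out)


raw = "thank you thank you thank you for watching"
cleaned = collapse_repeats(raw)
needs_review = cleaned != raw  # True when any repetition was collapsed
```

Comparing the cleaned text with the original gives a cheap signal for routing a transcript to human review, which fits naturally into a labeling workflow.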

Final thoughts on the new models added to the Labelbox Platform

Each of these models—OpenAI Whisper, Gemini 2.0 Pro, Gemini 2.0 Flash, Claude 3.7 Sonnet, Amazon Nova Pro, and OpenAI o3-mini—represents the cutting edge of AI innovation, with unique strengths tailored to different applications. 

Whether you're looking for the best speech recognition tool, the most advanced multimodal model, or a high-speed solution for coding and complex reasoning, these models are pushing the boundaries of what AI can accomplish in various industries. 

If you want to know how Labelbox enables teams to use world-class foundation models to enrich datasets and automate tasks, contact us.