Labelbox, June 18, 2024

Powerful multimodal support delivers high-quality data for GenAI models

Today, we are announcing the new Labelbox multimodal chat solution—a powerful combination of advanced tooling and managed services to help GenAI model builders and digital-native enterprises create, evaluate and optimize model responses across the most common data modalities.

After working with hundreds of companies at the forefront of developing frontier models and building task-specific generative AI products, it is clear to us that data quality remains king. Access to high-quality, human-evaluated data determines which companies manage to differentiate their AI offerings and which do not.

The number of accessible, highly performant models (both proprietary and open source) continues to grow. As a result, companies positioned at the forefront of generative AI have two critical requirements:

  • Frontier model builders need access to managed labeling services that can quickly generate powerful, multimodal datasets for RLHF and model tuning from a large network of highly skilled, specialized subject matter experts.
  • Digital-native enterprises with task-specific AI initiatives depend on advanced tooling to compare and evaluate their custom models alongside the latest multimodal models in a live, multi-turn environment. They also need to orchestrate advanced workflows, leverage AI assistance to accelerate labeling, and collaborate within a single, open platform.

Labelbox addresses both requirements with the new multimodal chat solution, supported by our new, highly skilled Boost Workforce experts. The new solution makes it easier for teams to iterate quickly with real-time, granular visibility into labels and data quality, while tapping into diverse pools of expertise to improve the underlying data and model performance.

Introducing Labelbox multimodal chat solution

The new Labelbox multimodal chat solution allows you to select up to 10 different models and hold live, multi-turn conversations with them about a wide range of data modalities. Supported data types include text, images, videos, audio and documents (PDFs). Explore it in under a minute on our new Product Tours page.

Subject matter experts (SMEs) and labelers work in an intuitive user interface to rate, rank or classify the model responses, delivering annotations that are critical for Reinforcement Learning from Human Feedback (RLHF), Supervised Fine-Tuning (SFT), red teaming, and more.

For organizations such as AI labs and model builders that need to expand their datasets with specific expertise, the Labelbox Boost Workforce is a managed service offering on-demand access to a diverse set of highly skilled, educated experts. Their expertise covers multiple languages and spans subjects such as coding, law, biomedicine, nutrition, nuclear physics, psychology, education, and more.

Evaluate responses from multiple models and different data modalities all in one platform.

Key capabilities available today in the Labelbox multimodal chat solution include:

  • Multimodal data support: Along with text inputs, upload image, video, audio, or document (PDF) files to evaluate the models' support for different data modalities. Refer to the Labelbox docs for details on which attachment types are currently supported by which model. We are continually expanding this support, so let us know what is important to you.
  • Multi-turn evaluation: Converse with up to 10 different models simultaneously in a multi-turn dialog, enabling evaluations that go well beyond a single back-and-forth response.
  • Industry-leading Foundry models: Choose from some of the most advanced models including Google Gemini 1.5 Pro, OpenAI GPT-4o, Claude 3 Sonnet, Llama 3 70B, and more. 
  • Custom model support: Run custom models and release candidates alongside standard models to compare responses and measure performance.
  • Advanced annotation options: Use a variety of annotation types, including support for the following: message ranking, message selection, radio classification (global or message-based), checklist classification (global or message-based), and free text classification.
  • LLM data generation: Generate high-quality, single-response outputs for multi-turn or single-turn multimodal prompts to help train and improve model responses with SMEs. 
  • Extensibility: Use built-in SDKs and advanced export capabilities for seamless workflows and integration with broader AI/ML projects.
  • Quality control and analytics: Access built-in performance analytics to get project overviews, view ranking graphs, measure model variance, track labeler performance and efficiency, and more.

Detailed performance metrics are available, including model variance histograms across the models tested.

Accelerate model building and RLHF 

For large AI labs building foundation models, the new multimodal chat solution combines a powerful tool for creating and reviewing multimodal training data with a network of SMEs who create and enhance the model responses. Key projects and use cases that depend on high-quality data include:

  • Evaluating new model release candidates: Pit multiple versions of a model against each other in live, multi-turn conversations to evaluate current production models versus new release candidates. 
  • Refining LLMs through RLHF: Generate high-quality, ranked data that fuels Reinforcement Learning from Human Feedback (RLHF) systems. By comparing and ranking model responses, you can create the essential preference data needed to train an accurate reward model for RLHF.
  • Accessing a managed service of subject matter experts: Tap into a large network of highly skilled talent to produce the highest-quality data. Use the Labelbox Boost Workforce for rapid access to subject matter experts.
  • Delivering transparency and control: Orchestrate and monitor every step of the process, maintaining complete control over the labeling steps with detailed analytics and custom review processes.
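
The RLHF item above describes turning ranked model responses into preference data for a reward model. As a rough illustration only, the sketch below expands one ranked list into pairwise (chosen, rejected) records, a shape commonly used for reward-model training. The function name, field names, and sample values are hypothetical and do not represent the Labelbox export format or SDK.

```python
from itertools import combinations


def ranking_to_preference_pairs(prompt, ranked_responses):
    """Expand one ranking (best response first) into pairwise preferences.

    Hypothetical helper for illustration; not part of any Labelbox API.
    """
    pairs = []
    # combinations() keeps input order, so the first element of each
    # pair is always the response that was ranked higher.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


# Assumed sample data: a labeler ranked three model responses, best first.
example_pairs = ranking_to_preference_pairs(
    "Summarize the attached PDF.",
    ["response_a", "response_c", "response_b"],
)

for pair in example_pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

A ranking of n responses yields n(n-1)/2 pairs, so ranking several models per conversation produces considerably more preference signal than a single pick of the best response.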

Optimize task-specific GenAI solutions

Digital-native organizations and AI-leading enterprises creating task-specific AI solutions on existing LLMs need a flexible, open platform that integrates with their AI, ML and data operations teams. While they are not building foundation models like the AI labs described above, they have similar needs, and Labelbox plays a critical role in helping them with the following tasks:

  • Evaluating foundation models for enterprise apps: Pit multiple models (up to 10) against each other in live, multi-turn conversations. Rank, rate, and classify outputs for a quantitative, data-driven evaluation, so that the model selected for a given enterprise application delivers the best end-user experience.
  • Optimizing RAG systems: Evaluate multiple versions of custom models built on industry-standard LLMs and proprietary company data to select the best model and identify areas that need further training with new or enhanced data. 
  • Generating high-quality responses for supervised fine-tuning (SFT): Use SMEs to classify or rank model output. If the responses do not meet quality requirements, SMEs can write higher-quality responses to be used for future training.   
  • Delivering transparency and control: As with the previous group, these teams need to orchestrate and monitor every step of the process while taking advantage of built-in automation and advanced orchestration.
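
To make the "quantitative, data-driven evaluation" in the first item above concrete, here is a minimal sketch of how per-conversation rankings could be aggregated into a mean rank and a win rate per model. The input structure, function name, and model names are assumptions for illustration; they are not the format of Labelbox's built-in analytics or exports.

```python
from collections import defaultdict
from statistics import mean


def summarize_rankings(rankings):
    """Aggregate rankings into per-model mean rank and win rate.

    `rankings` is a list of dicts, one per evaluated conversation,
    mapping a model name to its rank (1 = best). Hypothetical structure.
    """
    ranks = defaultdict(list)
    wins = defaultdict(int)
    for ranking in rankings:
        # The model with the lowest rank number wins this conversation.
        # On a tie, min() keeps the first model it encounters.
        best = min(ranking, key=ranking.get)
        wins[best] += 1
        for model, rank in ranking.items():
            ranks[model].append(rank)
    total = len(rankings)
    return {
        model: {"mean_rank": mean(r), "win_rate": wins[model] / total}
        for model, r in ranks.items()
    }


# Assumed sample data: four conversations, three models ranked in each.
sample_rankings = [
    {"model_a": 1, "model_b": 2, "model_c": 3},
    {"model_a": 2, "model_b": 1, "model_c": 3},
    {"model_a": 1, "model_b": 3, "model_c": 2},
    {"model_a": 1, "model_b": 2, "model_c": 3},
]

summary = summarize_rankings(sample_rankings)
best_model = min(summary, key=lambda m: summary[m]["mean_rank"])
print(best_model, summary[best_model])
```

Mean rank and win rate can disagree when a model is rarely first but consistently second, so looking at both (and at the spread of ranks) gives a fuller picture than either number alone.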

Evaluate multimodal chat data today for free

Labelbox’s holistic multimodal chat solution unlocks a new era of GenAI development with powerful tooling and expert managed services. With differentiated data, streamlined operations, and transparent orchestration, you can build and refine foundation models that meet your specific needs.

Ready to deliver on the next generation of generative AI? Sign up for a free Labelbox account to try it out, or contact us to learn more. We’d love to hear from you.