Google Cloud partners with Labelbox to offer LLM human evaluation services

As teams building generative AI applications transition from prototypes to production, evaluating the performance of large language models (LLMs) is becoming critical to their success. State-of-the-art techniques for evaluating LLMs and compound AI systems, such as retrieval-augmented generation (RAG), typically employ a hybrid strategy of automated and human evaluation. While optimizing LLMs for human preference judgments can improve their performance, human evaluation remains one of the most time-consuming and resource-intensive parts of the process.

To enable teams to evaluate and ship LLM applications confidently, Labelbox has partnered with Google Cloud to provide Vertex AI platform customers an integrated solution for LLM evaluation as a fully managed service.

Vertex AI LLM Evaluation

With this LLM Evaluation solution, Vertex AI customers can launch an LLM evaluation job directly from the Vertex AI interface, set their desired evaluation type (e.g., single model or side-by-side comparison) and criteria (e.g., question-answer, multi-turn chat, summarization), and get quality-reviewed results within days from skilled evaluation professionals.

The LLM evaluation solution from Labelbox gives teams easy access to human raters who help evaluate the effectiveness of their organization’s LLMs against a wide range of customizable criteria, from instruction following and verbosity to the relevance of any given response.
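To make the side-by-side evaluation type concrete, here is a minimal sketch of how rater verdicts could be rolled up into a per-model win rate. It is not the Vertex AI or Labelbox API: the `Judgment` record, the `majority_vote` and `win_rate` helpers, and the sample data are all hypothetical, and a managed service may well aggregate results differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One rater's verdict on one prompt (hypothetical record shape)."""
    prompt_id: str
    rater_id: str
    preferred: str  # "A", "B", or "tie"


def majority_vote(judgments):
    """Collapse each prompt's rater verdicts into one label.

    If the top two labels receive the same number of votes,
    the prompt is recorded as a tie.
    """
    votes_by_prompt = {}
    for j in judgments:
        votes_by_prompt.setdefault(j.prompt_id, []).append(j.preferred)

    verdicts = {}
    for prompt_id, votes in votes_by_prompt.items():
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            verdicts[prompt_id] = "tie"
        else:
            verdicts[prompt_id] = ranked[0][0]
    return verdicts


def win_rate(verdicts, model="A"):
    """Fraction of prompts that `model` won outright (ties are not wins)."""
    if not verdicts:
        return 0.0
    wins = sum(1 for v in verdicts.values() if v == model)
    return wins / len(verdicts)


# Illustrative data only: three prompts, two or three raters each.
judgments = [
    Judgment("p1", "r1", "A"), Judgment("p1", "r2", "A"), Judgment("p1", "r3", "B"),
    Judgment("p2", "r1", "B"), Judgment("p2", "r2", "B"), Judgment("p2", "r3", "tie"),
    Judgment("p3", "r1", "A"), Judgment("p3", "r2", "B"),
]
verdicts = majority_vote(judgments)
print(verdicts)
print(f"Model A win rate: {win_rate(verdicts, 'A'):.2f}")
```

Majority voting is only one possible choice here. Teams that care about rater disagreement may prefer to keep the raw vote counts per prompt rather than collapsing them to a single label.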

With integrated APIs, customers simply configure their task within the Vertex AI platform, and Labelbox takes care of everything else before the QA process. The labeling team’s responses are displayed within the Vertex AI platform, where customers can review and accept outputs, giving them full control over annotation quality.

A full suite of Labelbox products now available on the Google Cloud Marketplace

For teams that want to combine AI assistance with human evaluation in a hybrid approach, a full suite of Labelbox products is now available for purchase on the Google Cloud Marketplace. With native no-code integrations with Google Cloud’s BigQuery, Cloud SQL, and Google Sheets, customers can connect their data pipelines to Labelbox in minutes.

With this offering, Labelbox provides a data-centric AI platform that covers data curation, AI-assisted labeling, premium data labeling services, and model diagnostics to align task-specific models and build intelligent applications. The latest updates to Labelbox’s products include model distillation, reinforcement learning from human feedback (RLHF), and LLM evaluation.

How to get started

As LLMs continue to power a broad array of everyday applications, they will continue to require nuanced human supervision to detect and mitigate errors, inconsistencies, and biases. The partnership between Google Cloud and Labelbox gives Vertex AI customers a way to bring human evaluation and AI assistance directly into how LLM products are built. Labelbox handles the heavy lifting and manual effort, freeing up your organization’s resources to focus on building and delivering AI products.

To learn more about the LLM evaluation solution, contact us and get early access here.
