Training Walmart's conversational AI on higher-quality language signal

Problem

Walmart wanted faster, higher-quality ways to produce the signal behind its conversational AI — annotating chatbot conversations and labeling inventory images for object detection and classification across tens of millions of diverse product SKUs.

Solution

Labelbox gave Walmart an end-to-end, in-app workflow for conversational AI, with text and image editors that tag named entity recognition (NER) relationships. Google BigQuery integration through the Python SDK automated orchestration, and model-assisted labeling sped up signal production.

Result

“Conversational AI is a very exciting and challenging field, because large language models are extremely sensitive to data quality,” shared Philippe Hanrigou, a Walmart director of data-science dedicated to conversational AI and chatbots. “Labelbox is a gamechanger because of the quality of the labels we can now acquire for our models. It’s hard to imagine a life now without Labelbox.”

Walmart's Text-to-Shop and chatbots run on conversational AI and LLMs, and those models are sensitive to data quality. Labelbox's platform produces the expert-graded language signal they depend on, with full visibility into the pipeline.

Note: The quotes for this post were sourced from Walmart's Global Tech blog on our Sparkcubate partnership with Walmart.

The challenge

Walmart, a leading global retailer, wanted a better way to produce the signal behind its conversational AI and LLM-powered applications. The data science team needed faster ways to annotate conversational text from shopping chatbots and to label inventory images for object detection and classification across tens of millions of diverse product SKUs. Label the conversations well, and the models get stronger and more natural over time. Walmart had relied on tech-enabled business process outsourcing (BPO) providers, but the process was a black box: little visibility into signal quality, no individual or project-level analytics, and no software that let in-house experts (data scientists, ML engineers, linguists) collaborate with external providers.

The approach

Labelbox gave Walmart an end-to-end, in-app workflow built for conversational AI. The interface was optimized for conversations, with consistency and quality control built in, and its text and image editors tag named entity recognition (NER) relationships across a mix of voice commands, text messages, images, and GUI interactions. The data engine let the team produce signal with any contributor, internal or external, and collaborate with in-house domain experts, with reviewers enforcing quality benchmarks. Retail conversational AI is complex: a Text-to-Shop customer may want milk, then organic milk, then Great Value organic milk, and each of those requests needs its own label. Labelbox lets Walmart verify that the model identifies the right product. Labelbox Annotate and the Python SDK automated orchestration from Google BigQuery, Walmart's core data infrastructure, and model-assisted labeling sped up production by letting the team adjust pre-labels instead of building ground truth from scratch.
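To make the pre-label idea concrete, here is a minimal sketch in plain Python of adjusting a model's draft label rather than labeling from scratch, using the milk example above. It makes no Labelbox SDK calls. The `EntitySpan` schema, the `pre_label` stand-in model, the `adjust` helper, and the `PRODUCT` label name are all hypothetical and chosen only for illustration; Walmart's actual ontology and tooling are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """A labeled character range in one conversation turn (hypothetical schema)."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str


def span_for(text: str, phrase: str, label: str) -> EntitySpan:
    """Build a span covering the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return EntitySpan(start, start + len(phrase), label)


def pre_label(text: str) -> list[EntitySpan]:
    """Stand-in for a model: it tags only the generic product word."""
    return [span_for(text, "milk", "PRODUCT")] if "milk" in text else []


def adjust(spans: list[EntitySpan], old: EntitySpan, new: EntitySpan) -> list[EntitySpan]:
    """Reviewer correction: swap one pre-labeled span for a corrected one."""
    return [new if s == old else s for s in spans]


# Three turns of one (invented) Text-to-Shop conversation and the product
# phrase a reviewer would want labeled in each turn.
turns = [
    "I need milk",
    "make it organic milk",
    "actually Great Value organic milk",
]
targets = ["milk", "organic milk", "Great Value organic milk"]

reviewed = []
for text, phrase in zip(turns, targets):
    drafts = pre_label(text)  # the model's draft: always just "milk"
    corrected = adjust(drafts, drafts[0], span_for(text, phrase, "PRODUCT"))
    reviewed.append([text[s.start:s.end] for s in corrected])

print(reviewed)
```

In this sketch the reviewer only widens a span the model already found, which is the kind of adjustment that is cheaper than building ground truth from nothing.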

“Conversational AI is a very exciting and challenging field, because large language models are extremely sensitive to data quality,” shared Philippe Hanrigou, a Walmart director of data-science dedicated to conversational AI and chatbots. “Labelbox is a gamechanger because of the quality of the labels we can now acquire for our models. It’s hard to imagine a life now without Labelbox.”

The outcome

Walmart got full visibility into its signal pipeline through in-depth analytics, and turned labeling-performance insights into gains in throughput, efficiency, and quality. Labeled-data accuracy improved by an estimated 25% through Labelbox's quality assurance, review workflows, and real-time collaboration. Labelbox's labeling services delivered high-quality signal at 95% accuracy and a 25% reduction in turnaround time versus comparable services. Labelbox now helps power Walmart's Text-to-Shop and customer service platforms, and is working with other Walmart teams on LLM and GenAI initiatives.

Where this goes

LLMs are extremely sensitive to data quality. The signal that grounds a retail assistant in real products and real intent is what makes it trustworthy at Walmart's scale.