How Dialpad advances AI development in NLP and LLMs with Labelbox
Problem
Dialpad’s previous annotation provider struggled to meet the data quality requirements of the team’s AI projects. The team initially tried to improve labeling quality by spending more time reviewing and redesigning labeling projects with the provider, but the persistent quality issues ultimately slowed progress on their AI work.
Solution
Dialpad turned to the Labelbox platform and Boost services for higher-quality training data and to reduce the time their own data scientists spent on labeling project design and review. Labelbox enabled the team to collaborate more smoothly with labelers and produce higher-quality labeled data for their AI use cases.
Result
After a year of leveraging Labelbox for training data, Dialpad saw a 20% improvement in labeling quality and a 41% reduction in labeling costs. Their AI team has also seen a marked improvement in team productivity and motivation, with data scientists proactively requesting the training data they need without tying up their own time and resources on data labeling.
Dialpad provides AI-driven customer engagement, communications, and intelligence solutions for businesses. Their NLP and LLM-powered products help improve transcription, summarization, sentiment analysis, and other core product capabilities. Building and maintaining these models requires large amounts of training data for custom models, as well as subject matter expertise and fine-tuning for LLMs. The team previously relied on legacy labeling services that crowdsourced labeling and produced large amounts of training data quickly, but found that the output ultimately lacked the quality needed to meet their long-term needs.
As the team scaled their AI efforts, fine-tuning their many production models and building new, more specialized ones, they found that their training data did not meet the quality standards their projects required. Labels were often inaccurate or missing. At first, the team dedicated more time to working with their labeling provider to redesign labeling projects, leveraging specialized teams and workflows in an attempt to raise the quality of their training data. Even with the extra time and resources spent on labeling, however, the training data still fell short of the quality standards the team’s AI projects required.
As time went on, data scientists at Dialpad became hesitant to request more training data, even when it was necessary for AI development, because of the time and effort involved and the strain the labeling process put on their resources.
“Our data scientists were less proactive about requesting labeled data because they knew how much effort it was going to be and how much pain it was going to cause,” said Anne Paling, Manager of Data Annotation and Testing at Dialpad.
After several years of working with their previous labeling provider and receiving labeled data that continued to decline in quality, the team began looking for a new labeling provider with two primary requirements: (1) high-quality training data and (2) less time spent on labeling by the Dialpad team.
The team settled on Labelbox as their new labeling solution for handling a variety of NLP and LLM-focused tasks. Labelbox offered a software-first approach and provided the team with an efficient, transparent way to get their datasets labeled quickly, accurately, and comprehensively. Dialpad also leveraged Labelbox Boost to find and employ the best labeling team for their use cases, reducing the need for extra supervision from Dialpad’s data scientists.
A year into using Labelbox for their training data, Dialpad saw significant improvements in quality, time and effort saved, and the productivity and motivation of their team members. Now, their data scientists proactively ask for the training data they need and are able to scale their AI development faster and more easily. By leveraging Labelbox Boost for labeling services instead of crowdsourced labelers, Dialpad was also able to ensure the privacy and security of their data throughout the labeling process. To the team’s surprise, switching from their first labeling service also helped them save on labeling costs, even though they had originally been willing to pay more for higher-quality training data.
"Before, we were averaging roughly 29 cents per data point. This year with Labelbox, it’s down 48% to only 15 cents per data point. We essentially labeled more data for less money with an average increase in accuracy of over 20%," said Paling.