Labelbox, March 26, 2021

Labelbox CEO discusses breakthroughs in AI training data

Popular AI newsletter TheSequence published its interview with Labelbox CEO and Cofounder Manu Sharma earlier this month. Here are our top three takeaways from the interview.

The process of building AI has changed significantly

"A few years ago, there simply weren't tools and workflows to iterate with training data," said Manu. But today, training data platforms are enabling teams to mature faster and become more sophisticated. They can iterate within the training data creation process as well as iterate faster on the model itself, and they are now focusing on understanding a model's weak areas to refine and accelerate the labeling process.

AI teams can automate labeling across any data type

Labelbox customers typically use model-assisted labeling and active learning methods to accelerate their training data creation. Active learning involves selecting a smaller amount of training data with the highest chance of improving model performance. With model-assisted labeling, the work-in-progress model is used to generate pre-labeled data, which is then corrected by domain experts and used as training data. "The cost of correcting the pre-labels can be up to 80% lower than the cost of creating the label from scratch," said Manu.
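To make the two ideas concrete, here is a minimal sketch in plain Python. It is not Labelbox's API or the method described in the interview: it assumes the work-in-progress model returns a list of class probabilities per item, and it uses entropy-based uncertainty sampling, which is only one of several common active learning strategies. The function names, item ids, and class names are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Active learning step: pick the `budget` items the model is least sure about."""
    ranked = sorted(predictions, key=lambda item_id: entropy(predictions[item_id]), reverse=True)
    return ranked[:budget]

def pre_label(predictions, class_names):
    """Model-assisted labeling step: turn model outputs into draft labels for review."""
    drafts = {}
    for item_id, probs in predictions.items():
        best = max(range(len(probs)), key=lambda i: probs[i])  # index of the top class
        drafts[item_id] = class_names[best]
    return drafts

# Hypothetical model outputs: item id -> [P(cat), P(dog)]
predictions = {
    "img_001": [0.98, 0.02],
    "img_002": [0.55, 0.45],
    "img_003": [0.70, 0.30],
}

queue = select_for_labeling(predictions, budget=2)  # items worth a labeler's time first
drafts = pre_label(predictions, ["cat", "dog"])     # pre-labels a domain expert would correct
```

In this toy example the confidently classified image is left out of the labeling queue, while every image still receives a draft label that a reviewer can accept or fix.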

Synthetic data won't fully replace human supervision in AI

Synthetic data has become more popular in recent years, and it now resembles real datasets more closely. GAN models can generate augmented data to diversify datasets where real data is thin, or to mitigate issues caused by changes in the camera sensor or lighting conditions. But at Labelbox, we're seeing our customers build complex AI systems where synthetic data generation is either completely unnecessary or, at best, used only for augmentation. "I think we humans will continue to supervise AI systems for a while. Don't underestimate human ingenuity," said Manu.

Read the whole interview at TheSequence, and be sure to subscribe to the newsletter while you're there.