On May 6th, we held our first Labelbox Academy event, which featured customer speakers from Move.ai, Sharper Shape, and Cape Analytics, sessions on how to use several Labelbox workflows and features, and a keynote speech from OpenAI VP of Product, Peter Welinder. Read on to see what we learned at the event.
Data is all that matters to get a high-quality ML model
The Labelbox Academy event kicked off with a keynote speech from Peter Welinder, VP of Product & Partnerships at OpenAI. He presented his work on GPT-3, a natural language processing model that can predict the next word in a sentence, translate between languages and dialects, summarize text, turn text into an image, and more. One of the key points he made at the end was this: "Really all you need as a developer, as a machine learning practitioner, is the data. That's really what matters to improve performance."
Efficient, accurate labelers are vital for creating training data quickly
Using a robust, intuitive Training Data Platform is important, but the responsiveness and efficacy of the labeling team working on the project is integral, especially when it needs to be completed on a shortened timeline. When the Move.ai ML team needed to retrain their model within six weeks to meet a deadline for a new client, they turned to the Labelbox Workforce for help.
We got the data from the client, we processed it through what we needed to do in terms of getting the labeling data ready, we fed it to the labeling team, and it was delivered often ahead of schedule. We were really helped by Labelbox not only by the simplicity of the platform to get the data in to be labeled but also just the reactivity of the team in order for us to get from start to finish and deploy successfully.
— Niall Hendry, Head of Product, Move.ai
Detailed instructions make the labeling process faster and easier
Especially when working with an external labeling team that hasn't worked on your ML projects before, it's important to be thorough when giving labeling instructions. "You know your product and you know what you need to do," said Hendry. "Make sure you give them as much detail as possible and that will get you off the ground...as quickly as possible."
Monitor the labeling progress as much as you can
Tracking the metrics on your labeling projects will help your ML team find weaknesses in the process and better estimate how long future projects will take. Take advantage of all the monitoring and quality management tools provided by your TDP, including consensus scores for labeling tasks with different levels of complexity. "Constantly monitor the performance of the team as well as of the individual workers, and consider spot checking labels of the workers with low consensus scores to reduce the variance in the data," said Sikha Das, Senior Data Analyst at Cape Analytics. This will increase the quality of your ground truth data and enable more efficient model training.
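To make the spot-checking advice concrete, here is a minimal sketch of how a team might compute per-labeler consensus scores from exported labels. The data layout, labeler names, and the 0.6 threshold are illustrative assumptions, not Labelbox's actual consensus formula; in practice you would use the consensus metrics your platform reports.

```python
from collections import defaultdict

# Hypothetical export: asset_id -> {labeler: class_label}.
# In a real project this would come from your TDP's label export.
labels = {
    "img_001": {"alice": "roof_damage", "bob": "roof_damage", "cara": "no_damage"},
    "img_002": {"alice": "no_damage", "bob": "no_damage", "cara": "no_damage"},
    "img_003": {"alice": "roof_damage", "bob": "no_damage", "cara": "roof_damage"},
}

def consensus_scores(labels):
    """Fraction of co-labelers each labeler agrees with, over all shared assets."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for asset, by_labeler in labels.items():
        for labeler, label in by_labeler.items():
            for other, other_label in by_labeler.items():
                if other == labeler:
                    continue
                total[labeler] += 1
                if label == other_label:
                    agree[labeler] += 1
    return {name: agree[name] / total[name] for name in total}

scores = consensus_scores(labels)
# Flag labelers below an (assumed) threshold for manual spot checks.
to_spot_check = sorted(name for name, s in scores.items() if s < 0.6)
print(scores)
print("spot-check:", to_spot_check)
```

Run periodically over recent batches, a report like this surfaces labelers whose work is drifting from the group before the variance reaches your ground truth data.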
Low-quality data bites twice
"It messes up your model training and it also gives you the wrong information when you're making decisions with your model results," said Edward Kim, Data Analyst at Sharper Shape. Before they turned to Labelbox to create their training data, they experienced several setbacks. Their in-house labeling team used a low quality labeling tool that required each labeler to set configurations on their own, which increased the amount of variation in how the data was labeled. With Labelbox, Sharper Shape can now rely on a uniform ontology.
Learn more about Move.ai, Sharper Shape, and Cape Analytics by reading our case studies on each enterprise. You can also watch recordings of each session from Labelbox Academy: Learning the essentials.