Labelbox•November 2, 2021
On October 20-21, we gathered with leaders and practitioners in AI/ML for our annual flagship conference. We heard from organizations such as ARK Invest, the United States Navy, NASA/JPL, Genentech, Stryker, and many more. Keep reading for a list of important takeaways from the event.
Matthew McAuley, a Sr. Data Scientist at Allstate, discussed how his team used to train their labelers on a task and then send them large datasets. Since they started using a dependable platform to manage their labeling processes, they’ve learned that the best way to ensure quality labeled data is to curate small batches of assets, have them labeled, and iterate on them before using the dataset for model training.
This process ensures that the labeling team thoroughly understands the task at hand, including edge cases. It also prevents one or two miscommunications from ruining the quality of an entire labeled dataset. With constant feedback and communication, the Allstate team can reliably get small, high-quality training datasets that significantly improve model performance.
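The small-batch workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Allstate's actual pipeline or any Labelbox API: `make_batches`, `label_iteratively`, `label_fn`, and `review_fn` are hypothetical names, and the "review" step simply returns the set of assets whose labels were rejected so that only those go back to the labelers.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def make_batches(assets: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the assets into consecutive small batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(assets[i:i + batch_size]) for i in range(0, len(assets), batch_size)]


def label_iteratively(
    assets: Sequence[str],
    batch_size: int,
    label_fn: Callable[[str, int], str],               # hypothetical: (asset, round) -> label
    review_fn: Callable[[Dict[str, str]], Set[str]],   # hypothetical: labels -> rejected assets
    max_rounds: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Label one small batch at a time, sending rejected assets back for rework."""
    accepted: Dict[str, str] = {}
    unresolved: List[str] = []
    for batch in make_batches(assets, batch_size):
        pending = batch
        for round_no in range(1, max_rounds + 1):
            labels = {asset: label_fn(asset, round_no) for asset in pending}
            rejected = review_fn(labels)
            # Keep what passed review; only rejected assets go to the next round.
            accepted.update({a: lab for a, lab in labels.items() if a not in rejected})
            pending = [a for a in pending if a in rejected]
            if not pending:
                break
        # Anything still rejected after max_rounds is reported, not silently kept.
        unresolved.extend(pending)
    return accepted, unresolved


# Toy data (made up): the labeler confuses "van" with "truck" until
# feedback from the first review round clears up the edge case.
truth = {"img1": "car", "img2": "truck", "img3": "car", "img4": "van"}


def demo_label_fn(asset: str, round_no: int) -> str:
    if truth[asset] == "van" and round_no == 1:
        return "truck"
    return truth[asset]


def demo_review_fn(labels: Dict[str, str]) -> Set[str]:
    return {a for a, lab in labels.items() if lab != truth[a]}


accepted, unresolved = label_iteratively(list(truth), 2, demo_label_fn, demo_review_fn)
print(accepted, unresolved)
```

Because each batch is reviewed before the next one starts, a misunderstanding such as the "van" edge case here affects at most one small batch rather than the whole dataset.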
Blue River Technology, the AI division of John Deere, found that labeling data for their highly specialized agriculture use cases was taking up huge chunks of their time and budget. In 2019, the average label took more than an hour, and they had hundreds of thousands of images to label. They were also experiencing issues with label quality. Plants were sometimes misclassified or missed entirely by the labeler.
To improve both labeling time and quality, the team conducted seventy-five experiments to find the optimal labeling pipeline. They tried different methods to improve operational efficiencies, used different models, and experimented with model-assisted labeling. After two years of trial and error, the team has a reliable pipeline that uses pre-labeled data from a different model. They are now down to twenty-five minutes per label.
Labelbox has worked with hundreds of AI and ML teams since our founding in 2018, and we’ve learned that the most advanced, successful organizations have a few important things in common.
Ramanan Paramasivan, R&D Director at Stryker, discussed his team's process of moving from a hodge-podge of open source and in-house labeling tools to a training data platform. Stryker first had a desktop solution built on top of Microsoft Azure. It was so clunky and prone to errors that the team outgrew it after only three months. Their domain experts, typically surgeons helping the team label and review data, never used the solution at all.
Lars Roessler of BSH Startup Kitchen noted that their ML team collected data on their hard drives, which became impossible to keep up with once their operations scaled up and more products were used to collect data. While in-house solutions can sometimes work for small ad-hoc ML projects, organizations with goals to scale ML in any capacity would be better off investing in a training data platform.
Building enthusiasm among leadership for new and occasionally risky projects can be a challenge. One common recommendation among the panelists was to get buy-in first from the primary users of the model. Lars Roessler of BSH Startup Kitchen recommends creating a prototype that customers or the primary users can try out, and using their positive feedback to strengthen your argument to leadership. This worked particularly well for one of his use cases, a smart fridge whose AI surprised and delighted potential customers at a live demonstration.
For the ML team at Stryker, which creates algorithms that give surgeons relevant and potentially life-saving information during surgery, getting feedback and requirements from top surgeons was an essential part of the process. Presenting a use case to senior leadership with the support of these domain experts goes a long way toward getting the green light. "Show them how it works in real time," said Ramanan Paramasivan of Stryker. "That resonates a lot more with senior leadership more than any kind of show and tell."
Criteo, an online advertising enterprise powered by AI, struggled without a reliable data labeling pipeline and collaboration method. They often relied on emails and spreadsheets to communicate and track their labeling, which quickly became untenable. Once they invested in Labelbox, the company realized “humongous benefits,” according to Hong Noh, Sr. Product Manager at Criteo. These benefits include a 40% increase in annotation speed, improved label quality, and a massive reduction in back-and-forth emails.
At Labelbox Accelerate, a panel of defense experts discussed the responsibilities and challenges the US faces in building preeminent AI capabilities. With a big data strategy and the necessary architectures in place, AI is opening up opportunities and innovations for our armed forces that we’ve never before been able to fully leverage.
“The speed of warfare now is unlike anything we’ve had to deal with in the past. We need AI technology to ensure our leaders at every level can make good, accurate judgments on where to apply force or where not to.”
— Admiral William Moran, United States Navy
Admiral Moran was joined by George Hoyem of IQT, Doug Philippone of Palantir Technologies, and Bryan Walsh of Axios in this session. The panel also shed light on the many use cases within the military for machine learning technology. Although the use of AI in sophisticated weapons systems is usually top of mind, machine learning also plays a large role in identifying cybersecurity threats, retaining talent, streamlining communications, increasing operational efficiencies, and reducing administrative costs through automation.
What sets the best AI teams apart from those struggling to get to production is simple: the best teams have figured out an organized, efficient, and repeatable workflow.
We are now bringing two new products to Labelbox that enable you to do just that without ever having to build your own tools or leave the Labelbox ecosystem. With the introduction of Model Diagnostics and Catalog, ML teams can complete a full workflow.
“Labelbox enables you to rapidly iterate with your AI models. You never have to leave Labelbox to do the most fundamental steps in model development, which are label, diagnose problems with your models, then smartly select data based on those insights you’ve uncovered, so that you can label more data that will fix and lift the performance of your model as quickly as possible.”
— Manu Sharma, Cofounder and CEO, Labelbox
These breakthroughs will allow ML teams to make faster iterations while keeping human labeling and costs at a minimum. Every step of the way, Labelbox now enables you to make insightful decisions based on data and analytics and improve your MLOps.
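To make the "smartly select data" step of this loop concrete, here is a minimal sketch of one common selection heuristic, least-confidence sampling: send the assets the model is least sure about to labeling first. It is an assumption chosen for illustration, not a description of how Model Diagnostics or Catalog work internally; `select_for_labeling` and the confidence scores are hypothetical.

```python
from typing import Dict, List


def select_for_labeling(confidences: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` asset IDs, lowest model confidence first.

    `confidences` maps an asset ID to the model's confidence in its own
    prediction (0.0 to 1.0). Ties are broken by asset ID so the result
    is deterministic.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(confidences, key=lambda asset_id: (confidences[asset_id], asset_id))
    return ranked[:budget]


# Made-up scores for four unlabeled assets.
scores = {"a": 0.98, "b": 0.51, "c": 0.75, "d": 0.62}
next_batch = select_for_labeling(scores, budget=2)
print(next_batch)  # ['b', 'd']
```

Labeling the lowest-confidence assets first is one way to keep human labeling effort focused where the model is most likely to be wrong; in practice teams often combine it with other signals, such as class balance or error analysis.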
All session recordings from Labelbox Accelerate 2021 are now available on demand.