Labelbox • November 18, 2021
Announcing Labelbox on Databricks Partner Connect
Together, Labelbox and Databricks provide customers with a powerful solution for unstructured data workflows. Customers can annotate their images, video, text, audio, and geospatial data in Labelbox and perform data science in Databricks. We previously released the Labelbox Connector for Databricks to make it easier for teams to train AI on unstructured data in the Databricks Lakehouse.
“Labelbox is committed to helping customers in their journey to both harness AI for their competitive advantage and to implement AI at scale. We are excited to work with Databricks’ new Partner Connect portal to further enhance the value we create for customers.”
—Manu Sharma, CEO of Labelbox
Today, we are thrilled to announce that Labelbox is a launch partner in the Databricks Partner Connect portal. Databricks Partner Connect offers Databricks users an easy way to discover and connect the best tools to tackle problems in Data Engineering, Data Science, and Analytics. With a few clicks, users can integrate with tools like Labelbox, Fivetran, Tableau, and more.
The Labelbox integration offers a significant improvement for new users of the Labelbox Connector for Databricks. Instead of searching for documentation and writing their own initialization code, users now have access to a guided Labelbox connector experience through Databricks Partner Connect.
Clicking the Labelbox tile in Partner Connect will lead to a trial creation page. We’ll then deposit a tutorial notebook into your Databricks shared directory.
The Labelbox tutorial notebook guides you through a typical workflow with Labelbox: start with unstructured data in your data lake, pass it to Labelbox for annotation, and load your annotations into Databricks for AI and analytics.
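To make the first step of that workflow concrete, here is a minimal sketch in plain Python of preparing asset references before handing them to an annotation tool. It assumes a two-column layout, `external_id` and `row_data`, and the helper name `to_labelbox_rows` and the example URLs are hypothetical. This is not the connector's own code; the tutorial notebook remains the reference for the actual calls.

```python
from typing import Dict, List


def to_labelbox_rows(asset_urls: List[str]) -> List[Dict[str, str]]:
    """Build one row per asset using an assumed two-column layout."""
    rows = []
    for url in asset_urls:
        # Drop any query string, then keep the file name as a readable ID.
        file_name = url.split("?", 1)[0].rsplit("/", 1)[-1]
        rows.append({"external_id": file_name, "row_data": url})
    return rows


# Hypothetical asset URLs; replace with locations from your own data lake.
example_urls = [
    "https://example.com/images/cat_001.jpg",
    "https://example.com/images/dog_002.jpg?token=abc",
]
rows = to_labelbox_rows(example_urls)
```

In a Databricks notebook, rows shaped like this could be turned into a Spark DataFrame with `spark.createDataFrame(rows)` before being passed on for annotation.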
Advanced Labelbox Capabilities for the Lakehouse
In AI development, it is often challenging to find the right data and visually inspect model results. Teams spend a lot of time and money procuring unstructured data (e.g., images, video, text, and audio), inspecting the data, and comparing model predictions to ground truth. This kind of workflow is particularly challenging in a notebook environment, where you must load predictions in notebook cells or output results to the filesystem for manual inspection.
Catalog and Diagnostics
Databricks users can leverage Labelbox’s visual Catalog to browse unstructured data in the data lake. Model Diagnostics can also be used to visualize model errors and identify opportunities to improve model training.
We recommend using Model Diagnostics in conjunction with Managed MLflow on Databricks to set up a best-in-class active learning workflow in your Lakehouse. Combining these tools with Delta Lake Time Travel also makes the workflow reproducible, because you can query the exact version of a table that a model was trained on.
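As a rough illustration of how Time Travel supports reproducibility, the sketch below builds a Delta Lake query that pins a table to a specific version or timestamp. The helper `time_travel_query` and the table name `labelbox_annotations` are hypothetical, and the function only assembles the SQL string; it assumes trusted table names and does no escaping.

```python
from typing import Optional


def time_travel_query(
    table: str,
    version: Optional[int] = None,
    timestamp: Optional[str] = None,
) -> str:
    """Return a Delta Lake SQL query pinned to one version or timestamp."""
    # Exactly one of the two pinning options must be supplied.
    if (version is None) == (timestamp is None):
        raise ValueError("Provide exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Hypothetical table name; version 3 stands in for the training snapshot.
query = time_travel_query("labelbox_annotations", version=3)
```

In a notebook, the resulting string could be run with `spark.sql(query)`. Recording the version number alongside each training run, for example as a logged parameter in MLflow, is one way to tie a model back to the annotations it saw.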
Questions or comments? Contact us at ecosystem+databricks@labelbox.com.