Together, Labelbox and Databricks provide customers a powerful solution for unstructured data workflows. Customers can annotate their images, video, text, audio, and geospatial data in Labelbox and perform data science in Databricks. We previously released the Labelbox Connector for Databricks to make it easier for teams to train AI on unstructured data in the Databricks Lakehouse.

“Labelbox is committed to helping customers in their journey to both harness AI for their competitive advantage and to implement AI at scale. We are excited to work with Databricks’ new Partner Connect portal to further enhance the value we create for customers.”

—Manu Sharma, CEO of Labelbox

Today, we are thrilled to announce that Labelbox is a launch partner in the Databricks Partner Connect portal. Databricks Partner Connect offers Databricks users an easy way to discover and connect the best tools to tackle problems in Data Engineering, Data Science, and Analytics. With a few clicks, users can integrate with tools like Labelbox, Fivetran, Tableau, and more.

The Labelbox integration offers a significant improvement for new users of the Labelbox Connector for Databricks. Instead of searching for documentation and writing their own initialization code, users now have access to a guided Labelbox connector experience through Databricks Partner Connect.

Clicking the Labelbox tile in Partner Connect will lead to a trial creation page. We’ll then deposit a tutorial notebook into your Databricks shared directory.

The Labelbox tutorial notebook guides you through a typical workflow with Labelbox: Start with unstructured data in your Data Lake, pass it to Labelbox for annotation, and load your annotations into Databricks for AI and Analytics.
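As a rough illustration of the first step in that workflow, the sketch below turns a list of data lake object URIs into row dictionaries that could be registered with Labelbox for annotation. It is a minimal sketch, not the tutorial notebook's code: the bucket and file names are hypothetical, the `build_data_rows` helper is invented for this example, and the commented-out upload calls assume the Labelbox Python SDK, so check them against the notebook and the SDK documentation before relying on them.

```python
def build_data_rows(uris):
    """Turn data lake object URIs into row dictionaries for upload.

    Each row points at the asset (row_data) and carries the file name
    as an external identifier so annotations can be joined back later.
    """
    rows = []
    for uri in uris:
        file_name = uri.rsplit("/", 1)[-1]  # last path segment of the URI
        rows.append({"row_data": uri, "external_id": file_name})
    return rows


# Hypothetical image locations in a data lake.
rows = build_data_rows([
    "s3://example-bucket/images/cat_001.jpg",
    "s3://example-bucket/images/dog_002.jpg",
])

# Assumed upload step with the Labelbox Python SDK (not executed here):
#   import labelbox
#   client = labelbox.Client(api_key="<your API key>")
#   dataset = client.create_dataset(name="databricks-demo")
#   dataset.create_data_rows(rows)
```

Keeping the file name as the external identifier is one simple way to match exported annotations back to rows in a Databricks table.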

The Labelbox Connector for Databricks allows data teams to easily annotate unstructured data for AI/ML projects. With Delegated Access support for AWS, Azure, and GCP storage, Labelbox makes it easier than ever to annotate data in your data lake.
Easily move between Databricks and Labelbox to support AI workflows. Use Databricks to drive Model Assisted Labeling in Labelbox for human-in-the-loop review.

Advanced Labelbox Capabilities for the Lakehouse

In AI development, it is often challenging to find the right data and visually inspect model results. Teams spend a lot of time and money procuring unstructured data (e.g. images, video, text, and audio), inspecting the data, and comparing model predictions to ground truth. This kind of workflow is particularly challenging in a notebook environment, where you must load predictions in notebook cells or output results to the filesystem for manual inspection.

Catalog and Diagnostics

Databricks users can leverage Labelbox’s visual Catalog to browse unstructured data in the data lake. Model Diagnostics can also be used to visualize model errors and identify opportunities to improve model training.

Labelbox Catalog enables users to view all their assets and curate a dataset that will significantly improve model performance.
Labelbox Model Diagnostics helps users identify and eradicate model errors.

We recommend using Model Diagnostics in conjunction with Managed MLflow on Databricks to set up a best-in-class active learning workflow in your Lakehouse. You can combine these tools with Delta Lake Time Travel to create a powerful, reproducible active learning workflow.
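To make the reproducibility point concrete, here is a minimal sketch of pinning a training run to a specific version of an annotations table with Delta Lake Time Travel. The table name `labelbox_annotations`, the version number, and the `time_travel_query` helper are hypothetical. The `VERSION AS OF` clause is Delta Lake SQL, and the commented lines assume a Databricks notebook where `spark` and `mlflow` are available.

```python
def time_travel_query(table, version):
    """Build a Delta Lake query that reads a table as of a fixed version."""
    if not isinstance(version, int) or version < 0:
        raise ValueError("version must be a non-negative integer")
    return f"SELECT * FROM {table} VERSION AS OF {version}"


# Hypothetical annotations table, pinned to version 3.
query = time_travel_query("labelbox_annotations", 3)

# In a Databricks notebook (not executed here):
#   training_df = spark.sql(query)
#   mlflow.log_param("annotations_version", 3)  # record the version with the run
```

Recording the table version alongside each tracked run means a later active learning iteration can be compared against exactly the annotations the earlier model was trained on.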

Questions or comments? Contact us at ecosystem+databricks@labelbox.com.