Labelbox•May 25, 2021

Announcing the Labelbox connector on Databricks: Productionizing unstructured data for AI and analytics at scale

Large data lakes typically house a combination of structured and unstructured data. Data teams often use Apache Spark™ to analyze structured data, but may struggle to apply the same analysis to unstructured, unlabeled data such as images and video. To tackle this challenge, Fortune 500 enterprises such as WarnerMedia and Stryker are using Labelbox’s training data platform to quickly produce structured data from unstructured data. Labelbox has been used to support a variety of production AI use cases, including improved marketing personalization through visual search, manufacturing defect detection, and smart camera development.

Labelbox’s training data platform supports a variety of production AI use cases.

In the past, AI/ML teams had to rely on expensive, manual processes to transform their unstructured data into something more useful: paying a third party to label their data, buying a labeled dataset, or narrowing the scope of their project to fit public datasets. Faster and more cost-effective ways to convert unstructured data into structured data let teams pursue more advanced use cases built around their companies’ unique, unstructured datasets.

With Labelbox, Databricks users can quickly convert unstructured data into structured data and apply the results to a range of machine learning use cases, from deep learning to computer vision.

With Databricks, data science and AI teams can now easily prepare unstructured data for AI and analytics. Teams can label data with human effort, with machine learning models in Databricks, or with a combination of both. Teams can also employ a model-assisted labeling workflow, in which humans inspect and correct a model’s predicted labels instead of labeling every asset from scratch. This process saves time and cost, and can drastically reduce the amount of unstructured data you need to label to achieve strong model performance.

With LabelSpark, the Labelbox connector on Databricks, data teams can use a model-assisted labeling workflow that allows humans to easily inspect and correct a model’s predicted labels.
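The model-assisted labeling idea can be illustrated with a minimal sketch. It assumes predicted labels and human corrections are simple mappings from an asset ID to a label; the function name `apply_corrections`, the asset IDs, and the label values are hypothetical and are not part of Labelbox or LabelSpark.

```python
def apply_corrections(pre_labels, corrections):
    """Merge model pre-labels with human corrections.

    Both arguments map a (hypothetical) asset ID to a label. A reviewer
    only supplies an entry for assets where the model's label was wrong,
    so the corrected label overrides the predicted one.
    """
    unknown = set(corrections) - set(pre_labels)
    if unknown:
        raise KeyError(f"corrections for assets without pre-labels: {sorted(unknown)}")
    final = dict(pre_labels)   # start from the model's predictions
    final.update(corrections)  # human corrections take precedence
    return final


# Labels predicted by a model (illustrative values only).
pre_labels = {"img-001": "defect", "img-002": "no_defect", "img-003": "defect"}

# The reviewer inspected all three assets and corrected one of them.
corrections = {"img-002": "defect"}

final_labels = apply_corrections(pre_labels, corrections)
corrected_share = len(corrections) / len(pre_labels)

print(final_labels)
print(f"{corrected_share:.0%} of pre-labels needed a human correction")
```

In this sketch the reviewer edits only the labels that were wrong, which is where the time savings of the workflow would come from.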

Labelbox has recently launched the LabelSpark library, a connector between Databricks and Labelbox, so teams can connect an unstructured dataset to Labelbox. With LabelSpark, teams can programmatically set up an ontology for labeling and return the labeled dataset in a Spark DataFrame. Combining Databricks and Labelbox gives data and AI teams an end-to-end environment for unstructured data workflows, along with a query engine built around Delta Lake, coupling fast annotation tools with a powerful machine learning compute environment.
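As a rough illustration of what "returning the labeled dataset in a Spark DataFrame" involves, the sketch below flattens nested label records into tabular rows. It does not use the LabelSpark API: the ontology shape, the record layout, and the `to_rows` helper are all hypothetical, chosen only to show the nested-to-tabular step.

```python
# Hypothetical ontology: the label classes annotators may choose from.
ontology = {"tools": ["defect", "scratch"]}

# Hypothetical labeled records; the real export format may differ.
labeled_records = [
    {"asset_id": "img-001",
     "annotations": [{"tool": "defect", "bbox": [10, 20, 50, 60]}]},
    {"asset_id": "img-002", "annotations": []},
    {"asset_id": "img-003",
     "annotations": [{"tool": "scratch", "bbox": [5, 5, 15, 30]},
                     {"tool": "defect", "bbox": [40, 40, 80, 90]}]},
]

COLUMNS = ["asset_id", "tool", "x_min", "y_min", "x_max", "y_max"]


def to_rows(records, ontology):
    """Flatten nested annotations into one row per annotation."""
    allowed = set(ontology["tools"])
    rows = []
    for record in records:
        for annotation in record["annotations"]:
            if annotation["tool"] not in allowed:
                raise ValueError(f"label not in ontology: {annotation['tool']}")
            rows.append((record["asset_id"], annotation["tool"], *annotation["bbox"]))
    return rows


rows = to_rows(labeled_records, ontology)
for row in rows:
    print(dict(zip(COLUMNS, row)))

# In a Databricks notebook, where a SparkSession named `spark` is available,
# rows like these could be loaded with:
#     spark.createDataFrame(rows, schema=COLUMNS)
```

Once the labels are in tabular form, they can be queried and joined like any other structured data in the lake.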

Learn more about using Databricks with Labelbox and see a live technical demo of the workflow at the Productionizing Unstructured Data for AI and analytics session at Data + AI Summit 2021. For more details on the integration, visit our partnership page.
