How to build defect detection models to improve predictive maintenance

With AI-powered defect detection, you can now seamlessly integrate the latest advances in foundation models into your equipment maintenance and QA operations. As the demand for better monitoring continues to rise, it's essential for teams to maximize the lifespan of their critical assets and minimize operational and quality-related costs. Labelbox empowers the world’s largest organizations to leverage AI solutions tailored to their unique object detection challenges.

However, teams can face multiple challenges when implementing AI for large-scale defect detection. These include: 

  • Data quality and quantity: Improving defect detection requires a vast amount of data in the form of images and videos. Orchestrating data from various sources can not only be challenging to maintain, but even more difficult to sort, analyze, and enrich with quality insights.
  • Dynamic review landscape: The changing nature and format of data from multiple sources poses a challenge for businesses, which must account for continuous data updates and re-training needs. 
  • Cost & scalability: Developing accurate custom AI can be expensive in terms of data, tooling, and expertise. Leveraging foundation models, with human-in-the-loop verification and active learning, can help accelerate model development by automating the labeling process.

Labelbox is a data-centric AI platform that empowers businesses to transform their predictive maintenance through advanced computer vision techniques. Instead of relying on time-consuming manual human review, companies can leverage Labelbox’s AI-assisted data enrichment and flexible training frameworks to quickly build task-specific models that uncover actionable insights for defects faster.

In this guide, we’ll walk through an end-to-end workflow on how your team can leverage Labelbox’s platform to build a powerful task-specific model to improve defect detection on pipes. Specifically, this guide will walk through how you can explore and better understand your assets to make more data-driven business decisions for predictive maintenance.

See it in action: How to build defect detection models to improve predictive maintenance

The walkthrough below covers Labelbox’s platform across Catalog, Annotate, and Model. We recommend that you create a free Labelbox account to best follow along with this tutorial.

Part 1: Explore and enhance your data with Foundry

Part 2: Create a model run and evaluate model performance

You can follow along with both parts of the tutorial below.

Part 1: Explore and prepare your data

Follow along with the tutorial and walkthrough in the Colab Notebook. If you are following along, please make a copy of the notebook. 

Ingest data into Labelbox 

For this tutorial, we’ll be working with a dataset of pipe equipment images for a defect detection use case, with the goal of quickly curating data and identifying three specific parts (pipe, flange, and elbow) from high volumes of images to detect corrosion and broken parts.

The first step will be to gather data:

Please download the dataset and store it in an appropriate location in your environment. You'll also need to update the read/write file paths throughout the notebook to reflect the relevant locations in your environment, as well as all references to API keys and to Labelbox ontology, project, and model run IDs.

  • If you wish to follow along and work with your own data, you can import your data as a CSV.
  • If your images sit as individual files in cloud storage, you can reference the URL of these files through our IAM delegated access integration.
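The cloud-storage path can be sketched in Python. The snippet below builds the list of data-row dictionaries that the Labelbox Python SDK accepts when creating a dataset (e.g., via `dataset.create_data_rows(...)`); the bucket URL and the filename-as-global-key convention are illustrative assumptions, not requirements.

```python
# Hypothetical image URLs sitting in cloud storage (replace with your own).
image_urls = [
    "https://storage.example.com/pipes/img_0001.jpg",
    "https://storage.example.com/pipes/img_0002.jpg",
]

def build_data_rows(urls):
    """Build data-row dicts in the shape the Labelbox SDK expects:
    each row carries row_data (the asset URL) and a unique global_key
    so the asset can be referenced later when attaching pre-labels."""
    return [
        {
            "row_data": url,
            "global_key": url.rsplit("/", 1)[-1],  # filename as a simple unique key
            "media_type": "IMAGE",
        }
        for url in urls
    ]

rows = build_data_rows(image_urls)
print(rows[0]["global_key"])  # img_0001.jpg
```

Using the filename as the global key keeps the mapping between cloud storage and Labelbox human-readable, though any unique string works.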

Once you’ve uploaded your dataset, you should see your image data rendered in Labelbox Catalog. You can browse through the dataset and visualize your data in a no-code interface to quickly pinpoint and curate data for model training. 

Search and curate data

You’ll now be able to see your dataset in Labelbox Catalog. With Catalog, you can enrich each asset with custom metadata and attachments for greater context. 

In this demo, we'll be using Catalog to find relevant images of pipes for our dataset with the goal of annotating bounding boxes for the pipe parts using foundation models.

Leverage custom and out-of-the-box smart filters and embeddings to quickly explore your images, surface similar data, and optimize data curation for ML. 

Using Foundry to pre-label bounding boxes

In this next step, we'll walk through how you can take a human-in-the-loop approach to iterate or modify pre-labels and speed up the annotation process.

Model Foundry enables teams to choose from a library of models; in this case, we'll be comparing the effectiveness of two object detection models (Grounding DINO vs. OWL-ViT) at generating previews and attaching them as pre-labels.

With Model Foundry, you can automate data workflows, including data labeling with world-class foundation models. Leverage a variety of open source or third-party models to accelerate pre-labeling and cut labeling costs by up to 90%.
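Once a foundation model has produced detections, each box needs to be expressed in the bounding-box NDJSON shape that Labelbox accepts for pre-labels. The helper below sketches that conversion from a detector's corner format (x1, y1, x2, y2) to Labelbox's top/left/height/width format; the global key and class name are placeholders, and the class name must match a feature in your ontology.

```python
import uuid

def to_labelbox_bbox(global_key, class_name, x1, y1, x2, y2):
    """Convert a corner-format detection box (x1, y1, x2, y2) into the
    NDJSON bounding-box dict Labelbox accepts for pre-labels/predictions."""
    return {
        "uuid": str(uuid.uuid4()),                # unique id for this annotation
        "name": class_name,                       # must match an ontology feature name
        "dataRow": {"globalKey": global_key},     # which asset the box belongs to
        "bbox": {
            "top": y1,
            "left": x1,
            "height": y2 - y1,
            "width": x2 - x1,
        },
    }

# e.g. a hypothetical flange detection at corners (40, 60) and (140, 180):
pred = to_labelbox_bbox("img_0001.jpg", "flange", 40, 60, 140, 180)
print(pred["bbox"])  # {'top': 60, 'left': 40, 'height': 120, 'width': 100}
```

Writing one such dict per detection, one JSON object per line, yields an NDJSON payload ready for import.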

Part 2: Train a YOLOv8 model and generate predictions

The next step is to train our model on this labeled data and then generate predictions on it, which allows Labelbox to calculate evaluation metrics and surface where the model is going wrong.

As an additional visual method, you can navigate to the Labelbox projector view to visualize different groups or clusters of different classes. You'll see that there are three different clusters, which aligns with our expectations because we have three different classes across pipes, elbows, and flanges. This allows you to find outliers in the clusters to provide an initial review.
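The "outlier in a cluster" idea can be illustrated with a toy sketch: flag embeddings that sit far from their class centroid. Real Labelbox embeddings are high-dimensional; the 2-D vectors and the distance threshold below are illustrative assumptions only.

```python
def centroid(points):
    """Mean vector of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def outliers(points, threshold):
    """Return points whose Euclidean distance to the centroid exceeds threshold."""
    c = centroid(points)
    dist = lambda p: sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
    return [p for p in points if dist(p) > threshold]

# Hypothetical 2-D embeddings for the "pipe" cluster; the last one is suspicious.
pipe_embeddings = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.0)]
print(outliers(pipe_embeddings, threshold=2.0))  # [(5.0, 5.0)]
```

Reviewing such flagged points first is a cheap way to catch mislabeled assets or genuinely unusual defects before retraining.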

Run inference from your trained model on unlabeled data from your notebook

One alternative to the inference workflow shown in the previous step is to take your model weights and deploy them directly within Labelbox Foundry as a custom model. The benefit of this approach is that it allows you to run predictions with your custom model as part of an end-to-end workflow and more quickly classify parts of interest (i.e., elbows, pipes, and flanges).

Note: If this is interesting and you're looking to adopt this method within Labelbox, please reach out to our support team as we would be happy to assist with deploying your custom model within Foundry.

View model predictions within the Labelbox UI to evaluate and diagnose model effectiveness

As a last step, let's compare model inferences with ground-truth annotations to see where the model may be underperforming. A disagreement between model predictions and ground truth labels can be due to a model error (poor model prediction) or a labeling mistake (ground truth is wrong). 

  • After running the notebook, you’ll be able to visually compare ground truth labels (in green) to the model predictions (in red).
  • Use the ‘Metrics view’ to drill into crucial model metrics, such as confusion matrix, precision, recall, F1 score, and more, to surface model errors.
  • Model metrics are auto-populated and interactive. You can click on any chart or metric to open up the gallery view of the model run and see corresponding examples.
  • Use Labelbox Model for 10x faster corner-case detection – detect and visualize corner-cases where the model is underperforming.
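The precision, recall, and F1 figures surfaced in the Metrics view derive from how predictions match ground truth: a prediction is a true positive when it matches a ground-truth box of the same class above an IoU threshold (0.5 is a common choice), an unmatched prediction is a false positive, and an unmatched ground-truth box is a false negative. A minimal sketch of the arithmetic, with made-up counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from matched-detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # how many detections were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # how many real parts were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# e.g. 80 matched boxes, 10 spurious detections, 20 missed parts:
print(detection_metrics(80, 10, 20))  # (0.888..., 0.8, 0.842...)
```

Low precision points toward spurious detections worth reviewing; low recall points toward missed parts, often a signal to label and add more examples of that class.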

After running error analysis, you can make more informed decisions on how to iterate and improve your model’s performance with corrective action or targeted data selection and additional labeling.


By analyzing high volumes of images and videos with foundation models and human alignment, Labelbox enables teams to inject valuable human-in-the-loop insights into model development, delivering better defect detection models that improve operational efficiency, quality assurance, and overall equipment lifespan.

Labelbox is a data-centric AI platform that empowers teams to iteratively build powerful task-specific models. To get started, sign up for a free Labelbox account or request a demo.