How to sync your cloud buckets with Labelbox

Labelbox recently introduced the ability to synchronize your cloud buckets from Amazon, Google and Microsoft Azure with Labelbox Catalog. This greatly simplifies the existing integration, because you no longer need to write custom JSON or configure Python scripts.

Labelbox supports native integrations with cloud storage from leading providers including:

  • Amazon S3
  • Google Cloud Storage
  • Microsoft Azure Blob Storage

If you are familiar with how cloud storage works, you can integrate your cloud bucket when adding a dataset to Labelbox. This guide shows you how to set up and use the cloud storage integration with Labelbox Catalog.

Why use cloud storage integrations?

Cloud buckets are a simple and efficient way to manage large volumes of unstructured data, especially for computer vision use cases involving documents, images and video. The Labelbox cloud storage integration can automatically scan any set of folders in your cloud bucket and synchronize the following data types into Labelbox Catalog:

  • Image
  • Video
  • Text
  • Audio
  • HTML
  • Tiled imagery (COG, NITF, GeoTIFF)
  • Documents
  • Chat (Conversations)
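As a rough illustration of what scanning a set of folders by data type involves, the Python sketch below groups object keys that sit under the selected folders by file extension. The extension table, the type labels and the `group_by_type` function are hypothetical; Labelbox's actual detection rules are not described in this guide and may differ. Ambiguous formats are deliberately left out (a `.tif` file, for example, may be a plain image or a GeoTIFF).

```python
from pathlib import PurePosixPath

# Hypothetical extension table covering only a few of the listed data types.
# Ambiguous formats (e.g. .tif, which may be a plain image or a GeoTIFF) are omitted.
EXTENSION_TO_TYPE = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp4": "video", ".mov": "video",
    ".txt": "text",
    ".mp3": "audio", ".wav": "audio",
    ".html": "html",
    ".pdf": "document",
}


def group_by_type(keys, folders):
    """Group object keys under any of `folders` by data type, skipping unknown types."""
    # Normalize folder names to "name/" so "raw" does not also match "raw_backup/".
    prefixes = tuple(f.rstrip("/") + "/" for f in folders)
    groups = {}
    for key in keys:
        if not key.startswith(prefixes):
            continue  # outside the folders selected for sync
        data_type = EXTENSION_TO_TYPE.get(PurePosixPath(key).suffix.lower())
        if data_type is not None:  # unrecognized extensions are skipped
            groups.setdefault(data_type, []).append(key)
    return groups


if __name__ == "__main__":
    keys = ["raw/cats/a.JPG", "raw/clips/b.mp4", "raw/notes/c.txt", "other/d.png", "raw/e.xyz"]
    print(group_by_type(keys, ["raw"]))
    # {'image': ['raw/cats/a.JPG'], 'video': ['raw/clips/b.mp4'], 'text': ['raw/notes/c.txt']}
```

Matching on a trailing "/" keeps folder selection exact, and lowercasing the suffix lets `a.JPG` and `a.jpg` land in the same group.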

How to sync a dataset with your cloud bucket

After completing the one-time setup of delegated access to your object store (see the section below), you can use the Labelbox UI to configure the synchronization of any folder to a dataset in Labelbox Catalog with just a few clicks.

Once configured, you can sync any connected dataset with your cloud storage with a single click.
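Conceptually, a sync compares the objects currently under the configured folder with what the dataset already holds. The sketch below assumes, hypothetically, that each data row is identified by its object key; it illustrates the idea rather than how Labelbox implements sync. Because this guide does not say what happens to rows whose object has been deleted, the sketch only reports them.

```python
def plan_sync(bucket_keys, dataset_keys, folder):
    """Compare a folder's objects with a dataset's rows (hypothetically keyed by object key).

    Returns (to_add, already_synced, missing_from_bucket), each sorted.
    """
    prefix = folder.rstrip("/") + "/"
    in_folder = {k for k in bucket_keys if k.startswith(prefix)}
    in_dataset = set(dataset_keys)
    to_add = sorted(in_folder - in_dataset)          # new objects to import
    already_synced = sorted(in_folder & in_dataset)  # nothing to do
    missing = sorted(in_dataset - in_folder)         # rows whose object is gone or outside the folder
    return to_add, already_synced, missing


if __name__ == "__main__":
    bucket = ["imgs/a.png", "imgs/b.png", "docs/c.pdf"]
    dataset = ["imgs/a.png", "imgs/old.png"]
    print(plan_sync(bucket, dataset, "imgs"))
    # (['imgs/b.png'], ['imgs/a.png'], ['imgs/old.png'])
```

Working with sets makes a repeated sync idempotent: objects that were already imported fall into `already_synced` instead of being added twice.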

How to set up delegated access to your cloud storage provider

The only prerequisite for using the Labelbox cloud storage integration is setting up IAM delegated access. This is a one-time setup for each object store, and it allows Labelbox to access the assets in your cloud buckets that you want to add to Catalog or label in Annotate.

IAM delegated access works similarly whether you are using Amazon S3, Microsoft Azure or Google Cloud Storage (GCS). For example, when you use IAM delegated access to add your unlabeled data to Labelbox, you can keep your assets in Amazon S3 and grant Labelbox read-only access to your AWS cloud buckets.

Figure: Delegated access setup in AWS (the process is similar in GCP)
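On AWS, cross-account delegated access like this is typically granted through an IAM role that the other account is allowed to assume. The Python sketch below builds such a role trust policy as a dict. The account ID and external ID are placeholders (assumptions); use the values given in Labelbox's setup instructions rather than these.

```python
import json

# Placeholder values (assumptions): copy the real ones from Labelbox's setup instructions.
LABELBOX_AWS_ACCOUNT_ID = "111122223333"
EXTERNAL_ID = "external-id-from-labelbox"


def delegated_access_trust_policy(account_id: str, external_id: str) -> dict:
    """Trust policy letting `account_id` assume the role, but only with `external_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # Requiring an external ID guards against the "confused deputy" problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = delegated_access_trust_policy(LABELBOX_AWS_ACCOUNT_ID, EXTERNAL_ID)
    print(json.dumps(policy, indent=2))
```

The trust policy only controls who may assume the role; what the role can read is defined by a separate permissions policy attached to it.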

IAM delegated access is highly flexible and allows you to control access at the granularity that you desire.

  • You can grant Labelbox access to all of your buckets, a single bucket, or even a particular path within a bucket.
  • You can also set up separate integrations within Labelbox for different datasets or projects.
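The three levels of granularity above correspond to the `Resource` ARNs in an S3 permissions policy attached to the delegated role. The hypothetical `read_only_s3_policy` helper below is a sketch that assumes Labelbox needs `s3:GetObject` to read assets and `s3:ListBucket` to scan folders; check the Labelbox documentation for the exact permissions it requires.

```python
import json
from typing import Optional


def read_only_s3_policy(bucket: Optional[str] = None, prefix: Optional[str] = None) -> dict:
    """Read-only S3 policy scoped to all buckets, one bucket, or one path in a bucket."""
    path = prefix.strip("/") if prefix else ""
    if path and not bucket:
        raise ValueError("a path can only be scoped within a specific bucket")

    if bucket is None:
        # All buckets.
        bucket_arn, object_arn = "arn:aws:s3:::*", "arn:aws:s3:::*/*"
    elif not path:
        # A single bucket.
        bucket_arn, object_arn = f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"
    else:
        # A particular path (key prefix) within one bucket.
        bucket_arn, object_arn = f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/{path}/*"

    # ListBucket applies to the bucket ARN; GetObject applies to object ARNs.
    list_statement = {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn}
    if path:
        # Restrict listing to keys under the chosen path.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{path}/*"]}}

    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": object_arn},
            list_statement,
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_s3_policy("my-bucket", "raw/images/"), indent=2))
```

Keeping the listing and reading statements separate matters because `s3:ListBucket` is evaluated against the bucket ARN while `s3:GetObject` is evaluated against object ARNs; putting both actions on one resource is a common cause of "access denied" errors.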

IAM delegated access lets you use private cloud-hosted buckets with Labelbox, so your assets never have to be made publicly accessible in order to be labeled.

Refer to the Labelbox documentation for more detail on setting up IAM delegated access, then add data to Labelbox from your cloud bucket with ease.

Summary

Storing data in cloud buckets is a simple and effective way to manage large volumes of unstructured data – especially images, video and documents. Connecting and synchronizing your cloud storage with datasets in Labelbox has never been easier.

If you’re already using cloud storage, add data to Labelbox today.