Labelbox, April 8, 2024

Labelbox introduces no-code data pipeline integrations

Integrating data pipelines with Labelbox

Labelbox Catalog is a powerful tool for AI teams to visualize and explore unstructured data from a variety of data sources in order to prepare and generate high-quality datasets. Labelbox is making it easier than ever to connect data pipelines with our new data warehouse integration tool, powered by Census. The integration provides no-code setup and configuration for synchronizing with over 25 data storage options, reducing the time and cost of data management. Data engineers now have full change data capture support for a wide array of data sources, from major data warehouses to simple spreadsheets and nearly everything in between.

For AI teams, synchronizing data between multiple data stores typically requires data engineers to create and maintain a set of Python scripts that are often brittle and time-consuming to manage. With our new data warehouse integration tool, data engineers can now build data pipelines in less than five minutes with no code required. Labelbox automatically creates and sets up a Census workspace in which Labelbox data admins can configure Syncs with their preferred data warehouse. The integration offers flexible synchronization options, with full support for upserts, so that newly created, updated and deleted records are mirrored in your Labelbox Catalog datasets.
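To make the idea of mirroring creates, updates and deletes concrete, here is a minimal Python sketch of the kind of change detection a hand-written sync script would otherwise have to implement. It is an illustration only, not how Labelbox or Census implement Syncs: the function names, the in-memory dictionaries standing in for a warehouse table and a dataset, and the `url` field are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple

# A record is a flat mapping of column name to value (hypothetical shape).
Row = Dict[str, str]


class ChangeSet(NamedTuple):
    creates: List[str]  # keys present only in the current snapshot
    updates: List[str]  # keys present in both snapshots with different values
    deletes: List[str]  # keys present only in the previous snapshot


def diff_snapshots(previous: Dict[str, Row], current: Dict[str, Row]) -> ChangeSet:
    """Compare two snapshots of a source table, keyed by a unique key."""
    creates = sorted(k for k in current if k not in previous)
    updates = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    deletes = sorted(k for k in previous if k not in current)
    return ChangeSet(creates, updates, deletes)


def apply_changes(
    target: Dict[str, Row], current: Dict[str, Row], changes: ChangeSet
) -> Dict[str, Row]:
    """Upsert created and updated rows into the target, then remove deleted ones."""
    for key in changes.creates + changes.updates:
        target[key] = dict(current[key])
    for key in changes.deletes:
        target.pop(key, None)
    return target


# Example: one row updated, one row created, one row deleted.
previous = {"a": {"url": "https://example.com/1"}, "b": {"url": "https://example.com/2"}}
current = {"a": {"url": "https://example.com/1-v2"}, "c": {"url": "https://example.com/3"}}

changes = diff_snapshots(previous, current)
mirror = apply_changes({k: dict(v) for k, v in previous.items()}, current, changes)
```

After the example runs, `mirror` holds the same rows as `current`. Keeping this logic correct across schema changes and partial failures is the maintenance burden that a no-code Sync is meant to remove.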

Reduce your data pipeline costs

For AI teams using major data warehouses, such as Google BigQuery, Databricks, Snowflake or Amazon Redshift, the cost of building and maintaining data pipelines can be significant in terms of both time and money. Using the new data warehouse integration tool in Labelbox not only lowers compute costs, it also greatly reduces the time spent building and managing custom Python scripts. Data engineers can easily map global keys and key fields from their data warehouse to one or more datasets in Labelbox Catalog directly from the UI. For some of our largest customers managing sophisticated data pipelines, this could result in tens of thousands of dollars in cost savings.

Sync Google Sheets in seconds

For AI teams doing rapid prototyping or proofs of concept (POCs) in Labelbox, data is often pulled in from simple spreadsheets. Synchronizing data with Google Sheets has never been easier: you simply share the file with Labelbox and map two or three fields. The integration supports text, URLs or URIs, as well as metadata and attachments, similar to our current Python SDK.
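As a rough picture of what mapping two or three fields amounts to, the Python sketch below turns spreadsheet rows into records with a key, a data field and leftover columns as metadata. The column names (`id`, `image_url`, `split`), the target field names and the helper functions are hypothetical and chosen only for illustration; they are not the integration's actual configuration or the SDK's API.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mapping from spreadsheet column to target field.
FIELD_MAP = {"id": "global_key", "image_url": "row_data"}


def map_row(row: Dict[str, str], field_map: Dict[str, str] = FIELD_MAP) -> Dict[str, object]:
    """Rename mapped columns and collect the remaining non-empty columns as metadata."""
    mapped: Dict[str, object] = {
        target: row[source] for source, target in field_map.items()
    }
    mapped["metadata"] = {
        name: value
        for name, value in row.items()
        if name not in field_map and value != ""
    }
    return mapped


def map_sheet(csv_text: str) -> List[Dict[str, object]]:
    """Parse a sheet exported as CSV text and map every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [map_row(row) for row in reader]


sheet = "id,image_url,split\nrow-1,https://example.com/a.jpg,train\n"
records = map_sheet(sheet)
```

Here `records` contains one entry whose `global_key` is `row-1`, whose `row_data` is the image URL, and whose `metadata` carries the unmapped `split` column.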

Connect with over 25 cloud storage providers

In addition to our existing integrations to sync data from cloud buckets like Google Cloud Storage, Amazon S3 and Microsoft Azure Blob Storage, our new data warehouse integration tool leverages Census to offer integrations with virtually any cloud database including Postgres, MySQL, SQL Server, Firebase, and more.


The new data warehouse integration tool from Labelbox allows customers to integrate data pipelines in less than five minutes. With over 25 data integrations available, AI teams can easily build and maintain datasets in Labelbox Catalog to power their data engine.

The new data warehouse integration powered by Census is available to all customers for free for up to two concurrent Syncs. For customers needing additional Syncs or wishing to connect with their own Census workspace, please contact us to learn more about joining the Enterprise Beta program.