Connecting Cloud Data Overview
Generate URLs for each Data Asset
You will first need to generate or retrieve the URL for each data asset in your cloud storage location. If your data is not private, you can make the data in your cloud folder public and then generate or retrieve the URLs.
Private Data Using Signed URLs
For private data, you must generate a signed URL for each data asset. A signed URL protects your data from unauthorized access by adding a unique key to the URL, so the data can be accessed only through the URL together with that key. A signed URL looks like this:
http://example.com/filename?hash=DMF1ucDxtqgxwYQ==
Only the signed URL is passed to Labelbox, not the data files themselves.
For added security, you may also restrict access to your content to a specified range of user IP addresses.
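To make the idea of "URL + key" concrete, here is a minimal sketch of how a signed URL can be produced and checked. It is a generic HMAC illustration, not the scheme of any particular hosting provider; the names SECRET_KEY, sign_url, and is_valid and the expires parameter are assumptions made for this example, and it assumes the input URL has no existing query string. In practice you would use your provider's own signing tools.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret known only to the server that hosts the data.
SECRET_KEY = b"replace-with-a-real-secret"


def _signature(path, expires, key):
    """Return a hex HMAC-SHA256 over the URL path and the expiry time."""
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(url, ttl_seconds=3600, key=SECRET_KEY, now=None):
    """Append an expiry time and a hash to a URL that has no query string."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    path = urlsplit(url).path
    query = urlencode({"expires": expires, "hash": _signature(path, expires, key)})
    return f"{url}?{query}"


def is_valid(signed_url, key=SECRET_KEY, now=None):
    """Check that the hash matches the path and that the URL has not expired."""
    now = time.time() if now is None else now
    parts = urlsplit(signed_url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        provided = params["hash"][0]
    except (KeyError, ValueError):
        return False  # unsigned or malformed URL
    if now > expires:
        return False  # signature has expired
    expected = _signature(parts.path, expires, key)
    return hmac.compare_digest(provided, expected)
```

Calling sign_url("https://example.com/filename") returns the original URL with expires and hash parameters appended. Anyone without the secret cannot produce a valid hash for a different file, which is why the signed URL alone is enough to grant access to exactly one asset.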
Here are some resources on creating signed URLs for popular hosting providers:
Create a File with URLs
If your data is hosted in the cloud (e.g. Amazon S3), you can point Labelbox to your data by creating a JSON file or a CSV file with URLs to each file.
Creating a JSON file (Recommended)
Create a JSON file containing the data URLs, for example the URLs of data hosted in Google Storage. An example JSON file is also available for download.
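A JSON file of this kind can be generated with a short script. The sketch below assumes one JSON object per asset; the key names imageUrl and externalId, the bucket URLs, and the function names are hypothetical choices for illustration, so check the example file for the exact field names your import expects.

```python
import json


def build_json_rows(urls, external_ids=None):
    """Build one dict per asset; the key names are assumed, not confirmed."""
    rows = []
    for index, url in enumerate(urls):
        row = {"imageUrl": url}
        if external_ids is not None:
            row["externalId"] = external_ids[index]
        rows.append(row)
    return rows


def rows_to_json(rows):
    """Serialize the rows as a JSON array."""
    return json.dumps(rows, indent=2)


def write_json_file(path, rows):
    """Write the JSON array to a file ready for upload."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(rows_to_json(rows))


if __name__ == "__main__":
    # Placeholder URLs; substitute the URLs of your own assets.
    example_urls = [
        "https://storage.googleapis.com/example-bucket/image-1.jpg",
        "https://storage.googleapis.com/example-bucket/image-2.jpg",
    ]
    write_json_file("data.json", build_json_rows(example_urls, ["image-1", "image-2"]))
```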
Creating a CSV File
Create a CSV file in which the first column contains the URL and the second column (optional) contains the external ID. Example CSV file containing image URLs.
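The two-column layout described above can be produced with Python's csv module. This is a minimal sketch: the header names "Image URL" and "External ID" and the function name build_csv_text are assumptions for illustration, since the column to be labeled is chosen at upload time.

```python
import csv
import io


def build_csv_text(rows, header=("Image URL", "External ID")):
    """Return CSV text with the URL in column 1 and the external ID in column 2.

    `rows` is an iterable of (url, external_id) pairs; external_id may be None.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    for url, external_id in rows:
        # An absent external ID is written as an empty cell.
        writer.writerow([url, external_id or ""])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder URLs; substitute the URLs of your own assets.
    text = build_csv_text([
        ("https://example.com/image-1.jpg", "image-1"),
        ("https://example.com/image-2.jpg", None),
    ])
    # newline="" prevents the csv line endings from being translated again.
    with open("data.csv", "w", newline="", encoding="utf-8") as handle:
        handle.write(text)
```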
Upload the JSON or CSV File
After selecting the file to upload, you must choose the column to be labeled. You may also choose an external ID field (optional) which will be included in the data export once labeling is complete.