How to import data

Labelbox has two ways of importing data:

  1. Row by row: Good for real-time syncing and fine-grained control.
  2. Bulk import: Good for importing large amounts of data.

If you’re uploading large quantities of data, use the bulk import function to avoid being rate-limited or running into errors.

Row by row

A Data Row represents a single piece of data that needs to be labeled. For example, if you have a CSV with 100 rows, you will have 100 Data Rows.

Usage

Row-by-row importing is ideal for manipulating existing datasets, whether you are adding specific data you missed or connecting a live feed of information to the dataset.

Query

To add a Data Row to an existing dataset, use the createDataRow mutation. This mutation adds a row to the dataset and adds the data to the labeling queue for any projects attached to that dataset.

Fields:

externalId is optional, and is usually the filename associated with the image or another identifier from your system.

rowData is the URL linking to the data you want to add. For more information, see Connecting Cloud Data.

dataset connects the Data Row to a dataset by its unique Labelbox ID; you can retrieve this ID by querying the dataset in the API explorer (see Query Dataset ID below).

The Labelbox API is rate-limited at 300 requests per minute. We recommend sending Data Row import requests one after another rather than in parallel.

Run this query

mutation {
  createDataRow(
    data: {
      externalId: "<FILENAME_OR_OTHER_ID>",
      rowData: "<DATA_THAT_NEEDS_TO_BE_LABELED>",
      dataset: {
        connect: {
          id: "<DATASET_ID_HERE>"
        }
      }
    }
  ) {
    id
  }
}

import json

# `client` is assumed to be an already-configured Labelbox GraphQL client
# (see the authentication section of the API docs).
def create_datarow(dataset_id, image_url, external_id):
    # Runs the createDataRow mutation and returns the new Data Row ID.
    res_str = client.execute("""
    mutation createDataRowFromAPI(
        $image_url: String!, $external_id: String!, $dataset_id: ID!) {
      createDataRow(
        data: {
          rowData: $image_url
          externalId: $external_id
          dataset: { connect: { id: $dataset_id } }
        }
      ) {
        id
      }
    }
    """, {
        'dataset_id': dataset_id,
        'image_url': image_url,
        'external_id': external_id
    })

    res = json.loads(res_str)
    return res['data']['createDataRow']['id']
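
To stay under the rate limit, you can loop over your source data and call the helper one request at a time. This is a minimal sketch; the CSV filename and its url/filename columns are placeholders for your own data:

import csv
import time

DATASET_ID = "<DATASET_ID_HERE>"

# Upload each CSV row as a Data Row, sequentially rather than in parallel.
with open("images.csv") as f:
    for row in csv.DictReader(f):
        data_row_id = create_datarow(DATASET_ID, row["url"], row["filename"])
        print(data_row_id)
        time.sleep(0.25)  # ~240 requests/minute, under the 300/minute limit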

Query Dataset ID

Use this query to collect the Dataset ID based on the name of the dataset you want to add to:

query {
  datasets(where: {name: "<DATASET_NAME>"}) {
    id
    name
  }
}

Create empty Dataset

You can also create a new dataset and then use the returned ID to add data to it. The dataset can be attached to a project as usual in the project creation screen.

mutation {
  createDataset(
    data: {
      name: "<INSERT_NAME_HERE>"
    }
  ) {
    id
  }
}
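
If you prefer to create the dataset from Python, here is a minimal sketch following the same client.execute pattern as the helper above (the create_dataset name is ours, not part of the API):

def create_dataset(name):
    # Runs the createDataset mutation above and returns the new dataset ID.
    res_str = client.execute("""
    mutation createDatasetFromAPI($name: String!) {
      createDataset(data: { name: $name }) {
        id
      }
    }
    """, {'name': name})
    res = json.loads(res_str)
    return res['data']['createDataset']['id']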

Bulk import

Usage

Bulk importing is optimized for uploading large quantities of data at once by reading the information out of a JSON file containing all the new data. The bulk import mutation takes a URL that links directly to the JSON file, so it can interface natively with cloud storage services or APIs.

If you have the file locally on your computer, the walkthrough below shows how to create a one-off cloud URL to easily integrate this into your workflow.

Query

To add Data Rows to an existing dataset, use the appendRowsToDataset mutation. This mutation adds all of the included rows to the dataset and adds the data to the labeling queue for any projects attached to that dataset.

Fields:

datasetId is the unique Labelbox ID identifying the dataset; it can be retrieved by querying the dataset in the API explorer, or taken from the output when you create a new dataset.

jsonFileUrl is a URL to a JSON file stored in the cloud.

mutation AppendRowsToDataset {
  appendRowsToDataset(
    data: {
      datasetId: "<DATASET-ID-HERE>",
      jsonFileUrl: "<JSON-URL-HERE>"
    }
  ) {
    accepted
  }
}

def importBulkData(dataSetId, jsonFileURL):
    """Returns True if the upload was accepted.
    See the documentation for more information:
    https://labelbox.com/docs/api/data-import
    """
    res_str = client.execute("""
    mutation AppendRowsToDataset($dataSetId: ID!, $jsonURL: String!) {
      appendRowsToDataset(
        data: {
          datasetId: $dataSetId,
          jsonFileUrl: $jsonURL
        }
      ) {
        accepted
      }
    }
    """, {'dataSetId': dataSetId, 'jsonURL': jsonFileURL})

    res = json.loads(res_str)
    return res['data']['appendRowsToDataset']['accepted']

JSON formatting and one-off uploads

You’ll need to create and upload a JSON file with the same format as a JSON import through the UI (docs here). If this JSON already lives on a cloud server, simply provide the link.

If you have the JSON locally, you can upload it to file.io directly or in your Python code:

import json
import requests

def upload_to_file(payload):
    # Uploads the JSON-serialized payload to file.io and returns a one-off
    # download link (the ?expires=1d parameter keeps the file for one day).
    data = str.encode(json.dumps(payload))
    files = {'file': data}
    r = requests.post("https://file.io/?expires=1d", files=files)
    file_info = json.loads(r.text)
    return file_info['link']
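
As an illustration, you can build the payload in code and grab the link. The field names below are placeholders; the exact schema the importer expects is the one described in the UI JSON import docs:

# Hypothetical rows -- check the JSON import docs for the exact field names.
rows = [
    {"imageUrl": "https://example.com/image-1.jpg", "externalId": "image-1.jpg"},
    {"imageUrl": "https://example.com/image-2.jpg", "externalId": "image-2.jpg"},
]

json_url = upload_to_file(rows)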

Querying Dataset IDs

Use this query to collect the Dataset ID based on the name of the dataset you want to add to:

def get_dataset_id(name):
    # Looks up a dataset by name and returns the first match's ID.
    res_str = client.execute("""
    query getDatasetByName($name: String!) {
      datasets(where: {name: $name}) {
        id
      }
    }
    """, {"name": name})
    res = json.loads(res_str)
    return res["data"]["datasets"][0]["id"]
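
Putting it all together, a minimal end-to-end bulk import might look like this (the dataset name is a placeholder, and json_url is the link returned by upload_to_file above):

dataset_id = get_dataset_id("<DATASET_NAME>")
accepted = importBulkData(dataset_id, json_url)
print("accepted:", accepted)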

