Data Rows


This page covers common use cases for working with Data Rows in the Python API.

For the definition of a Data Row, see Overview & data types.

Before you start

Make sure the client is initialized.

from labelbox import Client
client = Client()

Bulk add Data Rows

To add Data Rows individually, see Creating your first project.

Dataset.create_data_rows() accepts a list of items and runs as an asynchronous bulk operation. This is typically faster than creating Data Rows one at a time and avoids API limit issues. Unless you explicitly wait on the returned task, your code continues before the Data Rows are fully created on the server side. A Data Row has the following fields:

  • external_id (string) - OPTIONAL
  • row_data (string) - REQUIRED
  • uid (ID) - not updatable
  • updated_at (DateTime) - not updatable
  • created_at (DateTime) - not updatable

Use the row_data field only when you are passing an external URL. Each external URL must be passed as a dict:

from labelbox import DataRow

dataset = client.get_dataset("dataset_id")
task = dataset.create_data_rows([
    {DataRow.row_data: "http://my_site.com/photos/img_01.jpg"},
    {DataRow.row_data: "http://my_site.com/photos/img_02.jpg"},
])

Pass paths to local files as strings.

task = dataset.create_data_rows(["path/to/file1.jpeg", "path/to/file2.jpeg"])

You may also mix local file paths and external URLs in the same call.

task = dataset.create_data_rows([
    {DataRow.row_data: "http://my_site.com/photos/img_01.jpg"},
    "path/to/file2.jpeg",
])

You can (but don't have to) use task.wait_till_done() to wait for the bulk creation to complete before continuing with other work. For more information on Tasks, see the API reference.
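
For example, a minimal sketch that blocks until the operation finishes (assuming the Task object's status field, per the API reference):

task = dataset.create_data_rows(["path/to/file1.jpeg", "path/to/file2.jpeg"])
task.wait_till_done()  # block until the server finishes creating the Data Rows
print(task.status)     # e.g. "COMPLETE" once the bulk creation succeeds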

Add Data Rows from a JSON file

At a minimum, the JSON file you are using to import your Data Rows must include row_data as a key for each asset you wish to add.

[
  {
    "row_data": "<IMG_URL>"
  },
  {
    "row_data": "<IMG_URL_2>"
  },
  {
    "row_data": "path/to/file.jpg"
  }
]
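
Each object may also carry the optional external_id field alongside row_data (a sketch; "asset-01" is a placeholder for an identifier from your own system):

[
  {
    "row_data": "<IMG_URL>",
    "external_id": "asset-01"
  }
]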

Use Dataset.create_data_rows() to import Data Rows from your JSON file.

import json

dataset = client.get_dataset("dataset_id")
with open("file.json") as f:
    task = dataset.create_data_rows(json.load(f))

This returns a Task object; printing it shows the task ID:

<Task ID: ck37lee3hyixi0725egxwooue>

Fetch Data Row by external ID

The Data Row object carries an optional external_id field in addition to its uid. This field is useful when you need to map a Data Row to an asset in your own system.

Use the data_row_for_external_id method to fetch a single Data Row by external_id within a dataset.

dataset = client.get_dataset("<dataset_unique_id>")
data_row = dataset.data_row_for_external_id(external_id)

If two Data Rows share the same external ID, this method raises a ResourceNotFoundError. See the API reference for more information.
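
If you want to handle that failure in code, a minimal sketch (assuming ResourceNotFoundError is importable from labelbox.exceptions):

from labelbox.exceptions import ResourceNotFoundError

try:
    data_row = dataset.data_row_for_external_id(external_id)
except ResourceNotFoundError:
    # raised when the lookup fails, e.g. the external ID is not unique
    data_row = None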

Fetch multiple Data Rows

To fetch multiple Data Rows, use the data_rows method. This example iterates over all Data Rows within a specified dataset.

dataset = client.get_dataset("<dataset_unique_id>")
for data_row in dataset.data_rows():
    print(data_row)

Bulk delete Data Rows

To delete many Data Rows at a time, use the bulk_delete method. The code below demonstrates how to delete all Data Rows from a specified dataset if they were created after a certain date.

from datetime import datetime
from labelbox import DataRow

dataset = client.get_dataset("<dataset_unique_id>")
some_date = datetime(2023, 1, 1)  # cutoff date; adjust to your needs
data_rows = list(dataset.data_rows(where=DataRow.created_at > some_date))
DataRow.bulk_delete(data_rows)
