Data Rows

Updated by Alex Cota

Below are some frequently used methods for Data Rows. For a complete list of methods see the API reference.

A Data Row represents an Asset or collection of Assets (including associated metadata) and all associated Label information.

Before you start
  1. Complete the installation and authentication steps.
  2. Make sure the API client is initialized:
from labelbox import Client
client = Client()
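
If the LABELBOX_API_KEY environment variable is not set, you can also pass the key directly when constructing the client (placeholder value shown):

client = Client(api_key="<YOUR_API_KEY>")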

Add a Data Row

Use the create_data_row method to add individual files to a Dataset. This is a synchronous operation.

dataset = client.get_dataset("<dataset_id>")
data_row = dataset.create_data_row(row_data="http://my_site.com/photos/img_01.jpg")

You can also pass the path to a local file as a string.

data_row = dataset.create_data_row(row_data="path/to/file.jpg")
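
create_data_row also accepts other DataRow fields as keyword arguments. A short sketch that sets an optional external_id at creation time (the value shown is illustrative):

data_row = dataset.create_data_row(
    row_data="path/to/file.jpg",
    external_id="file.jpg",  # optional identifier, e.g. the original file name
)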

Bulk add Data Rows

Use the create_data_rows method to pass a list of items. This is an asynchronous bulk operation that works for image, video, and text data.

Adding Data Rows in bulk is typically faster and avoids API limit issues. Unless you explicitly wait for the operation to finish (see below), your code will continue running before the Data Rows are fully created on the server side.
Option 1: Import URLs

This code sample imports two video URLs, each passed as a dict keyed by DataRow.row_data.

dataset = client.get_dataset("<DATASET_ID>")
task = dataset.create_data_rows([{labelbox.schema.data_row.DataRow.row_data:"https://storage.googleapis.com/labelbox-sample-datasets/Videos/shibuya-short/shibuya-1130-1150.mp4"},{labelbox.schema.data_row.DataRow.row_data:"https://storage.googleapis.com/labelbox-sample-datasets/Videos/shibuya-short/shibuya-1370-1390.mp4"}])

Where:

  • DataRow.row_data is REQUIRED and accepts an https:// URL to an external image, video, or text file.
Option 2: Import local files

This code sample imports 2 local image files.

task = dataset.create_data_rows(["path/to/file1.jpg", "path/to/file2.jpg"])

Where:

  • Local file paths are passed as strings. Works for images, videos, and text.
Option 3: Import URLs and local files

The first code sample imports a URL to an external video file together with a local video file; the second imports raw text, a URL to an external text file, and a local text file.

task = dataset.create_data_rows([
    {labelbox.schema.data_row.DataRow.row_data: "https://storage.googleapis.com/labelbox-sample-datasets/Videos/shibuya-short/shibuya-1050-1070.mp4"},
    "path/to/file2.mp4",
])
task = dataset.create_data_rows([
    {labelbox.schema.data_row.DataRow.row_data: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."},
    {labelbox.schema.data_row.DataRow.row_data: "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt"},
    "path/to/file2.txt",
])

Where:

  • You may pass paths to local files and https:// URLs at the same time.
You can (but don't have to) use task.wait_till_done() to wait for the bulk creation to complete before continuing with other tasks. For more information on Tasks, see the API reference.
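
For example, a minimal sketch that blocks until the import finishes and then checks the outcome:

task = dataset.create_data_rows(["path/to/file1.jpg", "path/to/file2.jpg"])
task.wait_till_done()
print(task.status)  # e.g. "COMPLETE", or "FAILED" if the import did not succeed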

Add URLs via JSON

  1. Your import file should follow the format of this sample JSON:
[
  {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "external_id": "lorem-ipsum.txt"
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Images/weed_detection/agri_0_39.jpeg"
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Videos/shibuya-short/shibuya-1370-1390.mp4",
    "external_id": "shibuya-1370-1390.mp4"
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Videos/shibuya-short/shibuya-1050-1070.mp4"
  }
]

Where:

  • row_data is REQUIRED and accepts a public URL.
  • external_id is OPTIONAL and allows you to specify the file name.
  2. Then, use create_data_rows() to import data from your local JSON file.
import json

dataset = client.get_dataset("<dataset_id>")
with open("file.json") as f:
    task = dataset.create_data_rows(json.load(f))

This returns a Task object; printing it displays the task ID.

<Task ID: ck37lee3hyixi0725egxwooue>

Fetch Data Row by external ID

To fetch a single Data Row by external_id, use the data_row_for_external_id method.

dataset = client.get_dataset("<dataset_id>")
data_row = dataset.data_row_for_external_id("<external_id>")

If no Data Row matches the external ID, or if two Data Rows share the same external ID, this method raises a ResourceNotFoundError. See the API reference for more information.
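
To handle a missing or non-unique external ID gracefully, you can catch the exception. A minimal sketch:

from labelbox.exceptions import ResourceNotFoundError

try:
    data_row = dataset.data_row_for_external_id("<external_id>")
except ResourceNotFoundError:
    data_row = None  # no match, or the external ID is not unique in this Dataset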

Fetch all Data Rows in a Dataset

Use the data_rows method to fetch multiple Data Rows, then iterate over the PaginatedCollection object.

dataset = client.get_dataset("<dataset_id>")
for data_row in dataset.data_rows():
    print(data_row)
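
Because data_rows() fetches results lazily, page by page, you can stop iterating early without downloading the entire Dataset. A short sketch that materializes only the first 10 Data Rows:

from itertools import islice

first_ten = list(islice(dataset.data_rows(), 10))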

Bulk delete Data Rows

Use the bulk_delete method to delete multiple Data Rows at a time. The sample below demonstrates how to delete all Data Rows created after a certain date.

from datetime import datetime
from labelbox import DataRow

dataset = client.get_dataset("<dataset_id>")
some_date = datetime(2023, 1, 1)  # example cutoff date
data_rows = list(dataset.data_rows(where=DataRow.created_at > some_date))
DataRow.bulk_delete(data_rows)
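
To remove a single Data Row instead, a minimal sketch that fetches it by external ID and deletes it:

data_row = dataset.data_row_for_external_id("<external_id>")
data_row.delete()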
