General Concepts

Updated 2 weeks ago by Florijan Stamenkovic

General Python client API Concepts

The Labelbox Python client API aims to be an easy way for you to manage your Labelbox data. It attempts to implement functionality in a consistent and sensible way. The following concepts are worth knowing before you start.

Immediate Updates

Every data update on the Python client side immediately performs the same update on the server side. If the client-side call does not raise an exception, you can assume that the update succeeded on the server. This applies to:

  • Object creation such as client.create_project(...)
  • Field changes such as project.update(description="New description")
  • Relationship changes such as project.datasets.connect(dataset)
  • Deletions such as dataset.delete()

For field changes, the client-side object is also updated to reflect the new values.

>>> dataset.name
'MyDataset'
>>> dataset.update(name="Other name")
>>> dataset.name
'Other name'
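To make the update order concrete, here is a minimal, self-contained sketch that mimics the behaviour described above using hypothetical FakeServer and FakeDataset classes (they are not part of the Labelbox client). It assumes that the client sends the change to the server first and changes the local attributes only once the server accepts it, so a failed update leaves the local object unchanged.

```python
# Hypothetical in-memory stand-in for the Labelbox server; not part of the
# Labelbox client library.
class FakeServer:
    def __init__(self):
        self.records = {}

    def update(self, uid, **fields):
        if uid not in self.records:
            raise KeyError(f"no object with uid {uid!r}")
        self.records[uid].update(fields)


class FakeDataset:
    """Toy client-side object: update the server first, then local fields."""

    def __init__(self, server, uid, name):
        self._server = server
        self.uid = uid
        self.name = name

    def update(self, **fields):
        # Send the change to the server first; if this raises, the local
        # object is left untouched.
        self._server.update(self.uid, **fields)
        # Only after the server accepted the change are the client-side
        # attributes updated.
        for key, value in fields.items():
            setattr(self, key, value)


server = FakeServer()
server.records["ds-1"] = {"name": "MyDataset"}

dataset = FakeDataset(server, "ds-1", "MyDataset")
dataset.update(name="Other name")
print(dataset.name)                    # Other name
print(server.records["ds-1"]["name"])  # Other name

# An update the server rejects raises, and the local value stays as it was.
orphan = FakeDataset(server, "missing", "Orphan")
try:
    orphan.update(name="New name")
except KeyError:
    pass
print(orphan.name)                     # Orphan
```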

Field Caching

When you fetch an object from the server, the client obtains all field values for that object (for example, Dataset.name). When you access a field (for example, my_dataset.name), the cached value is returned; no round-trip to the server is made.

This means that if a different process changes a field value on the server, your client-side object will not reflect that change. Server-side field values are not automatically propagated to the client side; to see current values, fetch the object again.
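The following toy sketch illustrates the consequence. The CachedDataset class, the fetch_dataset() helper and the SERVER dictionary are hypothetical stand-ins for a fetched Labelbox object and the server; the only behaviour carried over from the text is that field values are copied once, when the object is fetched.

```python
import copy

# Hypothetical stand-in for server-side storage (not the Labelbox API).
SERVER = {"ds-1": {"name": "MyDataset"}}


class CachedDataset:
    """Toy object that copies all field values once, at fetch time."""

    def __init__(self, uid):
        self.uid = uid
        # One simulated round-trip: copy every field value into a local cache.
        self._fields = copy.deepcopy(SERVER[uid])

    @property
    def name(self):
        # Served from the local cache; the server is not consulted here.
        return self._fields["name"]


def fetch_dataset(uid):
    """Hypothetical fetch helper: builds a fresh object from server state."""
    return CachedDataset(uid)


mine = fetch_dataset("ds-1")

# Simulate a different process renaming the dataset on the server.
SERVER["ds-1"]["name"] = "Renamed elsewhere"

print(mine.name)                   # MyDataset (stale cached value)
print(fetch_dataset("ds-1").name)  # Renamed elsewhere (fresh fetch)
```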

Relationship Fetching

Unlike fields, relationships are not cached: they are fetched from the server every time you call them. This is made explicit by defining relationships as methods on objects rather than as attributes.

>>> project.name        # Field accessed as an attribute
>>> project.datasets()  # Relationship accessed as a method

In many cases you won't need to worry about the freshness of relationship data, because only you will be modifying your data at any given time. In those situations it's completely fine to keep references to related objects; they are, after all, just Python objects.

>>> project_datasets = list(project.datasets())
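A minimal sketch of the difference, using a hypothetical FakeProject class whose datasets() method simulates one server query per call. A list built from one call is a snapshot: it stays valid, but it won't pick up datasets connected afterwards, while a new datasets() call will.

```python
class FakeProject:
    """Toy project whose relationship is a method that re-queries on every call."""

    def __init__(self, store):
        self._store = store   # hypothetical server-side relationship data
        self.round_trips = 0  # counts simulated server queries

    def datasets(self):
        # Every call is a new query, so results reflect the current state.
        self.round_trips += 1
        return iter(list(self._store))


store = ["Dataset A"]
project = FakeProject(store)

snapshot = list(project.datasets())  # keep a plain list reference
store.append("Dataset B")            # someone connects another dataset

print(snapshot)                      # ['Dataset A'] (your list does not change)
print(list(project.datasets()))      # ['Dataset A', 'Dataset B']
print(project.round_trips)           # 2
```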

Paginated Fetching

In some cases a call to the server might return a very large number of objects, for example when getting the DataRows of a Dataset.

To avoid returning very large results all at once, the Labelbox server API limits the number of objects returned per request. The Python client respects that limit and automatically paginates fetches. This happens transparently, but it has some implications. Let's look at the types of the returned collections:

>>> projects = client.get_projects()
>>> type(projects)
PaginatedCollection
>>> projects = list(projects)
>>> type(projects)
list
>>> project = projects[0]
>>> datasets = project.datasets()
>>> type(datasets)
PaginatedCollection
>>> for dataset in datasets:
...     dataset.name
...

There are several points of interest in the code above. First, both the top-level object fetch (client.get_projects()) and a relationship call (project.datasets()) return a PaginatedCollection object, which takes care of paginated fetching. Nothing is fetched when that object is created; round-trips to the server are made only as you iterate through a PaginatedCollection. In the code above that happens when a list is initialized with a PaginatedCollection, and when a PaginatedCollection is iterated over in a for-loop.
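The lazy behaviour can be sketched with a small generator-based class. ToyPaginatedCollection and fetch_page() are hypothetical, and the skip/limit paging scheme is an assumption made for illustration; the real server API may paginate differently. The class defines only __iter__, so, like a PaginatedCollection, it supports neither len() nor square-bracket indexing.

```python
from typing import Callable, Iterator, List


class ToyPaginatedCollection:
    """Minimal sketch of lazy, page-by-page fetching (not Labelbox's class).

    fetch_page(skip, limit) stands in for one server round-trip.
    """

    def __init__(self, fetch_page: Callable[[int, int], List[str]], page_size: int = 100):
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self) -> Iterator[str]:
        skip = 0
        while True:
            page = self._fetch_page(skip, self._page_size)  # one round-trip
            yield from page
            if len(page) < self._page_size:
                return  # a short (or empty) page means nothing is left
            skip += len(page)


ALL_ROWS = [f"row-{i}" for i in range(250)]  # hypothetical server-side data
requests = []                                # log of simulated round-trips


def fetch_page(skip: int, limit: int) -> List[str]:
    requests.append((skip, limit))
    return ALL_ROWS[skip:skip + limit]


rows = ToyPaginatedCollection(fetch_page, page_size=100)
print(len(requests))                   # 0: creating the collection fetches nothing
everything = list(rows)
print(len(everything), len(requests))  # 250 3
```

Because __iter__ is a generator in this sketch, each new for-loop over the same toy collection starts again from the first page.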

A PaginatedCollection does not tell you how many objects the relationship contains, and you can't access its objects like those of a list (using square-bracket indexing). You can only iterate over it.

Be careful about converting a PaginatedCollection into a list: this fetches every object in the collection from the server. When you need only some objects (say, the first 10), iterate over the PaginatedCollection and stop once you have what you need. For large collections this is much faster.

>>> data_rows = dataset.data_rows()
>>> first_ten = []
>>> for data_row in data_rows:
...     first_ten.append(data_row)
...     if len(first_ten) >= 10:
...         break
...
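Since a PaginatedCollection is iterable, the standard library's itertools.islice should do the same job as the loop above in one line, for example list(islice(dataset.data_rows(), 10)). The sketch below uses a hypothetical fake_data_rows() generator in place of dataset.data_rows() so it runs without a Labelbox account. It also shows that counting items by iterating over them requests every page.

```python
from itertools import islice

pages_fetched = []  # log of simulated server requests (page start offsets)


def fake_data_rows(total=1000, page_size=100):
    """Hypothetical stand-in for dataset.data_rows(): fetches one page at a time."""
    for start in range(0, total, page_size):
        pages_fetched.append(start)  # one simulated round-trip
        for i in range(start, min(start + page_size, total)):
            yield f"data-row-{i}"


# Take the first ten rows; only the first page is ever requested.
first_ten = list(islice(fake_data_rows(), 10))
pages_for_first_ten = len(pages_fetched)
print(first_ten[-1], pages_for_first_ten)  # data-row-9 1

# Counting by iteration works, but it pulls every page from the "server".
pages_fetched.clear()
total = sum(1 for _ in fake_data_rows())
print(total, len(pages_fetched))           # 1000 10
```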

