General concepts

Updated 1 week ago by Alex Cota

Overview

Here are some general concepts that are helpful to be aware of as you work with the Labelbox Python API.

Immediate updates on the server side

Each call to object.update() on the client side immediately performs the same update on the server side. If the call does not raise an exception, you can assume that the update succeeded on the server side.
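
The contract described above can be sketched as a small helper that treats a clean return from update() as confirmation of the change. This is an illustration only: FakeProject and try_update are hypothetical names, FakeProject stands in for a real server-backed object so the snippet runs without a server, and catching the broad Exception type is an assumption rather than the library's documented exception class.

```python
class FakeProject:
    """Hypothetical stand-in for a server-backed object (not the Labelbox class)."""

    def __init__(self, name):
        self.name = name

    def update(self, **fields):
        # Pretend the server rejects an empty name; a real update() would
        # send a request to the server at this point.
        if "name" in fields and not fields["name"]:
            raise ValueError("server rejected the update: name must not be empty")
        for key, value in fields.items():
            setattr(self, key, value)


def try_update(obj, **fields):
    """Return True if update() completed, False if it raised."""
    try:
        obj.update(**fields)
    except Exception as exc:  # assumption: failures surface as exceptions
        print(f"Update failed: {exc}")
        return False
    return True


project = FakeProject("Old name")
renamed = try_update(project, name="New name")   # accepted
rejected = try_update(project, name="")          # raises inside update()
```

With a real object you would pass it to try_update in place of FakeProject; no separate check against the server is needed after a successful call.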

Field caching

When you fetch an object from the server, the client obtains all field values for that object. Accessing a field afterwards returns the cached value; there is no round-trip to the server for a value you have already fetched. Server-side updates that happen after the client-side fetch are not propagated automatically, so the client keeps returning the cached values.
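
To make the caching behavior concrete, here is a toy model that involves no network calls. The names server and CachedObject are hypothetical and do not reflect how the Labelbox client is implemented; the sketch only mirrors the behavior described above, where field values are copied once at fetch time and later server-side changes stay invisible until the object is fetched again.

```python
# Pretend server-side storage (hypothetical, for illustration only).
server = {"proj-1": {"name": "Original"}}


class CachedObject:
    """Toy object that copies all field values once, at fetch time."""

    def __init__(self, uid):
        self.uid = uid
        self._fields = dict(server[uid])  # snapshot of the server-side fields

    @property
    def name(self):
        # Served from the local snapshot; no round-trip to the "server".
        return self._fields["name"]


project = CachedObject("proj-1")

# A server-side change that happens after the client-side fetch.
server["proj-1"]["name"] = "Renamed elsewhere"

stale = project.name                 # still "Original"
fresh = CachedObject("proj-1").name  # fetching again sees "Renamed elsewhere"
```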

Relationship fetching

Unlike fields, relationships are not cached. Relationships are fetched every time you call them. This is made explicit by defining relationships as callable methods on objects.

project.name        # field: returns the cached value
project.datasets()  # relationship: fetched from the server on every call

In many cases you may not be concerned with relationship data freshness, because only you will be modifying your data during small timeframes. In those situations, it is completely fine to keep references to related objects.

project_datasets = list(project.datasets())

Paginated fetching

Sometimes, a call to the server may result in a very large number of objects being returned. To prevent too many objects being returned at once, the Labelbox server API limits the number of returned objects. The Python API respects that limit and automatically paginates fetches. This is done transparently for you, but it has some implications.

projects = client.get_projects()
type(projects)
# PaginatedCollection
projects = list(projects)
type(projects)
# list
project = projects[0]
datasets = project.datasets()
type(datasets)
# PaginatedCollection
for dataset in datasets:
    dataset.name

There are several points of interest in the code above.

  1. For both the top-level object fetch, client.get_projects(), and the relationship call, project.datasets(), a PaginatedCollection object is returned. This PaginatedCollection object takes care of the paginated fetching.
  2. Note that nothing is fetched immediately when the PaginatedCollection object is created.
  3. Round-trips to the server are made only as you iterate through a PaginatedCollection. In the code above that happens when a list is initialized with a PaginatedCollection, and when a PaginatedCollection is iterated over in a for loop.
  4. You cannot get a count of the objects in a PaginatedCollection, nor can you access objects within it as you would with a list (using square-bracket indexing). You can only iterate over it.
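
The points above can be illustrated with a toy lazy collection that counts its simulated round-trips. LazyPages is a hypothetical stand-in written for this sketch, not the Labelbox PaginatedCollection, and the page size of 10 is an arbitrary assumption.

```python
class LazyPages:
    """Hypothetical paginated collection that fetches one page at a time."""

    def __init__(self, items, page_size):
        self._items = items
        self._page_size = page_size
        self.round_trips = 0  # number of simulated server calls so far

    def _fetch_page(self, start):
        # Stands in for one request to the server.
        self.round_trips += 1
        return self._items[start:start + self._page_size]

    def __iter__(self):
        start = 0
        while True:
            page = self._fetch_page(start)
            yield from page
            if len(page) < self._page_size:
                return  # a short page means there is nothing left to fetch
            start += self._page_size


collection = LazyPages(list(range(25)), page_size=10)
trips_before = collection.round_trips         # 0: nothing fetched on creation

first_three = []
for item in collection:
    first_three.append(item)
    if len(first_three) >= 3:
        break
trips_after_partial = collection.round_trips  # 1: only the first page was needed

everything = list(collection)                 # walks all three pages
trips_after_full = collection.round_trips     # 4: one earlier page plus three here
```

Because LazyPages defines only iteration, calling len() on it or indexing it with square brackets fails, which mirrors the restriction in point 4.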

Be careful about converting a PaginatedCollection into a list. This will cause all objects in that collection to be fetched from the server. In cases when you need only some objects (let's say the first 10 objects), it is much faster to iterate over the PaginatedCollection and simply stop once you're done.

The following code demonstrates how to do this.

data_rows = dataset.data_rows()
first_ten = []
for data_row in data_rows:
    first_ten.append(data_row)
    if len(first_ten) >= 10:
        break
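
The same early-stop pattern can also be written with itertools.islice from the Python standard library. This is a sketch under the assumption that the collection behaves like any ordinary Python iterable, as the text above describes; first_n and fake_data_rows are hypothetical names, and the generator stands in for dataset.data_rows() so the snippet runs without a server.

```python
from itertools import islice


def first_n(iterable, n):
    """Return up to n items, consuming no more of the iterable than that."""
    return list(islice(iterable, n))


def fake_data_rows():
    # Stand-in for a large server-side collection (illustration only).
    for i in range(1000):
        yield f"row-{i}"


# With the API this would be something like first_n(dataset.data_rows(), 10).
first_ten = first_n(fake_data_rows(), 10)
```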
