Here are some general concepts that are helpful to be aware of as you work with the Labelbox Python API.
Immediate updates on the server side
Each data update using object.update() on the client side immediately performs the same update on the server side. If the client-side update does not raise an exception, you can assume that the update succeeded on the server side.
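The write-through behavior described above can be illustrated with a small toy model. This is only a sketch: ToyServer and ToyObject are hypothetical stand-ins invented for this example and are not part of the Labelbox API. It shows an update() call that changes the server copy first and raises if the server rejects the change.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a remote server (not Labelbox)."""

    def __init__(self):
        self.records = {}

    def update(self, uid, **fields):
        if uid not in self.records:
            raise KeyError(f"no record with uid {uid!r}")
        self.records[uid].update(fields)


class ToyObject:
    """Hypothetical client-side object whose update() writes through at once."""

    def __init__(self, server, uid, **fields):
        self._server = server
        self.uid = uid
        self.fields = dict(fields)

    def update(self, **fields):
        # Server first: if this raises, the local copy is left untouched.
        self._server.update(self.uid, **fields)
        self.fields.update(fields)


server = ToyServer()
server.records["p1"] = {"name": "old name"}
project = ToyObject(server, "p1", name="old name")

project.update(name="new name")  # no exception, so the server copy changed too
server_name = server.records["p1"]["name"]

# A failed server-side update surfaces as an exception on the client.
orphan = ToyObject(server, "missing", name="x")
try:
    orphan.update(name="y")
    update_failed = False
except KeyError:
    update_failed = True
```

In this toy, the absence of an exception is the only success signal the caller needs, which mirrors the guarantee stated above.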
When you fetch an object from the server, the client obtains all field values for that object. Accessing a field afterwards returns the cached value; there is no round-trip to the server for a value you have already fetched. Server-side updates that happen after the client-side fetch are not propagated automatically, so the values returned are still the cached ones.
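A minimal toy model of field caching, assuming a hypothetical ToyServer and ToyObject (neither is a Labelbox class). It counts round-trips to show that reading a field does not contact the server and that a later server-side change stays invisible until the object is fetched again.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a remote server (not Labelbox)."""

    def __init__(self):
        self.records = {"p1": {"name": "first name"}}
        self.requests = 0

    def fetch(self, uid):
        self.requests += 1
        return dict(self.records[uid])  # a copy, like data sent over the wire


class ToyObject:
    """Hypothetical client-side object that caches its field values."""

    def __init__(self, server, uid):
        self.uid = uid
        self._cache = server.fetch(uid)  # all field values are fetched once

    @property
    def name(self):
        return self._cache["name"]  # served from the cache, no round-trip


server = ToyServer()
project = ToyObject(server, "p1")

# Someone else changes the record on the server after our fetch.
server.records["p1"]["name"] = "renamed elsewhere"

stale = project.name                     # still the cached value
requests_after_reads = server.requests   # only the initial fetch was made

# In this toy, fetching the object again is what picks up the new value.
fresh = ToyObject(server, "p1").name
```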
Unlike fields, relationships are not cached. Relationships are fetched every time you call them. This is made explicit by defining relationships as callable methods on objects.
In many cases you may not be concerned with relationship data freshness because you will be the only one modifying your data during small timeframes. In those situations, it is completely fine to keep references to related objects.
project_datasets = list(project.datasets())
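The difference between a relationship call and a kept reference can be sketched with another toy model. ToyServer and ToyProject are hypothetical names used only for illustration and do not reflect how Labelbox implements relationships. Each datasets() call reaches the server, while a list built from an earlier call is a snapshot.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a remote server (not Labelbox)."""

    def __init__(self):
        self.datasets_by_project = {"p1": ["dataset-a"]}
        self.requests = 0

    def fetch_datasets(self, project_uid):
        self.requests += 1
        return list(self.datasets_by_project[project_uid])


class ToyProject:
    """Hypothetical client-side object with an uncached relationship."""

    def __init__(self, server, uid):
        self._server = server
        self.uid = uid

    def datasets(self):
        # A method, not an attribute: every call goes back to the server.
        return self._server.fetch_datasets(self.uid)


server = ToyServer()
project = ToyProject(server, "p1")

kept = list(project.datasets())  # a reference you keep; it is a snapshot
server.datasets_by_project["p1"].append("dataset-b")
refetched = list(project.datasets())  # a new call sees the new dataset
```

Keeping the snapshot is reasonable when nobody else modifies the data in the meantime; otherwise calling the relationship again is the safer choice.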
Sometimes, a call to the server may result in a very large number of objects being returned. To prevent too many objects being returned at once, the Labelbox server API limits the number of returned objects. The Python API respects that limit and automatically paginates fetches. This is done transparently for you, but it has some implications.
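projects = client.get_projects()  # a PaginatedCollection; nothing is fetched yet
projects = list(projects)  # a list; all projects are fetched here
project = projects[0]
datasets = project.datasets()  # a PaginatedCollection
for dataset in datasets:  # pages are fetched as the loop advances
    print(dataset.name)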
There are several points of interest in the code above.
- For both the top-level object fetch, client.get_projects(), and the relationship call, project.datasets(), a PaginatedCollection object is returned. This PaginatedCollection object takes care of the paginated fetching.
- Note that nothing is fetched immediately when the PaginatedCollection object is created.
- Round-trips to the server are made only as you iterate through a PaginatedCollection. In the code above, that happens when a list is initialized with a PaginatedCollection, and when a PaginatedCollection is iterated over in a for loop.
- You cannot get a count of objects in the relationship from a PaginatedCollection, nor can you access objects within it as you would with a list (using square-bracket indexing). You can only iterate over it.
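The points above can be made concrete with a toy lazy collection. This is a sketch under stated assumptions: ToyPaginatedCollection, fetch_page, and PAGE_SIZE are hypothetical and say nothing about how the real PaginatedCollection is implemented. The sketch records which pages are requested, so you can see that creating the collection fetches nothing and that iteration fetches one page at a time.

```python
class ToyPaginatedCollection:
    """Hypothetical lazy collection; not the Labelbox PaginatedCollection."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable: page index -> list of items

    def __iter__(self):
        page_index = 0
        while True:
            page = self._fetch_page(page_index)  # one simulated round-trip
            if not page:
                return
            yield from page
            page_index += 1


ITEMS = [f"item-{i}" for i in range(7)]
PAGE_SIZE = 3
pages_fetched = []  # log of simulated round-trips


def fetch_page(page_index):
    pages_fetched.append(page_index)
    start = page_index * PAGE_SIZE
    return ITEMS[start:start + PAGE_SIZE]


collection = ToyPaginatedCollection(fetch_page)
fetched_on_creation = list(pages_fetched)  # nothing has been requested yet

# Iterating and stopping early requests only the pages actually needed.
first_two = []
for item in collection:
    first_two.append(item)
    if len(first_two) == 2:
        break
fetched_after_partial = list(pages_fetched)

# Building a list walks every page, including a final empty one in this toy.
everything = list(collection)
fetched_after_full = list(pages_fetched)
```

Like the collection described above, this toy defines neither a length nor index access, so iteration is the only way to reach its items.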
Be careful about converting a PaginatedCollection into a list. This will cause all objects in that collection to be fetched from the server. In cases when you need only some objects (let's say the first 10 objects), it is much faster to iterate over the PaginatedCollection and simply stop once you're done.
The following code demonstrates how to do this.
data_rows = dataset.data_rows()
first_ten = []
for data_row in data_rows:
    if len(first_ten) >= 10: