Labelbox • March 28, 2023
This month, we released several updates that help you explore and organize your data, label faster, and quickly identify model errors. You can now upload custom embeddings, label video data with step interpolation instead of linear interpolation, simplify your workflow with SDK improvements, and send model predictions as pre-labels to your labeling project. Read on to learn more about the latest improvements that help you work iteratively with your data and build better models.
An embedding represents a piece of data, such as a word or an image, as a vector of numerical values in a high-dimensional space. Because similar items end up close together in that space, embeddings make data exploration easier and help you quickly surface similar data of interest.
You can now upload up to 100 custom embeddings per organization in Catalog. To learn more about how to upload custom embeddings, you can check out the following resources:
In addition to uploading custom embeddings, when you connect your data to Labelbox, we automatically compute off-the-shelf embeddings on your data. This includes:
While these off-the-shelf embeddings are a useful starting point for you to explore your data and conduct similarity searches, there might be some cases where you want to use your own custom embeddings. You can easily compare the results of these custom and provided off-the-shelf embeddings in Labelbox to discover the best embeddings to use for data selection.
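To make the idea of a similarity search concrete, here is a minimal sketch in plain Python. It is not Labelbox's implementation: the function names, the three-dimensional vectors, and the data row names are all made up for illustration, and cosine similarity is assumed as the similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank every other item by similarity to the query item."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real ones typically have hundreds of dimensions.
embeddings = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "truck_1": [0.0, 0.1, 0.9],
}

for data_row, score in most_similar("cat_1", embeddings):
    print(data_row, round(score, 3))
```

Running the same query against two different embedding sets and comparing the ranked results is one simple way to judge which set surfaces more relevant data.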
We have improved the SDK to keep pace with the feature updates offered in the Labelbox UI. Version 3.40.0 includes notable functional improvements across global keys, model runs, ontologies, batches, and annotation import to help you access more of the Labelbox API's functionality.
These improvements include:
A new way to export your data
Simplify annotation imports (pre-labels, ground truth, model predictions) with global keys
All Labelbox users can leverage global keys to streamline the annotation import experience and to surface specific data rows in Catalog faster than ever.
Global keys help you avoid duplicate data in Catalog, as each global key is unique to one data row at the organization level. Rather than waiting for your data rows to process and then manually exporting and matching IDs, you can reference each data row directly by its global key.
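The following sketch illustrates the idea behind key-based matching in plain Python. It does not call the Labelbox SDK; the dictionary fields (`global_key`, `row_data`, `label`) and both function names are assumptions chosen for the example.

```python
def index_by_global_key(data_rows):
    """Build a lookup table and reject duplicate global keys."""
    index = {}
    for row in data_rows:
        key = row["global_key"]
        if key in index:
            raise ValueError(f"duplicate global key: {key}")
        index[key] = row
    return index


def attach_annotations(index, annotations):
    """Match each annotation to its data row using the global key alone."""
    matched, unmatched = [], []
    for annotation in annotations:
        row = index.get(annotation["global_key"])
        if row is None:
            unmatched.append(annotation)
        else:
            matched.append(
                {"row_data": row["row_data"], "label": annotation["label"]}
            )
    return matched, unmatched


# Hypothetical data rows and pre-labels.
data_rows = [
    {"global_key": "img-001", "row_data": "https://example.com/1.jpg"},
    {"global_key": "img-002", "row_data": "https://example.com/2.jpg"},
]
annotations = [
    {"global_key": "img-002", "label": "dog"},
    {"global_key": "img-999", "label": "cat"},
]

matched, unmatched = attach_annotations(index_by_global_key(data_rows), annotations)
print(matched)
print(unmatched)
```

Because the key is chosen by you before upload, no export step is needed to find out which identifier belongs to which asset.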
With this SDK update, you can now:
To learn more about how to upload annotations with global keys, check out our documentation.
You can also click to see the annotation import workflow based on the data type of interest: Image | Video | Text | Geospatial | Documents | Conversational
You can now label videos with step interpolation instead of linearly tracking objects to the next keyframe.
While linear interpolation can be helpful in use cases where you want to linearly and smoothly track an object in your video, there are some use cases in which you want the annotation to stay in the exact same position until the next keyframe.
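The difference between the two modes can be sketched in a few lines of Python. This is an illustration of the general technique rather than Labelbox's editor logic: keyframes are assumed to be a mapping from frame number to an (x, y) position, and both functions are assumed to hold the last keyframe's position once no later keyframe exists.

```python
from bisect import bisect_right


def _surrounding_keyframes(keyframes, frame):
    """Return the nearest keyframe at or before `frame` and the next one after it."""
    frames = sorted(keyframes)
    idx = bisect_right(frames, frame)
    prev_frame = frames[idx - 1] if idx > 0 else None
    next_frame = frames[idx] if idx < len(frames) else None
    return prev_frame, next_frame


def step_interpolate(keyframes, frame):
    """Hold the previous keyframe's position until the next keyframe."""
    prev_frame, _ = _surrounding_keyframes(keyframes, frame)
    if prev_frame is None:
        return None  # the object has not appeared yet
    return keyframes[prev_frame]


def linear_interpolate(keyframes, frame):
    """Move in a straight line between the surrounding keyframes."""
    prev_frame, next_frame = _surrounding_keyframes(keyframes, frame)
    if prev_frame is None:
        return None  # the object has not appeared yet
    if next_frame is None or prev_frame == frame:
        return keyframes[prev_frame]
    t = (frame - prev_frame) / (next_frame - prev_frame)
    (x0, y0), (x1, y1) = keyframes[prev_frame], keyframes[next_frame]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# Hypothetical annotation with keyframes at frame 0 and frame 10.
keyframes = {0: (0.0, 0.0), 10: (100.0, 50.0)}

print(linear_interpolate(keyframes, 5))  # halfway between the two keyframes
print(step_interpolate(keyframes, 5))    # still at the frame-0 position
```

At frame 5, the linear version places the annotation midway between the keyframes, while the step version leaves it where it was at frame 0 and only moves it when frame 10 is reached.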
With step interpolation:
For now, this is only available at the organization level, meaning you can enable step interpolation and turn off linear interpolation for video projects across your organization. We’re currently working on making this feature available at the project and feature level.
If you wish to disable linear interpolation and replace it with step interpolation for your organization, please contact jpatel@labelbox.com.
We’ve introduced a few core architectural improvements to enhance the labeling experience and data ingestion capabilities:
Labelbox model-assisted labeling (MAL) workflows allow you to import computer-generated predictions, or annotations created outside of Labelbox, as pre-labels. Although they still require human review, these imported annotations reduce the amount of manual labeling work and speed up human labeling workflows.
Evaluating how your model performs on these predictions will help identify and target new edge cases and data discrepancies where your model has demonstrated poor performance and thus needs improvement or fine-tuning.
We’ve added features that support human-in-the-loop workflows for identifying the data most likely to streamline annotation and boost model performance.
To perform uncertainty sampling, where you prioritize low-confidence predictions, you can:
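As a generic illustration of uncertainty sampling, independent of the Labelbox UI or SDK, the sketch below ranks predictions by confidence and picks the least confident ones for labeling. The `data_row` and `confidence` fields, the function name, and the labeling budget are assumptions made for the example.

```python
def select_low_confidence(predictions, budget):
    """Return the data rows whose predictions have the lowest confidence."""
    ranked = sorted(predictions, key=lambda p: p["confidence"])
    return [p["data_row"] for p in ranked[:budget]]


# Hypothetical model predictions with confidence scores between 0 and 1.
predictions = [
    {"data_row": "img-001", "confidence": 0.97},
    {"data_row": "img-002", "confidence": 0.41},
    {"data_row": "img-003", "confidence": 0.88},
    {"data_row": "img-004", "confidence": 0.52},
]

print(select_low_confidence(predictions, budget=2))
```

The selected data rows are the ones where the model is least sure of itself, which is where human review and additional labels tend to be most informative.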