
Model-assisted labeling

Updated by Alex Cota

The model-assisted labeling workflow in the new Image editor allows you to import computer-generated predictions and load them as editable Features on an asset. This can be useful for speeding up the labeling process and supporting human labeling efforts.

Model-assisted labeling workflow

Model-assisted labeling supports the following label types:

  • masks
  • bounding boxes
  • polygons
  • polylines
  • points

Before you start

  1. Make sure you have the proper authentication.
  2. Create a project.
  3. Create your dataset and attach data rows.
  4. Use this query to get the IDs for your data rows.
query GetDataRows {
  dataset(where: {id: "<DATASET_ID>"}) {
    dataRows(first: 100) {
      id
    }
  }
}
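
The following is a minimal Python sketch of how the query above might be prepared and its response unpacked. The helper names (`build_data_rows_query`, `extract_data_row_ids`) are hypothetical, and the response is assumed to mirror the query inside a top-level `data` object, as is usual for GraphQL. Sending the request is left out because it depends on your authentication setup.

```python
import json


def build_data_rows_query(dataset_id, first=100):
    """Return a GraphQL request body for the GetDataRows query."""
    # json.dumps quotes and escapes the dataset ID as a string literal.
    query = (
        "query GetDataRows { "
        f"dataset(where: {{id: {json.dumps(dataset_id)}}}) {{ "
        f"dataRows(first: {first}) {{ id }} "
        "} }"
    )
    return {"query": query}


def extract_data_row_ids(response):
    """Pull the data row IDs out of a parsed response (assumed shape)."""
    rows = response["data"]["dataset"]["dataRows"]
    return [row["id"] for row in rows]
```

The returned dictionary can be serialized as the JSON body of a POST request by whichever HTTP client you use.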

For a code sample, see the end-to-end example in Project setup.

Bulk import predictions

Step 1: Create the NDJSON file

Labelbox expects the import file to be in newline-delimited JSON format (see the NDJSON specification for more detail). Under the NDJSON standard, each line in the file is a complete JSON object; here, each line holds the predictions for one data row. A data row can have more than one prediction.

Additionally, the import file must follow the GeoJSON standard. According to this standard, each prediction entry will be represented as a feature within a FeatureCollection. This is not to be confused with the Labelbox definition of a “feature”, which is a single annotation on an asset (e.g. object or classification). To avoid confusion, the two uses of the term “feature” will be differentiated by the way they are written in this document:

  • feature: Represents a prediction in the import NDJSON file.
  • Feature: A single annotation on an asset (e.g. classification or object). References a Feature schema in the ontology.
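
As a rough illustration of the newline-delimited layout, the Python sketch below writes one FeatureCollection per line and reads the lines back. The function names are hypothetical; only the standard `json` module is used.

```python
import json


def to_ndjson(feature_collections):
    """Serialize each FeatureCollection onto its own line."""
    # With its default settings json.dumps emits no raw newlines
    # (newlines inside strings are escaped), so every object stays
    # on a single line as NDJSON requires.
    return "\n".join(json.dumps(fc) for fc in feature_collections) + "\n"


def from_ndjson(text):
    """Parse NDJSON text back into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Writing the result of `to_ndjson` to a file gives one line per data row, each line carrying all predictions (features) for that row.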

Importing vector predictions

This section applies to the following GeoJSON geometry types: Polygon, MultiPolygon, Point, and LineString.

| Field | Definition |
| --- | --- |
| type | Value is "FeatureCollection". Represents the prediction collection. |
| properties.dataRow.id | ID of the data row provided by Labelbox. ExternalId is not yet supported. |
| features.type | Value is "Feature". Represents a prediction. |
| features.properties.schema.id | ID of the Feature schema defined in the ontology. To get this value, run the query that follows this table and copy the value for featureSchemaId. |
| features.properties.uuid | User-generated UUID for each prediction. See the example below for a sample uuid. |
| features.geometry.type | One of "Polygon", "MultiPolygon", "Point", or "LineString". |
| features.geometry.coordinates | An array of points that determine the feature geometry. |

Query to look up featureSchemaId:

query {
  project(where: {id: "ck5r5laiahj9h0a561xrcnv6b"}) {
    ontology {
      normalized
    }
  }
}
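
Before uploading, it can help to check each line against the fields defined above. The Python sketch below is a local sanity check only, not Labelbox's own validation; the function name `check_vector_line` and the wording of the messages are hypothetical.

```python
import json

ALLOWED_GEOMETRIES = {"Polygon", "MultiPolygon", "Point", "LineString"}


def check_vector_line(line):
    """Return a list of problems found in one NDJSON line (empty if none)."""
    try:
        fc = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(fc, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if fc.get("type") != "FeatureCollection":
        problems.append('type is not "FeatureCollection"')
    if not fc.get("properties", {}).get("dataRow", {}).get("id"):
        problems.append("properties.dataRow.id is missing")

    # Number the features from 1 so messages are easy to match to the file.
    for number, feature in enumerate(fc.get("features", []), start=1):
        props = feature.get("properties", {})
        geometry = feature.get("geometry", {})
        if feature.get("type") != "Feature":
            problems.append(f'feature {number}: type is not "Feature"')
        if not props.get("schema", {}).get("id"):
            problems.append(f"feature {number}: properties.schema.id is missing")
        if not props.get("uuid"):
            problems.append(f"feature {number}: properties.uuid is missing")
        if geometry.get("type") not in ALLOWED_GEOMETRIES:
            problems.append(f"feature {number}: unsupported geometry.type")
        if "coordinates" not in geometry:
            problems.append(f"feature {number}: geometry.coordinates is missing")
    return problems
```

An empty list means the line has every field listed in the table; it does not guarantee that the schema ID or data row ID exists in your project.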

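Example NDJSON import file entry. It is expanded across multiple lines here for readability; in the import file, each FeatureCollection must occupy a single line.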

# Bounding box NDJSON example
{
  "type": "FeatureCollection",
  "properties": {
    "dataRow": {
      "id": "dr2"
    }
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "schema": {
          "id": "bboxSchema2"
        },
        "uuid": "414138d6-bf6e-4061-9ae9-7362ce8864d2"
      },
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
          [[[10, 10], [10, 200], [200, 200], [200, 10], [10, 10]]]
        ]
      }
    }
  ]
}
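
A bounding box like the one in the example above could be generated with a small helper. This Python sketch is illustrative: the function name `bbox_feature` is hypothetical, the corner order simply copies the example, and the coordinates are assumed to be in the same units the example uses (the page does not state them).

```python
import uuid


def bbox_feature(schema_id, x_min, y_min, x_max, y_max):
    """Build one prediction (GeoJSON feature) for a rectangular box."""
    # A closed ring: the first and last points are identical.
    ring = [
        [x_min, y_min],
        [x_min, y_max],
        [x_max, y_max],
        [x_max, y_min],
        [x_min, y_min],
    ]
    return {
        "type": "Feature",
        "properties": {
            "schema": {"id": schema_id},
            # A fresh random UUID for every prediction.
            "uuid": str(uuid.uuid4()),
        },
        # MultiPolygon nesting: list of polygons -> list of rings -> points.
        "geometry": {"type": "MultiPolygon", "coordinates": [[ring]]},
    }
```

Calling `bbox_feature("bboxSchema2", 10, 10, 200, 200)` yields the same geometry as the example, with a newly generated uuid.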

Importing mask predictions

The NDJSON file for masks will not have geometry coordinates and, therefore, will not have a geometry type. Instead, the properties field will expand to contain two additional fields to specify the url and the color of the mask.

Note: For data rows with multiple masks, each mask color specified in the import file should correspond to a Feature schema defined in your project’s ontology. If you pass a URL/color pair where the color doesn’t exist in the image, no feature (prediction) will be created and you will get an error message in the output file.

Before you import, make sure the mask and data row dimensions match.

| Field | Definition |
| --- | --- |
| type | Value is "FeatureCollection". Represents the prediction collection. |
| properties.dataRow.id | ID of the data row provided by Labelbox. ExternalId is not yet supported. |
| features.type | Value is "Feature". Represents a prediction. |
| features.properties.schema.id | ID of the Feature schema defined in the project’s ontology. To get this value, run the query that follows this table and copy the value for featureSchemaId. |
| features.properties.uuid | User-generated UUID for each prediction. See the example below for a sample uuid. |
| features.properties.url | Mask URL. If you are importing multiple mask predictions on one data row, each mask should reference the same URL. |
| features.properties.color | An array of RGB values from 0 to 255 that indicates which color represents the given mask. Only 3-channel RGB colors are supported. |

Query to look up featureSchemaId:

query {
  project(where: {id: "ck5r5laiahj9h0a561xrcnv6b"}) {
    ontology {
      normalized
    }
  }
}

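Example NDJSON import file entry for a mask. It is expanded across multiple lines here for readability; in the import file, each FeatureCollection must occupy a single line.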

# Mask NDJSON example
{
  "type": "FeatureCollection",
  "properties": {
    "dataRow": {
      "id": "dr4"
    }
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "mask": {
          "url": "https://storage.googleapis.com/labelbox-public-bucket/test.png",
          "color": [0, 0, 0]
        },
        "schema": {
          "id": "maskSchema1"
        },
        "uuid": "1e5de470-fafb-4d94-bb09-f4d2895ab44b"
      }
    }
  ]
}
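
The Python sketch below shows one way to assemble mask predictions for a single data row. It follows the example above in nesting `url` and `color` under a `mask` key (an assumption where the example and the field table differ), and the names `mask_feature` and `mask_collection` are hypothetical. It checks only that a color is a 3-channel RGB value; it does not check that the color is present in the image.

```python
import uuid


def mask_feature(schema_id, url, color):
    """Build one mask prediction; color must be 3-channel RGB, 0 to 255."""
    color = list(color)
    if len(color) != 3 or not all(
        isinstance(c, int) and 0 <= c <= 255 for c in color
    ):
        raise ValueError("color must be three integers from 0 to 255")
    return {
        "type": "Feature",
        "properties": {
            "mask": {"url": url, "color": color},
            "schema": {"id": schema_id},
            "uuid": str(uuid.uuid4()),
        },
    }


def mask_collection(data_row_id, url, color_to_schema):
    """One FeatureCollection whose masks all reference the same URL."""
    return {
        "type": "FeatureCollection",
        "properties": {"dataRow": {"id": data_row_id}},
        "features": [
            mask_feature(schema_id, url, color)
            for color, schema_id in color_to_schema.items()
        ],
    }
```

Here `color_to_schema` maps an RGB tuple such as `(0, 0, 0)` to the Feature schema ID that color represents, which keeps each color tied to exactly one schema.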

Step 2: Import the NDJSON file

Limit the total number of imported features to fewer than 500k per customer across all of your projects.

Approximate import time estimates:

| Total features | Vector import time | Mask import time |
| --- | --- | --- |
| 1,000,000 | 20m | 6d |
| 100,000 | 5m 30s | 12h |
| 10,000 | 3-5m | 1h 30m |
| 1,000 | 3-5m | 10m |
| 100 | 3-5m | 3-5m |
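
To stay under the 500k feature limit mentioned at the start of this step, you could count the features in a file before importing it. This is a minimal Python sketch; the function names are hypothetical and `already_imported` is a number you would have to track yourself.

```python
import json

FEATURE_LIMIT = 500_000  # per-customer limit stated in this step


def count_features(ndjson_text):
    """Count the features (predictions) across all lines of an NDJSON file."""
    return sum(
        len(json.loads(line).get("features", []))
        for line in ndjson_text.splitlines()
        if line.strip()
    )


def fits_limit(ndjson_text, already_imported=0):
    """True if importing this file keeps the total under the limit."""
    return already_imported + count_features(ndjson_text) < FEATURE_LIMIT
```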

Once your NDJSON import file is properly formatted, upload it to a location where it can be reached by URL. Then use the GraphQL mutation below to create a bulkImportRequest, passing that URL as fileUrl. Each bulkImportRequest name must be unique within its project.

mutation {
  createBulkImportRequest(data: {
    projectId: "ck5r3rfav005h0887x584pwak",
    name: "test2",
    fileUrl: "https://foobar.com"
  }) {
    id
  }
}
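
If you create import requests from a script, the mutation above can be assembled from variables. The Python sketch below only builds the request body; the function name is hypothetical, and sending the request is omitted because it depends on your authentication setup.

```python
import json


def build_create_import_mutation(project_id, name, file_url):
    """Return a GraphQL request body for createBulkImportRequest."""
    # json.dumps quotes and escapes each value as a string literal.
    mutation = (
        "mutation { createBulkImportRequest(data: { "
        f"projectId: {json.dumps(project_id)}, "
        f"name: {json.dumps(name)}, "
        f"fileUrl: {json.dumps(file_url)} "
        "}) { id } }"
    )
    return {"query": mutation}
```

Because each name must be unique per project, a script might derive `name` from a timestamp or a run identifier.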

To check the status of the import job, run the query below. Each time you query outputFileUrl, it returns a URL that expires after 1 day. To get a new URL, run the bulkImportRequest query again.

query {
  bulkImportRequest(where: {
    projectId: "ck5r3rfav005h0887x584pwak",
    name: "test2"
  }) {
    id
    name
    inputFileUrl
    outputFileUrl
    state
  }
}

The state field returned by the above query will be one of:

RUNNING
FAILED
SUCCESS
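
Since an import can take minutes or longer, a script would typically poll the state until it leaves RUNNING. The Python sketch below assumes a hypothetical `fetch_state` callable that runs the status query and returns the `state` string; the polling interval and timeout are arbitrary defaults, not recommended values.

```python
import time


def wait_for_import(fetch_state, poll_seconds=30.0, timeout_seconds=3600.0,
                    sleep=time.sleep):
    """Poll until the import reaches SUCCESS or FAILED, then return the state."""
    waited = 0.0
    while True:
        state = fetch_state()
        if state in ("SUCCESS", "FAILED"):
            return state
        if state != "RUNNING":
            raise ValueError(f"unexpected state: {state!r}")
        if waited >= timeout_seconds:
            raise TimeoutError("import did not finish before the timeout")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays.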

If the whole pipeline fails, the FAILED state is returned and the output file will contain JSON with only an error field containing the error message.

If at least one prediction is imported successfully, the SUCCESS state is returned. Because SUCCESS does not mean that every prediction was imported, the output file lists, in NDJSON format, an error description for each failed data row using the following fields.

| Field | Definition |
| --- | --- |
| lineNumber | Indicates the data row that failed. |
| error | General error message. |
| dataRowId (OPTIONAL) | ID of the data row containing the failed prediction. Will only appear if given by the user. |
| predictions.predictionNumber | Indicates the prediction on the data row that failed. |
| predictions.error | Indicates the error message for the failed prediction. |
| predictions.uuid (OPTIONAL) | Identifies the failed prediction. Will only appear if given by the user. |

Note: If no uuid or dataRowId is properly given by the user, the error will be located by lineNumber or predictionNumber.

# Example error message
[
  {
    "lineNumber": 1,
    "error": "Failed to import 1 prediction(s)",
    "dataRowId": "dr2",
    "predictions": [
      {
        "predictionNumber": 2,
        "error": "Field 'uuid' not found"
      }
    ]
  },
  {
    "lineNumber": 3,
    "error": "Failed to import 1 prediction(s)",
    "dataRowId": "dr4",
    "predictions": [
      {
        "predictionNumber": 3,
        "error": "Color [255, 255, 0] not found on image from URL https://storage.googleapis.com/labelbox-public-bucket/test.png",
        "uuid": "2d29f68a-7cab-45c3-9761-646d58d00ee0"
      }
    ]
  },
  {
    "lineNumber": 4,
    "error": "Error while parsing FeatureCollection"
  }
]
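
To decide what to fix, it can be convenient to flatten the error output into one record per failure. The Python sketch below assumes the output has already been parsed into a list of dictionaries shaped like the example above; the function name `failed_predictions` is hypothetical.

```python
def failed_predictions(entries):
    """Flatten error entries into one record per failed prediction or line."""
    records = []
    for entry in entries:
        # Entries without a predictions list (e.g. a line that could not
        # be parsed) still produce one record for the whole line.
        predictions = entry.get("predictions") or [{}]
        for prediction in predictions:
            records.append({
                "lineNumber": entry["lineNumber"],
                "dataRowId": entry.get("dataRowId"),
                "predictionNumber": prediction.get("predictionNumber"),
                "uuid": prediction.get("uuid"),
                # Prefer the prediction-level message when there is one.
                "error": prediction.get("error", entry["error"]),
            })
    return records
```

Records with a `uuid` can be matched directly to a prediction in the input file; the others have to be located by `lineNumber` and `predictionNumber`.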

For predictions not uploaded successfully the first time, fix the errors indicated in the outputFileUrl and start another bulk import with the amended input file. Labelbox will identify the failed predictions by the data row lineNumber and will retry the upload of the failed data rows.

If you label an asset, then delete the label but keep it as a template, that label template will take precedence over any model-assisted labels (predictions) you import for that asset.

Load the predictions in the Image editor

After you have successfully imported your predictions, go to your Labelbox account, navigate to “Settings” > “Automation”, and turn on Model-assisted labeling. When this is on, you will be able to view predictions in the labeling interface. Only project admins can toggle Model-assisted labeling on and off.

Note: Bulk import is not supported in the legacy image editor.

[Image: Model-assisted labeling toggle on]

When an asset is loaded in the Image editor, any predictions for that asset will show up as editable Features for the user.

[Image: Segmentation in label interface]

Predictions will be loaded on an asset only when the following conditions are met:

  • Model-assisted labeling toggle is on.
  • There are predictions created for the data rows.
  • There are no non-prediction annotations that have already been created by the user on the data rows.
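
The three conditions above combine as a simple logical AND. The Python sketch below only restates that rule for clarity; the function and parameter names are hypothetical and do not correspond to any Labelbox API.

```python
def predictions_will_load(toggle_on, prediction_count, user_annotation_count):
    """True only when all three conditions listed above hold."""
    return (
        toggle_on
        and prediction_count > 0
        and user_annotation_count == 0
    )
```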

Duration (Timer)

The timer functionality for this early release is still a work in progress. Currently, if the labeler skips or submits a label without first making any changes, the timer does not start and the recorded duration is 0 seconds. This logic may be revised in a future update.

Update/delete predictions

When a labeler chooses to “skip” an asset with predictions, the predictions are deleted from the asset. The next time a user loads the asset in the labeling interface, the predictions are loaded again.

Overwriting predictions on the data row will not automatically overwrite any Features already generated from the original predictions. To update Features generated from predictions, delete the labels from the data row in the UI, update the predictions on the data row, and load the asset again to see the updated predictions.
