If you already use Labelbox to enrich and label your unstructured data, you know how important it is to export your data insights in the right format and connect them to your downstream data workflow. Whether you want to send your data to a database, a cloud-hosted table, an ML training pipeline, or a production environment, you need a flexible and powerful export system that can handle your specific needs.
That’s why we’re excited to introduce a new way to export your data. This new system gives you more granular control over your data exports across the Labelbox platform and SDK. With this new way to export, you can:
While we will continue supporting the Export v1 system until December 31, 2023, we encourage you to gradually start migrating all of your export workflows to this updated export workflow (Export v2). The Export v1 and Export v2 workflows may be used in tandem until Export v1 is sunset on December 31st.
Please refer to our documentation to learn more about export specifications and compare the old Export v1 and new Export v2 systems: image | video | text | documents | geospatial/tiled imagery | audio | conversational text | HTML | DICOM
The previous export system (Export v1) relied upon a less flexible label-centric export that limited access to all the information you might need about a Data Row. Within Export v2’s Data Row-centric context, you can access much more information — including fields like:
By reframing exports based on Data Rows, we’ve made it much more intuitive for you to integrate with your data tables that are organized around your team’s unique assets and data rows.
A data row can have labels from multiple projects, or have predictions from multiple model runs. Using this new way to export through Catalog, or through the SDK, you can easily grab all the information about a Data Row.
Use data row filters to select a subset of data rows for export:
Export v1 cached exports for 30 minutes. In Export v2, you always get a fresh export, and you can run one asynchronous export task per project at a time.
From Labelbox's UI, you can access the export function through the drop-down menu after selecting a subset of data rows. You can export the entire project, model, dataset, or slice from a set of filters or a selection of data rows within them.
Below are some examples of Export v2 in action. For more detailed information, please refer to our documentation.
For developers who would like to programmatically feed exports directly into downstream data workflows, or build automated workflows that retrieve fresh data exports on a regular basis, we recommend the Export v2 SDK. It gives you the flexibility to control exactly what data you export.
import labelbox as lb

# Placeholders: substitute your own API key and project ID.
client = lb.Client(api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_ID")

# Set the export params to include or exclude optional fields in the export.
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "performance_details": True,
}

# You can set a date range for last_activity_at and label_created_at.
# You can also pass a list of data row IDs to export.
# For context, last_activity_at captures the creation and modification of
# labels, metadata, status, comments, and reviews.
# Note: the filters are combined with AND logic, so one filter is usually sufficient.
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "data_row_ids": ["data_row_id_1", "data_row_id_2"],
}

# Start the asynchronous export task and block until it finishes.
export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
    print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)
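Because the filters are combined with AND logic, it can help to pass only the filters you actually need. The sketch below is one way to do that: `build_export_filters` is a hypothetical helper (it is not part of the Labelbox SDK) that includes only the filters you supply and checks that each date range uses the `YYYY-MM-DD hh:mm:ss` format shown in the example above.

```python
from datetime import datetime

# Timestamp format used by the date-range filters in the example above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_export_filters(last_activity_at=None, label_created_at=None,
                         data_row_ids=None):
    """Return a filters dict that contains only the filters that were supplied.

    Hypothetical helper for illustration; not part of the Labelbox SDK.
    """
    filters = {}
    for key, date_range in (("last_activity_at", last_activity_at),
                            ("label_created_at", label_created_at)):
        if date_range is None:
            continue
        start, end = date_range
        # strptime raises ValueError if a timestamp does not match the format.
        if datetime.strptime(start, TIMESTAMP_FORMAT) > datetime.strptime(
                end, TIMESTAMP_FORMAT):
            raise ValueError(f"{key}: start must not be after end")
        filters[key] = [start, end]
    if data_row_ids:
        filters["data_row_ids"] = list(data_row_ids)
    return filters


# Only one filter is supplied, so only one key ends up in the dict.
filters = build_export_filters(
    last_activity_at=("2023-01-01 00:00:00", "2023-06-30 23:59:59"))
print(filters)
```

The resulting dict can then be passed as the `filters` argument in the same way as the hand-written dict above.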
You can check out SDK examples of exporting from datasets, slices, and model runs in this documentation.
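Once an export finishes, one simple way to hand the result to a downstream workflow is to write it out as newline-delimited JSON, which many databases and pipelines can ingest. The following is a minimal sketch that assumes the export result is a list of JSON-serializable dictionaries; `write_ndjson` and the sample rows are made up for illustration and do not reflect the real Export v2 field layout.

```python
import json
import os
import tempfile


def write_ndjson(rows, path):
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
            count += 1
    return count


# Stand-in for an export result; the real rows contain many more fields.
sample_rows = [{"id": "data_row_id_1"}, {"id": "data_row_id_2"}]

out_path = os.path.join(tempfile.gettempdir(), "export_v2_sample.ndjson")
written = write_ndjson(sample_rows, out_path)
print(f"wrote {written} rows to {out_path}")
```

In a real workflow you would pass the export result in place of `sample_rows`.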
For teams that would like near real-time updates for each change on a data row, we recommend webhooks instead. The Export v2 format can now be used for webhooks to receive the following events from a project:
You can configure a webhook on a project so that it delivers a payload in the Export v2 format whenever an event is triggered. See more details in this Webhook Guide.
For example, you can use ngrok to expose a local port.
ngrok http 3001
This will generate an address that looks like `https://887d-2601-645-8000-3a90-9cb4-7d1b-d9b4-6714.ngrok.io` and forwards all requests to your localhost:3001.
Next, create a Python file that contains the following code to receive the webhook payload. Make sure to change the secret.
from flask import Flask, request
import hmac, hashlib
import json
import threading
from werkzeug.serving import run_simple

# This can be any secret, as long as it matches your webhook config (set later).
secret = b"CHANGE-ME"

# Example of server-side code to receive webhook events
app = Flask(__name__)


@app.route("/webhook-endpoint", methods=["POST"])
def print_webhook_info():
    payload = request.data
    # Recompute the HMAC-SHA1 signature of the raw request body.
    computed_signature = hmac.new(secret, msg=payload,
                                  digestmod=hashlib.sha1).hexdigest()
    # Use .get so a missing header is treated as a signature mismatch.
    received_signature = request.headers.get("X-Hub-Signature", "")
    if not hmac.compare_digest(received_signature,
                               "sha1=" + computed_signature):
        print(
            "Error: computed_signature does not match signature provided in the headers"
        )
        return "Error", 500

    print("=========== New Webhook Delivery ============")
    print("Delivery ID: %s" % request.headers["X-Labelbox-Id"])
    print("Event: %s" % request.headers["X-Labelbox-Event"])
    print("Payload: %s" %
          json.dumps(json.loads(payload.decode("utf8")), indent=4))
    return "Success"


# Serve the app on port 3001 in a background thread.
thread = threading.Thread(target=lambda: run_simple("0.0.0.0", 3001, app))
thread.start()
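Then run this script to start receiving the requests that ngrok forwards to port 3001. Keep the ngrok tunnel from the earlier step (`ngrok http 3001`) running in a separate terminal while the script runs.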
Now, you can configure a webhook in Labelbox's Project setting.
Now that you've created a webhook, every time a new event is triggered (such as updating a label), you will receive the payload at /webhook-endpoint.
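To check the signature logic without waiting for a real event, you can compute the expected header value yourself. The sketch below mirrors the check in the receiver script above: the header value is `sha1=` followed by the HMAC-SHA1 hex digest of the raw payload. The helper names `sign_payload` and `is_valid_signature` and the sample payload are made up for illustration.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Build the header value the receiver expects: 'sha1=' + HMAC-SHA1 hex digest."""
    digest = hmac.new(secret, msg=payload, digestmod=hashlib.sha1).hexdigest()
    return "sha1=" + digest


def is_valid_signature(secret: bytes, payload: bytes, header_value: str) -> bool:
    """Compare signatures in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign_payload(secret, payload), header_value)


secret = b"CHANGE-ME"
payload = json.dumps({"event": "example"}).encode("utf-8")  # made-up payload body
header = sign_payload(secret, payload)

print(is_valid_signature(secret, payload, header))           # matching secret
print(is_valid_signature(b"wrong-secret", payload, header))  # mismatched secret
```

A request signed with a different secret fails the check, which is why the secret in the script must match the one in your webhook configuration.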
The improved, Data Row-centric format of Export v2 lets you export with more granularity by including or excluding fields based on your project's unique needs. The new export format also mirrors our import format more consistently and aligns with the annotation schema available in the platform, which makes for a more seamless experience.
As you migrate from Export v1 to Export v2 workflows, please refer to our documentation for more detailed instructions on how to export your data through the UI or through the Python SDK.