A migration guide for Workflows, Batches, & the Data Rows tab.
Creating large volumes of high-quality training data is a vital part of any data engine. Yet AI teams often struggle to prioritize the right data to label and end up spending more on data labeling than they should.
This year, we've released two features – Batches and the Data Rows tab – to help teams better prioritize, queue, and manage their data. Now, we’re excited to introduce a third feature called Workflows.
All three features (Batches, the Data Rows tab, and Workflows) help teams label data more efficiently and create high-quality training data faster.
For all new projects created after the launch of Workflows, teams will need to use Batches, the Data Rows tab, and Workflows.
Watch the video demo below to learn more about how teams can use Batches, Workflows, and the Data Rows tab for greater control over their labeling operations.
While Batches will replace dataset-based queueing, datasets in Labelbox are not going away. Teams will still upload their data to Catalog as a dataset. In Catalog, teams can use advanced search filters to surface relevant, high-impact data for labeling, and then add those data rows to their labeling project as a batch.
Machine learning teams often have large amounts of unlabeled data, and labeling all of it can be incredibly time-consuming and expensive. An important question becomes how to decide which data to label and prioritize in order to accelerate model development.
Rather than having to queue an entire dataset, batch-based queueing gives teams greater flexibility and control over what gets sent to a labeling project. With Batches, you can:
Batch-based queueing supports both quality settings, Benchmarks and Consensus. Consensus settings are configured at the batch level: you’ll be prompted to set the coverage percentage and the number of labels for each batch.
The faster you can review and iterate on your training data, the sooner you can obtain high-quality data and improve model performance.
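To make the two batch-level Consensus settings concrete, here is a minimal sketch of how they might affect the labeling effort for a batch. It is not Labelbox code: the function name is hypothetical, and it assumes that the coverage percentage is the share of data rows in the batch that receive multiple labels, while every other data row is labeled once.

```python
def consensus_label_budget(batch_size: int, coverage_percent: int, number_of_labels: int) -> int:
    """Estimate how many labels one batch needs under Consensus.

    Assumption (not confirmed by this guide): `coverage_percent` of the batch's
    data rows are each labeled `number_of_labels` times, and the remaining
    data rows are each labeled once.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage_percent must be between 0 and 100")
    if number_of_labels < 1:
        raise ValueError("number_of_labels must be at least 1")

    # Integer ceiling division avoids floating-point rounding surprises.
    consensus_rows = -(-batch_size * coverage_percent // 100)
    single_label_rows = batch_size - consensus_rows
    return consensus_rows * number_of_labels + single_label_rows


# A 1,000-row batch with 10% coverage and 3 labels per consensus data row.
print(consensus_label_budget(1000, 10, 3))  # 1200
```

Under these assumptions, a rough estimate like this can help when deciding how much coverage a batch can afford.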
At a glance, Workflows allows you to easily understand the state of your data within your labeling operations pipeline. Rather than relying on manual spreadsheets or ad hoc methods, you have a systematic way of understanding how many data rows are ready for model training, how many are currently in review, how many are being re-labeled, and how many haven’t been labeled yet.
Workflows works in conjunction with Batches and the Data Rows tab to enable a highly customizable, step-by-step review pipeline. It gives teams more granular control over how data rows get reviewed, so they can tailor the pipeline to make their review process more efficient and more automated.
Here are a few ways Workflows can save your team labeling time and cost. Teams can now:
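As an illustration of the kind of summary described above, the following sketch counts data rows per pipeline stage. The stage names and the row structure are hypothetical placeholders chosen for this example, not identifiers from the Labelbox product or SDK.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical stage names mirroring the four states described in the text.
STAGES = ("not_labeled", "in_review", "being_relabeled", "ready_for_training")


def stage_counts(data_rows: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count data rows per stage, reporting zero for empty stages."""
    counts = Counter(row["stage"] for row in data_rows)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return {stage: counts.get(stage, 0) for stage in STAGES}


rows = [
    {"id": "dr-1", "stage": "not_labeled"},
    {"id": "dr-2", "stage": "in_review"},
    {"id": "dr-3", "stage": "in_review"},
    {"id": "dr-4", "stage": "ready_for_training"},
]
summary = stage_counts(rows)
print(summary)
```

Reporting a zero for empty stages keeps the summary shape stable, which is what makes it a drop-in replacement for a manually maintained spreadsheet.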
By default, there are four tasks that appear when you create a new project:
You can have up to 10 review tasks within a workflow.
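If you model your review pipeline in your own tooling, it can be useful to enforce the same limit up front. The sketch below is purely illustrative: `ReviewPipeline` and the task names are hypothetical and are not part of the Labelbox SDK.

```python
from typing import List

MAX_REVIEW_TASKS = 10  # limit stated in this guide


class ReviewPipeline:
    """Hypothetical local model of the review tasks in a workflow."""

    def __init__(self) -> None:
        self.review_tasks: List[str] = []

    def add_review_task(self, name: str) -> None:
        """Append a review task, refusing to exceed the 10-task limit."""
        if len(self.review_tasks) >= MAX_REVIEW_TASKS:
            raise ValueError(f"a workflow can have at most {MAX_REVIEW_TASKS} review tasks")
        self.review_tasks.append(name)


pipeline = ReviewPipeline()
for step in range(1, MAX_REVIEW_TASKS + 1):
    pipeline.add_review_task(f"review-step-{step}")
print(len(pipeline.review_tasks))  # 10
```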
Compared with the Labels tab, the Data Rows tab supports more advanced filtering, so teams can easily find data rows and quickly understand data row status. Designed to work with batches and multi-step review workflows, the Data Rows tab gives you a holistic view of your labeling operations.
A holistic view of your project's progress
The new Overview page works with the Data Rows tab and Workflows to provide a holistic count of data rows in each stage of the workflow as well as a snapshot of annotation metrics.
Drill into specific data rows at each step of your workflow
Use the panel on the left of the Data Rows tab to quickly see which data rows belong to each stage of the workflow.
Filter and find specific slices of data
Use improved filters to build more complex queries and surface data rows of interest by annotation, metadata, dataset, media attribute, data row ID, and more. With flexible querying, you can combine AND/OR conditions to search for more granular slices of data.
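The logic behind combining AND/OR conditions can be sketched in a few lines of Python. This is a conceptual illustration only: the helper names, field names, and sample rows are hypothetical, and the filters in the Data Rows tab are built in the UI rather than with this code.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def field_equals(field: str, value: Any) -> Predicate:
    """Match rows whose `field` equals `value`."""
    return lambda row: row.get(field) == value


def all_of(*predicates: Predicate) -> Predicate:
    """AND: every condition must hold."""
    return lambda row: all(p(row) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """OR: at least one condition must hold."""
    return lambda row: any(p(row) for p in predicates)


def filter_rows(rows: List[Row], predicate: Predicate) -> List[Row]:
    return [row for row in rows if predicate(row)]


rows = [
    {"id": "dr-1", "dataset": "street-scenes", "annotation": "car"},
    {"id": "dr-2", "dataset": "street-scenes", "annotation": "pedestrian"},
    {"id": "dr-3", "dataset": "highway", "annotation": "truck"},
    {"id": "dr-4", "dataset": "street-scenes", "annotation": "truck"},
]

# dataset is "street-scenes" AND (annotation is "car" OR annotation is "truck")
query = all_of(
    field_equals("dataset", "street-scenes"),
    any_of(field_equals("annotation", "car"), field_equals("annotation", "truck")),
)
matches = filter_rows(rows, query)
print([row["id"] for row in matches])  # ['dr-1', 'dr-4']
```

Nesting an OR group inside an AND group, as above, is what lets a single query isolate a narrow slice of data.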
Manage batches and view batch history
When batches are submitted to a project, they'll appear in the Data Rows tab. By naming each batch, teams can keep track of when a batch was submitted, view the data rows within a batch, and remove a batch and any unlabeled data rows if needed.
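The sketch below shows one reading of what removing a batch means for a project's data rows: only the unlabeled data rows of that batch leave the project. Both this interpretation and the row structure are assumptions made for illustration, and the function is hypothetical rather than part of the Labelbox SDK.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def remove_batch(project_rows: List[Row], batch_name: str) -> List[Row]:
    """Return the data rows that remain after removing a batch.

    Assumption: only the batch's unlabeled data rows are removed; data rows
    from that batch that already have labels stay in the project.
    """
    return [
        row
        for row in project_rows
        if not (row["batch"] == batch_name and not row["labeled"])
    ]


project_rows = [
    {"id": "dr-1", "batch": "night-images", "labeled": True},
    {"id": "dr-2", "batch": "night-images", "labeled": False},
    {"id": "dr-3", "batch": "rainy-images", "labeled": False},
]
remaining = remove_batch(project_rows, "night-images")
print([row["id"] for row in remaining])  # ['dr-1', 'dr-3']
```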
Conduct actions in bulk to improve efficiency
Complex projects might feature a high number of data rows. It’s important that teams are able to effectively manage data rows of interest to improve labeling efficiency.
You can conduct actions in bulk by selecting multiple data rows and applying one of the actions below:
Labelbox will automatically configure new projects with Batches, Workflows, and the Data Rows tab.
This will happen on a rolling basis across our customer base. The table below indicates when you can expect to see this new paradigm (Batches, Workflows, and Data Rows tab) for new projects:
With these changes, when you create a new project:
Through the UI:
Through the SDK:
With the move to Batches, the Data Rows tab, and Workflows, we’ll be releasing new Python SDK versions.
Learn more about SDK changes in this guide.
We will begin migrating old projects to the new paradigm (Batches, the Data Rows tab, and Workflows) starting on 3/31/23. We will soon share a migration schedule for each customer tier. For now, no action is required for your old projects.
The table below indicates the planned migration deadlines for all old projects to be automatically migrated to use Batches, the Data Rows tab, and Workflows.
Having worked with hundreds of AI teams, we recognized the need for more granular control over labeling workflows.
To streamline and improve the creation, maintenance, and quality control of data rows, we’re moving to Batches, the Data Rows tab, and Workflows as the new way for teams to queue and review their data.
We understand that change is never easy. We'll continue to update this guide with additional information throughout the migration.
If you have specific questions related to new project creation or the migration of old projects, you can create a support ticket or reach out to your dedicated Customer Success Manager.
In the meantime, here are a few resources to learn more:
As we strive to make the Labelbox experience even better, we value any feedback.