A new way to queue & review
A migration guide for Workflows, Batches, & the Data Rows tab.
AI teams often struggle to prioritize the right data to label and end up spending more on data labeling than they should. Yet creating large volumes of high-quality training data is a vital part of a data engine.
This year, we've released two features – Batches and the Data Rows tab – to help teams better prioritize, queue, and manage their data. Now, we’re excited to introduce a third feature called Workflows.
All three features, Batches, the Data Rows tab, and Workflows, help teams label data more efficiently and create high-quality training data faster.
What is the new way to queue & review my data?
For all new projects after the launch of Workflows, teams will need to use:
- Batches to queue data rows for labeling, configure data row priority, and set the number of labels (for Consensus projects)
- [New] Workflows for custom review
- The Data Rows tab to manage & view the status of data rows
Watch the video demo below to learn more about how teams can use Batches, Workflows, and the Data Rows tab for greater control over their labeling operations.
Batches
Batches will replace dataset-based queueing for all new projects.
While Batches will replace dataset-based queueing, datasets in Labelbox are not going away. In order to upload data, teams will still need to upload their relevant data as a dataset to Catalog. In Catalog, teams can use advanced search filters to surface relevant high-impact data for labeling. Once surfaced, teams will add the relevant data rows to their labeling project as a batch.
Machine learning teams often have large amounts of unlabeled data, and labeling all of it can be incredibly time-consuming and expensive. An important question becomes how to decide what data to label and prioritize in order to accelerate model development.
Rather than having to queue an entire dataset, batch-based queueing gives teams greater flexibility and control over what gets sent to a labeling project. With Batches, you can:
- Prioritize slices of data by adding batches to a project in priority order
- Manage batches & view batch history
- Enable active learning workflows to identify the most high-impact data rows for labeling
We support batch-based queueing for both quality settings: Benchmarks and Consensus. Consensus settings are configured at the batch level: you’ll be prompted to set the coverage percentage and the number of labels for each batch.
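To make the idea of prioritized batches with batch-level Consensus settings concrete, here is a minimal sketch in plain Python. It is not the Labelbox SDK: the `Batch` class, its field names, the "lower number is labeled sooner" convention, and the minimum of two labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Batch:
    """Hypothetical stand-in for a batch; not the Labelbox SDK class."""
    name: str
    data_row_ids: List[str]
    priority: int = 5  # assumption: a lower number is labeled sooner
    # Consensus settings are optional and only meaningful for Consensus projects.
    coverage_percentage: Optional[float] = None  # fraction between 0 and 1
    number_of_labels: Optional[int] = None

    def __post_init__(self) -> None:
        cov = self.coverage_percentage
        if cov is not None and not 0 <= cov <= 1:
            raise ValueError("coverage_percentage must be between 0 and 1")
        # Assumption: consensus compares at least two labels per data row.
        if self.number_of_labels is not None and self.number_of_labels < 2:
            raise ValueError("number_of_labels must be at least 2")


def labeling_order(batches: List[Batch]) -> List[str]:
    """Return data row IDs in the order they would reach labelers."""
    ordered = sorted(batches, key=lambda b: b.priority)  # stable sort
    return [row_id for batch in ordered for row_id in batch.data_row_ids]


batches = [
    Batch("random-sample", ["row-3", "row-4"], priority=5),
    Batch("low-confidence", ["row-1", "row-2"], priority=1,
          coverage_percentage=0.5, number_of_labels=3),
]
print(labeling_order(batches))  # ['row-1', 'row-2', 'row-3', 'row-4']
```

The higher-priority batch is queued first even though it was added second, which is the flexibility that dataset-based queueing lacked.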
Learn more about how to create a batch and prioritize batches in this guide or read more in our documentation.
[New] Workflows
Workflows will replace the Review Step for all new projects.
The quicker you can review and iterate on training data cycles, the faster you can obtain high quality data and improve model performance.
At a glance, Workflows allows you to easily understand the state of your data within your labeling operations pipeline. Rather than relying on manual spreadsheets or ad-hoc methods, you have a systematic way of understanding how many data rows are ready for model training, how many are currently in-review, how many are being re-labeled, and how many haven’t been labeled yet.
Workflows works in conjunction with batches and the Data Rows tab to enable a highly customizable, step-by-step review pipeline. It gives teams more granular control over how data rows get reviewed, so they can build efficiency and automation into their review process.
Here are a few ways Workflows can save your team labeling time and cost. Teams can now:
- Have multiple groups, like your core labeling team and subject matter experts, review labeled data in a specific manner by configuring custom review steps
- Select labels for rework individually or in bulk with minimal manual work
- Create ad-hoc review steps to ensure quality
- View a data row's history with the audit log for greater context
By default, there are four tasks that appear when you create a new project:
- Initial labeling task: reserved for all data rows that have been queued for labeling
- Initial review task: reserved for all data rows that are currently in the first review step
- Rework task: reserved for data rows that have been Rejected
- Done task: reserved for data rows that have either a) moved through their qualified tasks in the workflow or b) not qualified for any of the tasks
You can have up to 10 review tasks within a workflow.
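The default tasks and the review-task limit described above can be modeled as a small state machine. This is a sketch in plain Python, not Labelbox's implementation: the function names are hypothetical, and it assumes that a reworked data row re-enters the pipeline at the first review task. It also leaves out the case where a data row does not qualify for a task.

```python
from typing import List

INITIAL_LABELING = "Initial labeling"
REWORK = "Rework"
DONE = "Done"
MAX_REVIEW_TASKS = 10  # limit stated in this guide


def build_workflow(review_tasks: List[str]) -> List[str]:
    """Return the main path: labeling, then each review task, then Done."""
    if len(review_tasks) > MAX_REVIEW_TASKS:
        raise ValueError(f"a workflow supports at most {MAX_REVIEW_TASKS} review tasks")
    return [INITIAL_LABELING, *review_tasks, DONE]


def next_task(workflow: List[str], current: str, approved: bool = True) -> str:
    """Return the task a data row moves to after leaving `current`."""
    if current == DONE:
        return DONE
    if current == REWORK:
        # Assumption: a reworked row goes back to the first step after labeling.
        return workflow[1]
    if current != INITIAL_LABELING and not approved:
        return REWORK  # a rejected review sends the row to rework
    return workflow[workflow.index(current) + 1]


workflow = build_workflow(["Initial review", "Expert review"])
task = next_task(workflow, INITIAL_LABELING)      # "Initial review"
task = next_task(workflow, task, approved=False)  # "Rework"
task = next_task(workflow, task)                  # "Initial review"
task = next_task(workflow, task)                  # "Expert review"
task = next_task(workflow, task)                  # "Done"
print(task)
```

Adding a subject matter expert step is just one more entry in the list passed to `build_workflow`, which mirrors how custom review steps extend the default pipeline.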
You can learn more about the power of workflows and how to set up customized review sequences in this guide or in our documentation.
The Data Rows tab
The Data Rows tab will replace the Labels tab for all new projects.
Compared to the Labels tab, the Data Rows tab supports more advanced filtering capabilities, so teams can easily find data rows and quickly understand data row status. Designed to work with batches and multi-step review workflows, the Data Rows tab gives you a holistic view of your labeling operations.
A holistic view of your project's progress
The new Overview page works with the Data Rows tab and Workflows to provide a holistic count of data rows in each stage of the workflow as well as a snapshot of annotation metrics.
- You can click into each stage of your workflow to open up the Data Rows tab where you can further inspect data row details for each stage.
- You can view annotation metrics and click into an annotation of interest to view specific slices of data in the Data Rows tab.
Drill into specific data rows at each step of your workflow
Use the panel on the left of the Data Rows tab to quickly understand what data rows belong to each stage of the workflow.
Filter and find specific slices of data
Use improved filters to build more complex queries and surface data rows of interest by annotation, metadata, dataset, media attribute, data row ID, and more. With flexible querying, you can combine AND/OR conditions to search for more granular slices of data.
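As a rough illustration of how AND/OR conditions narrow a set of data rows, the sketch below composes simple predicates in plain Python. The row fields, sample values, and helper names are invented for the example. In the product, these filters are built in the Data Rows tab UI rather than in code.

```python
from typing import Any, Callable, Dict, List

DataRow = Dict[str, Any]
Predicate = Callable[[DataRow], bool]


def field_equals(key: str, value: Any) -> Predicate:
    """Match rows whose `key` attribute equals `value`."""
    return lambda row: row.get(key) == value


def all_of(*predicates: Predicate) -> Predicate:
    """AND: every condition must hold."""
    return lambda row: all(p(row) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """OR: at least one condition must hold."""
    return lambda row: any(p(row) for p in predicates)


# Hypothetical data rows; the field names are illustrative only.
rows: List[DataRow] = [
    {"id": "row-1", "dataset": "street-scenes", "annotation": "car"},
    {"id": "row-2", "dataset": "street-scenes", "annotation": "pedestrian"},
    {"id": "row-3", "dataset": "aerial", "annotation": "car"},
    {"id": "row-4", "dataset": "street-scenes", "annotation": "bicycle"},
]

# dataset = street-scenes AND (annotation = car OR annotation = pedestrian)
query = all_of(
    field_equals("dataset", "street-scenes"),
    any_of(field_equals("annotation", "car"),
           field_equals("annotation", "pedestrian")),
)
matches = [row["id"] for row in rows if query(row)]
print(matches)  # ['row-1', 'row-2']
```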
Manage batches and view batch history
When batches are submitted to a project, they'll appear in the Data Rows tab. By naming each batch, teams can keep track of when a batch was submitted, view the data rows within a batch, and remove a batch and any unlabeled data rows if needed.
Conduct actions in bulk to improve efficiency
Complex projects might feature a high number of data rows. It’s important that teams are able to effectively manage data rows of interest to improve labeling efficiency.
You can conduct actions in bulk by selecting multiple data rows and choosing one of the actions below:
- Move to a specific step of your workflow: Allows you to move the selected data rows in bulk to a review step or to rework in your labeling workflow. For example, you can easily move data rows in bulk to the “Rework” or “Done” steps.
- Delete & re-queue: Allows you to send data rows back to the labeling queue and delete current labels. You also have the option to preserve the existing label as a template.
- Hide or unhide from labelers: Allows you to hide certain data rows from labelers (e.g., sensitive or inappropriate content). This will restrict all labelers from being able to view selected data rows in the editor.
Learn more about the value of the Data Rows tab and in-depth tutorials in this guide. You can also read more about this change in our documentation.
What exactly is changing and when?
New projects
Labelbox will automatically be configuring new projects with Batches, Workflows, and the Data Rows tab.
This will happen on a rolling basis across our customer base. The table below indicates when you can expect to see this new paradigm (Batches, Workflows, and Data Rows tab) for new projects:
With these changes, when you create a new project:
Through the UI:
- Select a quality setting (benchmark or consensus)
- Rather than queueing an entire dataset, you’ll be directed to use batch-based queueing to send a subset of data rows for labeling
- If you select consensus as your quality mode during project creation, you’ll be able to set the consensus labeling parameters at the batch level (% coverage and # of labels)
SDK-related changes
Through the SDK:
- All new projects will require media type upon creation
- Select a queueing mode and quality mode at the time of project creation. You cannot change the queueing or quality mode after project creation (learn how to set this up here)
- Rather than queueing an entire dataset, you’ll be directed to use batch-based queueing to send a subset of data rows for labeling.
- If you select consensus as your quality mode during project creation, you’ll be able to set the consensus labeling parameters at the batch level (% coverage and # of labels)
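The constraints in the list above (a required media type, modes that are fixed at creation, and consensus parameters set per batch) can be pictured with the following sketch. It deliberately does not call the Labelbox SDK: `ProjectConfig`, `batch_consensus_settings`, and the enum members are hypothetical names that only model the rules described here. Refer to the SDK guide for the actual calls.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class MediaType(Enum):
    IMAGE = "image"
    VIDEO = "video"
    TEXT = "text"


class QualityMode(Enum):
    BENCHMARK = "benchmark"
    CONSENSUS = "consensus"


@dataclass(frozen=True)  # frozen: modes cannot be changed after creation
class ProjectConfig:
    """Hypothetical model of a new project's settings."""
    name: str
    media_type: MediaType      # no default, so it must be supplied
    quality_mode: QualityMode
    queue_mode: str = "batch"  # new projects use batch-based queueing


def batch_consensus_settings(config: ProjectConfig,
                             coverage_percentage: float,
                             number_of_labels: int) -> dict:
    """Build per-batch consensus parameters; only valid for consensus projects."""
    if config.quality_mode is not QualityMode.CONSENSUS:
        raise ValueError("consensus parameters only apply to consensus projects")
    return {"coverage_percentage": coverage_percentage,
            "number_of_labels": number_of_labels}


config = ProjectConfig("demo", MediaType.IMAGE, QualityMode.CONSENSUS)
print(batch_consensus_settings(config, 0.25, 3))

try:
    config.quality_mode = QualityMode.BENCHMARK
except FrozenInstanceError:
    print("quality mode is fixed after project creation")
```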
With the launch of Workflows and the move to Batches and the Data Rows tab, we’ll be releasing new Python SDK versions.
Learn more about SDK changes in this guide.
Old projects
The migration of legacy projects to the new paradigm will take place on a rolling basis starting March 12th.
You will receive an email specifying your migration date. We will be sending out expected migration dates the week of Monday, March 6th – please check your email to see when your migration will take place.
What will happen to my old or existing projects?
Old or existing projects that were created before the launch of Workflows still use dataset-based queueing, the Labels tab, and the Review step.
Labelbox will automatically be migrating your old/existing projects to the new paradigm (batch-based queueing, the Data Rows tab, and Workflows). This will happen on Labelbox’s backend and no action is required on your end.
We will preserve all data rows, along with the associated labels and reviews (thumbs up/down), in the migrated projects. You will be able to query the review data in these projects and, at your discretion, take any required actions in the Workflow paradigm, including moving data rows to the appropriate workflow task.
Learn more about the upcoming migration in our documentation.
Why are we making this change?
Having worked with hundreds of AI teams, we recognized the need for more granular control over labeling workflows.
In order to streamline and improve the creation, maintenance, and quality control of data rows, we’re moving to using Batches, the Data Rows tab, and Workflows as a new way for teams to queue & review their data.
Where can I learn more?
We understand that change is never easy. We'll be continuing to update this guide with additional information throughout the migration.
If you have specific questions related to new project creation or the migration of old projects, you can create a support ticket or reach out to your dedicated Customer Success Manager.
In the meantime, here are a few resources to learn more:
Guides
- How to prepare and submit a batch for labeling (Batches)
- How to customize your annotation review process (Workflows)
- How to search, surface, and prioritize data within a project (Data Rows tab)
- SDK changes: A new way to queue & review
Documentation
- Batches
- Workflows
- Data Rows tab
- Migrating to Workflows
- Features that are being deprecated due to this migration
Where can I provide feedback?
As we strive to make the Labelbox experience even better, we value any feedback.
- If you'd like to leave feedback on your team's current review & rework process, you can fill out this survey.
- If you have any feedback or questions on the current migration to batches, the Data Rows tab, and Workflows, feel free to fill out this short survey.