A migration guide for Workflows, Batches, & the Data Rows tab.
Often, AI teams struggle to prioritize the right data to label and end up spending more on data labeling than they should. A vital part of any data engine is producing large volumes of high-quality training data.
This year, we released two features to help teams better prioritize, queue, and manage their data: Batches and the Data Rows tab. Now, we're excited to introduce a third feature: Workflows.
Together, Batches, the Data Rows tab, and Workflows help teams label data more efficiently and produce high-quality training data faster.
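For all new projects created after the launch of Workflows, teams will need to use Batches instead of dataset-based queueing, Workflows instead of the Review step, and the Data Rows tab instead of the Labels tab.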
Watch the video demo below to learn more about how teams can use Batches, Workflows, and the Data Rows tab for greater control over their labeling operations.
Batches will replace dataset-based queueing for all new projects.
While Batches will replace dataset-based queueing, datasets in Labelbox are not going away. Teams will still upload their data to Catalog as datasets. In Catalog, teams can use advanced search filters to surface relevant, high-impact data for labeling, then add those data rows to their labeling project as a batch.
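To make the Catalog-to-batch flow concrete, here is a minimal Python sketch built on a hypothetical in-memory model. `DataRow`, `Batch`, `Project`, and `create_batch_from_catalog` are illustrative names, not the Labelbox SDK. It shows only the core idea: a filter selects specific data rows, and just those rows are queued in the project as a named batch, rather than the whole dataset.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical in-memory model for illustration; these classes are NOT the Labelbox SDK.
@dataclass
class DataRow:
    uid: str
    dataset: str
    metadata: dict


@dataclass
class Batch:
    name: str
    data_row_ids: list


@dataclass
class Project:
    name: str
    batches: list = field(default_factory=list)

    def queued_ids(self) -> list:
        # Every data row ID queued in the project, in batch order.
        return [uid for batch in self.batches for uid in batch.data_row_ids]


def create_batch_from_catalog(project: Project, catalog: list, name: str,
                              predicate: Callable[[DataRow], bool]) -> Batch:
    # Queue only the data rows that match the filter, not the entire dataset.
    selected = [row.uid for row in catalog if predicate(row)]
    batch = Batch(name=name, data_row_ids=selected)
    project.batches.append(batch)
    return batch


catalog = [
    DataRow("r1", "street_scenes", {"weather": "rain"}),
    DataRow("r2", "street_scenes", {"weather": "clear"}),
    DataRow("r3", "street_scenes", {"weather": "rain"}),
]
project = Project("rain-detection")
batch = create_batch_from_catalog(
    project, catalog, "rainy-scenes",
    predicate=lambda row: row.metadata.get("weather") == "rain",
)
print(batch.data_row_ids)  # ['r1', 'r3']
```

In this sketch the clear-weather row never enters the project queue; with dataset-based queueing, all three rows would have been queued.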
Machine learning teams often have large amounts of unlabeled data, and labeling all of it can be extremely time-consuming and expensive. The key question becomes how to decide which data to label and prioritize so that model development accelerates.
Rather than having to queue an entire dataset, batch-based queueing gives teams greater flexibility and control over what gets sent to a labeling project. With Batches, you can:
Batch-based queueing supports both quality settings, Benchmarks and Consensus. Consensus settings are configured at the batch level: you'll be prompted to set the coverage percentage and the number of labels for the batch.
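The two Consensus settings together determine how much labeling work a batch generates. The sketch below works through that arithmetic under stated assumptions: coverage is read as the percentage of the batch's data rows that receive multiple labels, the number of labels is read as how many labels each covered data row receives, uncovered rows receive a single label, and the covered count is rounded down. The function name `consensus_plan` is hypothetical.

```python
def consensus_plan(batch_size: int, coverage_pct: int, labels_per_row: int) -> dict:
    """Estimate labeling work for a batch (assumed semantics, not Labelbox's exact rules)."""
    if not 0 <= coverage_pct <= 100:
        raise ValueError("coverage_pct must be between 0 and 100")
    if labels_per_row < 1:
        raise ValueError("labels_per_row must be at least 1")
    # Data rows that go through consensus (rounded down, an assumption).
    covered = batch_size * coverage_pct // 100
    # Remaining data rows are assumed to be labeled once.
    single = batch_size - covered
    total = covered * labels_per_row + single
    return {"consensus_rows": covered, "single_label_rows": single, "total_labels": total}


plan = consensus_plan(batch_size=100, coverage_pct=20, labels_per_row=3)
print(plan)  # {'consensus_rows': 20, 'single_label_rows': 80, 'total_labels': 140}
```

Under these assumptions, 20% coverage with three labels per covered row raises the labeling volume from 100 to 140 labels, which is the kind of trade-off worth checking before submitting a batch.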
Learn more about how to create a batch and prioritize batches in this guide or read more in our documentation.
Workflows will replace the Review Step for all new projects.
The faster you can review and iterate on training data, the sooner you can obtain high-quality data and improve model performance.
At a glance, Workflows allows you to easily understand the state of your data within your labeling operations pipeline. Rather than relying on manual spreadsheets or ad-hoc methods, you have a systematic way of understanding how many data rows are ready for model training, how many are currently in-review, how many are being re-labeled, and how many haven’t been labeled yet.
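As a rough illustration of the bookkeeping that Workflows replaces, the sketch below counts data rows per pipeline stage. The stage names mirror the four states described above (not yet labeled, in review, being re-labeled, ready for training), but the `Stage` enum and `stage_counts` function are hypothetical, not part of the Labelbox product or SDK.

```python
from collections import Counter
from enum import Enum


# Hypothetical stage names mirroring the states described in this guide.
class Stage(Enum):
    TO_LABEL = "to label"
    IN_REVIEW = "in review"
    REWORK = "rework"
    DONE = "done"


def stage_counts(row_stages: dict) -> dict:
    """Map every stage to the number of data rows currently in it (zero if none)."""
    counts = Counter(row_stages.values())
    return {stage: counts.get(stage, 0) for stage in Stage}


rows = {
    "r1": Stage.DONE,
    "r2": Stage.IN_REVIEW,
    "r3": Stage.REWORK,
    "r4": Stage.DONE,
    "r5": Stage.TO_LABEL,
}
for stage, n in stage_counts(rows).items():
    print(f"{stage.value}: {n}")
```

Including zero counts for empty stages keeps the summary's shape stable, which is what makes it usable as an at-a-glance status view.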
Workflows works with Batches and the Data Rows tab to enable a highly customizable, step-by-step review pipeline. It gives teams granular control over how data rows are reviewed, so they can tailor their review pipeline to make the review process more efficient and more automated.
Here are a few ways Workflows can save your team labeling time and cost. Teams can now:
By default, there are four tasks that appear when you create a new project:
You can have up to 10 review tasks within a workflow.
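The sketch below models a multi-step review pipeline that enforces the 10-review-task limit stated above. The routing rules are assumptions made for illustration: labeled data moves to the first review task, an approval advances it to the next review task (or to Done after the last one), a rejection sends it to Rework, and reworked data re-enters the first review task. The `Workflow` class and task names are hypothetical, not the Labelbox SDK.

```python
MAX_REVIEW_TASKS = 10  # limit stated in this guide


class Workflow:
    """Hypothetical review pipeline; the routing rules are assumptions, not Labelbox's."""

    def __init__(self, review_tasks: list):
        if len(review_tasks) > MAX_REVIEW_TASKS:
            raise ValueError(f"a workflow can have at most {MAX_REVIEW_TASKS} review tasks")
        self.review_tasks = list(review_tasks)

    def next_task(self, current: str, approved: bool = True) -> str:
        pipeline = self.review_tasks
        # Freshly labeled or reworked data enters the first review task.
        if current in ("Initial labeling", "Rework"):
            return pipeline[0] if pipeline else "Done"
        if current in pipeline:
            if not approved:
                return "Rework"
            i = pipeline.index(current)
            # Advance to the next review task, or finish after the last one.
            return pipeline[i + 1] if i + 1 < len(pipeline) else "Done"
        raise ValueError(f"unknown task: {current}")


wf = Workflow(["Initial review", "Expert review"])
print(wf.next_task("Initial labeling"))                # Initial review
print(wf.next_task("Initial review"))                  # Expert review
print(wf.next_task("Expert review", approved=False))   # Rework
print(wf.next_task("Expert review"))                   # Done
```

Validating the limit when the workflow is constructed, rather than when data moves through it, surfaces a misconfigured pipeline before any data rows are routed.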
You can learn more about the power of workflows and how to set up customized review sequences in this guide or in our documentation.
The Data Rows tab will replace the Label tab for all new projects.
Compared with the Labels tab, the Data Rows tab supports more advanced filtering, so teams can easily find data rows and quickly understand their status. Designed to work with Batches and multi-step review workflows, the Data Rows tab gives you a holistic view of your labeling operations.
A holistic view of your project's progress
The new Overview page works with the Data Rows tab and Workflows to provide a holistic count of data rows in each stage of the workflow as well as a snapshot of annotation metrics.
Drill into specific data rows at each step of your workflow
Use the panel on the left of the Data Rows tab to quickly understand what data rows belong to each stage of the workflow.
Filter and find specific slices of data
Use improved filters to build more complex queries and surface data rows of interest by annotation, metadata, dataset, media attribute, data row ID, and more. Flexible querying lets you combine conditions with AND/OR to search for more granular slices of data.
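To show how AND/OR conditions compose into a more granular query, here is a small Python sketch using plain dictionaries as data rows. The field names (`dataset`, `annotation`, `weather`) and the helpers `field_equals`, `all_of`, and `any_of` are hypothetical; this illustrates the boolean logic only, not the Data Rows tab's query syntax.

```python
from typing import Any, Callable

Predicate = Callable[[dict], bool]


def field_equals(name: str, value: Any) -> Predicate:
    # Matches data rows whose field equals the given value.
    return lambda row: row.get(name) == value


def all_of(*preds: Predicate) -> Predicate:
    # AND: every condition must match.
    return lambda row: all(p(row) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    # OR: at least one condition must match.
    return lambda row: any(p(row) for p in preds)


rows = [
    {"id": "r1", "dataset": "street", "annotation": "car", "weather": "rain"},
    {"id": "r2", "dataset": "street", "annotation": "bike", "weather": "clear"},
    {"id": "r3", "dataset": "highway", "annotation": "car", "weather": "fog"},
]

# dataset is "street" AND (weather is "rain" OR annotation is "bike")
query = all_of(
    field_equals("dataset", "street"),
    any_of(field_equals("weather", "rain"), field_equals("annotation", "bike")),
)
print([r["id"] for r in rows if query(r)])  # ['r1', 'r2']
```

Because each condition is a function, nested AND/OR groups compose to any depth without a special query parser.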
Manage batches and view batch history
When batches are submitted to a project, they'll appear in the Data Rows tab. Naming each batch makes it easy to track when it was submitted, view the data rows it contains, and, if needed, remove the batch along with its unlabeled data rows.
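The sketch below illustrates the batch-removal behavior described above: removing a batch takes its unlabeled data rows out of the project, while data rows that already have labels stay. `ProjectState`, `add_batch`, and `remove_batch` are hypothetical names for illustration, not the Labelbox SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    """Hypothetical project bookkeeping; not the Labelbox SDK."""
    batches: dict = field(default_factory=dict)  # batch name -> data row IDs
    labeled: set = field(default_factory=set)    # data row IDs that already have labels
    queue: set = field(default_factory=set)      # data row IDs currently in the project

    def add_batch(self, name: str, row_ids: list) -> None:
        if name in self.batches:
            raise ValueError(f"batch {name!r} already exists")
        self.batches[name] = list(row_ids)
        self.queue.update(row_ids)

    def remove_batch(self, name: str) -> list:
        # Drop the batch; only its unlabeled data rows leave the project.
        row_ids = self.batches.pop(name)
        removed = [uid for uid in row_ids if uid not in self.labeled]
        self.queue.difference_update(removed)
        return removed


p = ProjectState()
p.add_batch("rainy-scenes", ["r1", "r2", "r3"])
p.labeled.add("r2")                  # r2 was labeled before the batch was removed
removed = p.remove_batch("rainy-scenes")
print(removed, sorted(p.queue))      # ['r1', 'r3'] ['r2']
```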
Conduct actions in bulk to improve efficiency
Complex projects can contain a large number of data rows, so teams need to be able to manage data rows of interest effectively to keep labeling efficient.
You can act on many data rows at once by selecting them together and choosing one of the actions below:
Learn more about the Data Rows tab, including in-depth tutorials, in this guide. You can also read more about this change in our documentation.
Labelbox will automatically configure new projects with Batches, Workflows, and the Data Rows tab.
This will happen on a rolling basis across our customer base. The table below indicates when you can expect to see this new paradigm (Batches, Workflows, and Data Rows tab) for new projects:
With these changes, when you create a new project:
Through the UI:
Through the SDK:
To support the move to Batches, the Data Rows tab, and Workflows, we'll be releasing new versions of the Python SDK.
Learn more about SDK changes in this guide.
The migration of legacy projects to the new paradigm will take place on a rolling basis starting March 12th.
You will receive an email specifying your migration date. We will send out expected migration dates the week of Monday, March 6th, so please check your email to see when your migration will take place.
Existing projects created before the launch of Workflows still use dataset-based queueing, the Labels tab, and the Review step.
Labelbox will automatically migrate your existing projects to the new paradigm (batch-based queueing, the Data Rows tab, and Workflows). The migration happens on Labelbox's backend and requires no action on your part.
Migrated projects will preserve all data rows along with their associated labels and reviews (thumbs up/down). You will be able to query the review data in these projects and, at your discretion, take any needed actions in Workflows, such as moving data rows to the appropriate workflow task.
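If you choose to act on preserved review data after migration, one possible policy is sketched below. It is purely an example of a decision you might make yourself, not what the migration does automatically: labeled rows with a thumbs up go to Done, thumbs down go to Rework, unreviewed labeled rows go to a review task, and unlabeled rows go to labeling. The encoding (+1, -1, `None`), the task names, and `suggest_task` are hypothetical.

```python
from typing import Optional


def suggest_task(has_label: bool, review: Optional[int]) -> str:
    """Hypothetical policy: review is +1 (thumbs up), -1 (thumbs down), or None (not reviewed)."""
    if not has_label:
        return "Initial labeling"
    if review is None:
        return "Initial review"
    return "Done" if review > 0 else "Rework"


# Hypothetical legacy data: data row ID -> (has_label, review)
legacy = {
    "r1": (True, 1),
    "r2": (True, -1),
    "r3": (True, None),
    "r4": (False, None),
}
plan = {uid: suggest_task(*state) for uid, state in legacy.items()}
print(plan)
```

Computing a plan like this first, then reviewing it before moving any data rows, keeps the decision in your hands, which matches the discretionary nature of post-migration actions.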
Learn more about the upcoming migration in our documentation.
Having worked with hundreds of AI teams, we recognized the need for more granular control over labeling workflows.
To streamline and improve the creation, maintenance, and quality control of data rows, we're moving to Batches, the Data Rows tab, and Workflows as the new way for teams to queue and review their data.
We understand that change is never easy. We'll continue updating this guide with additional information throughout the migration.
If you have specific questions related to new project creation or the migration of old projects, you can create a support ticket or reach out to your dedicated Customer Success Manager.
In the meantime, here are a few resources to learn more:
Guides
Documentation
As we strive to make the Labelbox experience even better, we value any feedback.