
How to search, surface, and prioritize data within a project

A vital aspect of a strong data engine involves the creation of large volumes of high-quality training data.

A common bottleneck for many AI teams is the lack of a holistic view across their data, which makes it hard to filter and prioritize the right data to label. Labeling operations then become a black box, and this lack of visibility can increase the time and cost of labeling.

The better you’re able to search, understand, and manage your data, the faster you’ll be able to prioritize Data Rows for labeling and accelerate model development.


What is the Data Rows tab?

The Data Rows tab is the central hub for all Data Rows within a given project. It allows you to view, manage, and filter for Data Rows within your project.

With the Data Rows tab, teams have a holistic view of:

  • The status and number of Data Rows across: To Label, Labeled, In Review, In Rework, and Done
  • A data row’s Data Row ID / External ID / Global Key
  • What review task it belongs to
  • The quality setting and agreement score
      • Benchmark: shows the agreement score
      • Consensus: shows the number of labels completed and the agreement score
  • Label & Review time
  • What dataset a data row belongs to
  • Open issues on the data row

You can sort Data Rows by:

  • Data Rows created at
  • Label created at
  • Label updated at
  • Benchmark agreement score
  • Label time (coming soon)
  • Review time (coming soon)

To surface a subset of Data Rows, teams can filter on:

  • Annotation
  • Function
  • Data Row
  • Dataset
  • Media Attribute
  • Specific label actions
      • Created at
      • Labeled by
      • Reviewed by
      • Re-worked by (coming soon)
      • Skipped
  • Metadata
  • Batch
  • Review task

Why is the Data Rows tab important?

The Data Rows tab works with the new Overview page to give teams a holistic picture of all their Data Rows and where each data row is in their labeling workflow.

Prior to the Data Rows tab, teams used the Labels tab to keep track of data row activity. While it provided a view of all Data Rows within a project, it limited how quickly teams could surface data rows of interest.

Compared with the Labels tab, the Data Rows tab supports more advanced filtering, so teams can find Data Rows and quickly understand the status of each one. Teams now have a more cohesive way of filtering and surfacing specific data rows across Catalog and Model.

Designed to work with batches and multi-step review workflows, the Data Rows tab gives you a better, more holistic view of your labeling operations.

You can learn more about the Data Rows tab in our documentation.


A holistic view of your project's progress

The Data Rows tab will update in sync with your project’s overview page, allowing you to easily see how your Data Rows are progressing through your project’s workflow.


Drill into specific Data Rows at each step of your workflow

View all Data Rows within a specific stage of your workflow in the left panel of the Data Rows tab. Clicking into each status will bring up all the data rows within that stage of your workflow.

You can also switch to “gallery view” to see your Data Rows as thumbnails. The thumbnail previews render any bounding box annotations.


Filter and find specific slices of data

Teams can use dynamic filters to query and surface specific Data Rows of interest. Mirroring the search capabilities in Catalog, you can query for Data Rows within a project faster than ever.

With flexible querying, you can use a combination of AND/OR conditions on attributes for more granular searches. Filter on:

  • Annotation
  • Function
  • Data Row
  • Dataset
  • Media Attribute
  • Specific label actions
      • Created at
      • Labeled by
      • Reviewed by
      • Re-worked by (coming soon)
      • Skipped
  • Metadata
  • Batch
  • Review task
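
To make the AND/OR combination concrete, here is a minimal Python sketch of how such a query composes. It does not use the product's API: the `DataRow` record, its field names, and the `all_of` / `any_of` helpers are hypothetical and exist only to illustrate nesting an OR group inside an AND group.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataRow:
    # Hypothetical record; the field names are illustrative, not a real schema.
    global_key: str
    dataset: str
    batch: str
    labeled_by: Optional[str] = None
    skipped: bool = False


Predicate = Callable[[DataRow], bool]


def all_of(*preds: Predicate) -> Predicate:
    """AND: the row must satisfy every predicate."""
    return lambda row: all(p(row) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR: the row must satisfy at least one predicate."""
    return lambda row: any(p(row) for p in preds)


rows = [
    DataRow("img-001", "street-scenes", "batch-a", labeled_by="ana"),
    DataRow("img-002", "street-scenes", "batch-b", skipped=True),
    DataRow("img-003", "indoor", "batch-a", labeled_by="ben"),
    DataRow("img-004", "street-scenes", "batch-a"),
]

# Dataset is "street-scenes" AND (labeled by "ana" OR skipped).
query = all_of(
    lambda r: r.dataset == "street-scenes",
    any_of(lambda r: r.labeled_by == "ana", lambda r: r.skipped),
)

matches = [r.global_key for r in rows if query(r)]
print(matches)
```

Nesting the groups this way keeps each condition small and lets you widen or narrow a slice by adding a single predicate.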

Manage batches and view batch history

A batch is a collection of Data Rows that is queued from Catalog and added to your labeling project. Batches are critical in enabling faster data-centric iterations and in unlocking active learning workflows that help correct label or model errors.

You can easily manage and view batches directly from the Data Rows tab:

How to add, view & manage batches for a Benchmark project

How to add, view & manage batches for a Consensus project

How to delete a batch

You can learn more about batches in this guide or in our documentation.


Conduct actions in bulk to improve efficiency

Complex projects might feature a high number of Data Rows. It’s important that teams are able to effectively manage data rows of interest to improve labeling efficiency.

You can take action in bulk by selecting multiple Data Rows and choosing one of the actions below:

  • Move to a specific step of your workflow: Allows you to move the selected Data Rows to a review step or to rework in your labeling workflow. For example, you can move many data rows at once to the “Rework” or “Done” steps.
  • Delete & re-queue: Allows you to delete the current labels and send the data rows back to the labeling queue. You also have the option to preserve the existing label as a template.
  • Hide or unhide from labelers: Allows you to hide certain data rows from labelers (e.g., sensitive or inappropriate content). This prevents all labelers from viewing the selected data rows in the editor.
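
As a rough mental model of these bulk actions, the Python sketch below applies one operation to every selected row. It is not the product's API: the `Row` record and the `bulk_move` and `bulk_set_hidden` helpers are hypothetical, and treating the statuses listed in this article as the set of valid move targets is an assumption.

```python
from dataclasses import dataclass

# Statuses named in this article, treated as plain strings.
# Which of them are valid move targets is an assumption of this sketch.
WORKFLOW_STEPS = {"To Label", "Labeled", "In Review", "In Rework", "Done"}


@dataclass
class Row:
    # Hypothetical record for illustration only.
    global_key: str
    step: str
    hidden: bool = False


def bulk_move(rows, selected_keys, target_step):
    """Move every selected row to target_step and return how many moved."""
    if target_step not in WORKFLOW_STEPS:
        raise ValueError(f"unknown workflow step: {target_step!r}")
    moved = 0
    for row in rows:
        if row.global_key in selected_keys:
            row.step = target_step
            moved += 1
    return moved


def bulk_set_hidden(rows, selected_keys, hidden):
    """Hide (True) or unhide (False) every selected row."""
    for row in rows:
        if row.global_key in selected_keys:
            row.hidden = hidden


project_rows = [
    Row("img-001", "In Review"),
    Row("img-002", "In Review"),
    Row("img-003", "Done"),
]

# Send two reviewed rows back for rework and hide a third from labelers.
moved_count = bulk_move(project_rows, {"img-001", "img-002"}, "In Rework")
bulk_set_hidden(project_rows, {"img-003"}, hidden=True)
```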

What value does the Data Rows tab unlock?

Designed to work with multi-step review workflows

For many Enterprise teams working on larger and more complex projects, a key question becomes how to structure, review, and complete training data projects in a systematic way.

Multi-step review workflows give teams the flexibility to review Data Rows at a specific step of the review process. Rather than making you sort through all of your Data Rows, the Data Rows tab gives teams a holistic look into all the review steps within a project’s workflow.

Flexible querying with dynamic filters

The search capabilities in the Data Rows tab mirror Catalog – you can query to surface a specific subset of Data Rows within a project to better QA and understand all the data rows within your project.

Filtering by Annotation type or by Function allows teams to identify and QA a subset of Data Rows that meets specific criteria. Alongside Workflows, the Data Rows tab enables ad-hoc review and flexible QA while still letting you view and manage your entire labeling operation.