Labelbox, December 20, 2022

Fine-tune language models to your use case with the right data

All Labelbox users can now natively upload and annotate documents and conversational text to train language models for their specific business use cases. Read on to learn more about the new cuboid tool, the latest improvements to the video editor, and updates to the new way to queue and review your data rows.

Natively upload and annotate documents for NER and OCR

All Labelbox users can now natively upload and annotate their PDF documents for NER and OCR use cases.

PDFs and similar documents are widespread across industries including financial services, real estate, healthcare, and many more. PDFs contain valuable information that often needs to be extracted for greater understanding or for a specific business use case.

However, PDFs are inherently complex (they often contain text, images, graphs, and more), and their formats vary. If I want to train a model to extract key information from product brochures, such as the product name and product specifications, I face the difficult task of capturing both text and visual information across many brochure styles. Interpreting and extracting information from native PDFs of varying formats and structures is a key challenge in any AI use case based on PDF data.

The document editor is a multimodal annotation tool that lets you turn stores of PDF files and documents into performant ML models. You can annotate text with an NER text layer alongside traditional OCR, extracting both text and images without losing context.
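One common way to represent an NER text layer is as labeled character offsets into the extracted page text. The sketch below assumes that representation and maps annotated spans back to the product name and specifications from the brochure example. The `EntitySpan` class, the label names, and the sample text are hypothetical, and this is not Labelbox's export format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical NER annotation: a label over a character range of the text layer."""
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def extract_entities(text: str, spans: List[EntitySpan]) -> Dict[str, List[str]]:
    """Group the annotated substrings of `text` by entity label, in reading order."""
    entities: Dict[str, List[str]] = {}
    for span in sorted(spans, key=lambda s: s.start):
        if not 0 <= span.start < span.end <= len(text):
            raise ValueError(f"span out of range: {span}")
        entities.setdefault(span.label, []).append(text[span.start:span.end])
    return entities


page_text = "Model X200 blender, 1200 W motor, 1.5 L jar"  # hypothetical OCR output
spans = [
    EntitySpan("product_name", 0, 10),
    EntitySpan("specification", 20, 32),
    EntitySpan("specification", 34, 43),
]
print(extract_entities(page_text, spans))
# → {'product_name': ['Model X200'], 'specification': ['1200 W motor', '1.5 L jar']}
```

Storing offsets rather than copied strings keeps each entity tied to its exact position on the page, so the same span can later be linked to OCR geometry.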

Leverage our document editor to:

Upload and annotate whole conversations

All Labelbox users can now natively upload and annotate text formatted as conversations.

The rise of natural language processing (NLP) models has given machine learning teams the opportunity to build experiences tailored to their business use cases. Use cases range from improving customer support metrics to creating delightful customer experiences and preserving brand identity and loyalty.

As the viral success of OpenAI’s ChatGPT shows, AI-powered language experiences will likely continue to spread across all major industries. One of our customers is using our conversational text editor to annotate conversations and better understand the intent of their current users. Rather than spending time and money on manual conversation review, they set out to build a model that identifies the intents and types of questions their customers ask most frequently.
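Once conversations carry intent labels, finding the most frequently asked types of questions is a simple aggregation. The sketch below assumes a simplified, hypothetical structure: each message has a speaker role and an optional intent label. The `Message` class and the intent names are illustrative and do not reflect Labelbox's conversational data format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    speaker: str                  # hypothetical speaker role, e.g. "customer" or "agent"
    text: str
    intent: Optional[str] = None  # hypothetical message-level intent label


def top_intents(
    conversations: List[List[Message]], speaker: str = "customer", n: int = 3
) -> List[Tuple[str, int]]:
    """Count annotated intents on one speaker's messages and return the n most frequent."""
    counts = Counter(
        message.intent
        for conversation in conversations
        for message in conversation
        if message.speaker == speaker and message.intent is not None
    )
    return counts.most_common(n)


conversations = [
    [
        Message("customer", "Where is my order?", "order_status"),
        Message("agent", "Let me check that for you."),
        Message("customer", "Can I change the delivery address?", "update_address"),
    ],
    [
        Message("customer", "My package still hasn't arrived.", "order_status"),
        Message("agent", "Sorry about that, I'll look into it."),
    ],
    [Message("customer", "How do I get a refund?", "refund")],
]

print(top_intents(conversations, n=1))  # → [('order_status', 2)]
```

The same labeled messages can then serve as training examples for an intent classifier that replaces the manual review step.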

With our conversational text editor, you can:

Learn how to train a chatbot or how to annotate conversational text for a chatbot use case in our latest guides, or read more about our conversational text editor in our documentation.

Capture three-dimensional space in images

All Labelbox users can now use the cuboid tool to capture three-dimensional space in 2D images.

For certain use cases, capturing and understanding an object or person’s dimensions is important. This is especially true for instances where the angle, size, and depth of a person or object in an image can represent different things.

For example, if you want to capture the tilt and angle of a person’s head in an image, you can use the cuboid tool to capture its up-and-down, side-to-side, and left-to-right orientation. Rather than simply annotating the head with a bounding box, the cuboid tool lets you train the model on the exact rotation and direction of the person’s head.

To create a cuboid annotation, create an image project and select “cuboid” from the tool dropdown during ontology creation. Then draw a bounding box over the object in the image.

When you release the mouse, the box automatically becomes a cuboid, and you can use the various levers on the tool to adjust the cuboid’s rotation along the x, y, and z axes. At the top of the editor, you’ll find corresponding buttons to switch to Rotate mode (x axis), Move mode (y axis), or Scale mode (z axis).
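Geometrically, adjusting a cuboid’s rotation means applying a rotation about one axis to its eight corners. The sketch below is plain 3D geometry, not Labelbox’s internal cuboid representation (which this post does not describe), and the function names are hypothetical. In the head-pose example, the three rotations correspond to the up-and-down, side-to-side, and left-to-right movements of the head. Which axis maps to which movement depends on the coordinate convention you choose.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float, float]


def cuboid_corners(center: Point, size: Point) -> List[Point]:
    """Return the 8 corners of an axis-aligned cuboid given its center and (width, height, depth)."""
    cx, cy, cz = center
    w, h, d = size
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * d / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def rotate_about_center(points: List[Point], center: Point, axis: str, degrees: float) -> List[Point]:
    """Rotate points about an axis through `center` (right-hand rule: positive angles turn counterclockwise)."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    cx, cy, cz = center
    rotated = []
    for px, py, pz in points:
        x, y, z = px - cx, py - cy, pz - cz  # move the center to the origin
        if axis == "x":
            x, y, z = x, c * y - s * z, s * y + c * z
        elif axis == "y":
            x, y, z = c * x + s * z, y, -s * x + c * z
        elif axis == "z":
            x, y, z = c * x - s * y, s * x + c * y, z
        else:
            raise ValueError(f"axis must be 'x', 'y', or 'z', got {axis!r}")
        rotated.append((x + cx, y + cy, z + cz))  # move back to the original center
    return rotated


# A 4 x 2 x 2 box turned 90 degrees about z: its long side now runs along y.
box = cuboid_corners((0.0, 0.0, 0.0), (4.0, 2.0, 2.0))
turned = rotate_about_center(box, (0.0, 0.0, 0.0), "z", 90)
print(round(max(p[0] for p in turned), 6), round(max(p[1] for p in turned), 6))  # → 1.0 2.0
```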

To learn more about the cuboid tool, see our documentation.

Video editor improvements: Click-and-drag classification and bounding box tracking

Easily adjust frame-based classification segments

The video editor currently supports both radio and checklist classifications at the frame and global levels.

This highly requested update gives you greater flexibility in adjusting the keyframes within a classification segment: simply click and drag a keyframe to set the desired classification length.
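A frame-based classification segment can be thought of as a value that applies between two keyframes. Dragging a keyframe changes the segment’s start or end frame. The sketch below models that idea with a hypothetical `ClassificationSegment` class. It illustrates the concept only and is not Labelbox’s data model.

```python
from dataclasses import dataclass


@dataclass
class ClassificationSegment:
    """Hypothetical frame-based classification: `value` applies from `start` to `end` (inclusive)."""
    value: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Number of frames covered by the segment."""
        return self.end - self.start + 1

    def drag_keyframe(self, which: str, to_frame: int, num_frames: int) -> None:
        """Move the start or end keyframe, clamped to the video and never crossing the other keyframe."""
        if which == "start":
            self.start = max(0, min(to_frame, self.end))
        elif which == "end":
            self.end = min(num_frames - 1, max(to_frame, self.start))
        else:
            raise ValueError("which must be 'start' or 'end'")

    def covers(self, frame: int) -> bool:
        return self.start <= frame <= self.end


seg = ClassificationSegment("raining", start=10, end=20)
seg.drag_keyframe("end", 45, num_frames=100)  # drag the end keyframe to frame 45
print(seg.length)  # → 36
```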

If you’re a current Labelbox user, you can try out this new way to classify videos in our video editor today.

Annotate videos faster than ever with bounding box tracking (beta)

Compared with other data types, video labeling can be especially complex. There are often many objects of varying sizes to label across frames, and even a short one-minute video can take labelers tens of minutes to annotate.

We’re introducing bounding box tracking for video: simply draw a bounding box around an object of interest, and the object will be tracked across frames.

To sign up for the beta, please fill out this form.

Moving to a new way to queue and review

All Labelbox customers will be moving to a new way to queue and review before the end of January.

We’ve continued to roll out batch-based queueing, custom review workflows, and the Data Rows tab across our user base.

What is the new way to queue & review?

All new projects will automatically be configured with batch-based queueing, the Workflow tab, and the Data Rows tab.

Prioritize what data to label with batch-based queueing

  • A batch is a collection of data rows from Catalog that can be queued to your labeling project
  • Batch-based queueing will replace dataset-based queueing, which means you’ll no longer be able to attach datasets to new projects
  • Instead, after uploading your dataset to Catalog, you’ll need to queue data rows through a batch (you can add up to 100k data rows to a project in a single batch and add an unlimited number of batches to a project)
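Because a single batch holds at most 100k data rows but a project accepts any number of batches, a large selection from Catalog can be queued by splitting it into chunks. The sketch below shows only that chunking step. The data row IDs are hypothetical, and `make_batches` is an illustrative helper, not part of any Labelbox SDK.

```python
from typing import Iterator, List, Sequence

# Per-batch limit stated in this post; confirm against current Labelbox docs before relying on it.
MAX_ROWS_PER_BATCH = 100_000


def make_batches(data_row_ids: Sequence[str], batch_size: int = MAX_ROWS_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive chunks of data row IDs, each no larger than the per-batch limit."""
    if not 0 < batch_size <= MAX_ROWS_PER_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_ROWS_PER_BATCH}")
    for start in range(0, len(data_row_ids), batch_size):
        yield list(data_row_ids[start:start + batch_size])


ids = [f"row-{i}" for i in range(250_000)]  # hypothetical data row IDs
print([len(batch) for batch in make_batches(ids)])  # → [100000, 100000, 50000]
```

With the Labelbox Python SDK, each chunk would then be submitted through the project’s batch-creation method (`Project.create_batch` in the SDK versions we are aware of). Check the SDK reference for its exact signature.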

Learn more about the power of batches: How to prepare and submit a batch for labeling

Customize your review process with the Workflow tab

  • The Workflow tab lets you easily understand the state of your data within your labeling operations pipeline
  • Optimize how labeled data gets reviewed across multiple tasks and reviewers
  • Move batches between stages of your workflow
  • Easily review your data & automatically send incorrect labels to be fixed
  • Better understand your project’s data lineage to iterate & provide feedback
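A review workflow behaves like a small state machine. Labeled data moves through one or more review stages, and rejected labels are routed back to be fixed. The sketch below assumes a simple linear workflow with hypothetical stage names (`labeling`, `rework`, `done`, and whatever review stages you pass in). Labelbox workflows are configured in the app, and their actual task names and rules may differ.

```python
from typing import List


def next_stage(stage: str, review_stages: List[str], approved: bool = True) -> str:
    """Return where a data row goes next in a hypothetical linear review workflow."""
    first_review = review_stages[0] if review_stages else "done"
    if stage in ("labeling", "rework"):
        # Newly labeled and freshly fixed data rows both (re)enter review at the first stage.
        return first_review
    if stage in review_stages:
        if not approved:
            return "rework"  # rejected labels are sent back to be fixed
        i = review_stages.index(stage)
        return review_stages[i + 1] if i + 1 < len(review_stages) else "done"
    raise ValueError(f"no transition out of stage {stage!r}")


# Trace one data row that is rejected once at the second review stage.
reviews = ["initial_review", "expert_review"]
decisions = iter([True, False, True, True])  # outcome of each review, in order
stage, history = "labeling", ["labeling"]
while stage != "done":
    approved = next(decisions) if stage in reviews else True
    stage = next_stage(stage, reviews, approved)
    history.append(stage)
print(history)
# → ['labeling', 'initial_review', 'expert_review', 'rework', 'initial_review', 'expert_review', 'done']
```

Recording the visited stages, as `history` does here, is one simple way to keep the kind of data lineage the Workflow tab surfaces.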

Learn more about workflows: How to customize your annotation review process

Better search, surface, and prioritize data within a project

  • The Data Rows tab is the central hub for all data rows in a given project, letting you view, manage, and filter them.
  • Understand what data rows belong to each stage of your workflow
  • Filter and find specific slices of data
  • Manage batches and view batch history
  • Conduct actions in bulk to improve efficiency
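Filtering a project’s data rows by workflow stage and batch is conceptually a simple predicate over per-row metadata. The sketch below illustrates that idea with a hypothetical `DataRowView` record and `filter_rows` helper. It does not reflect how the Data Rows tab or the Labelbox API implement filtering.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataRowView:
    """Hypothetical project-level view of a data row: its ID, batch, and workflow stage."""
    id: str
    batch: str
    stage: str


def filter_rows(
    rows: List[DataRowView], stage: Optional[str] = None, batch: Optional[str] = None
) -> List[DataRowView]:
    """Keep rows matching every filter that is set; None means 'any'."""
    return [
        row for row in rows
        if (stage is None or row.stage == stage) and (batch is None or row.batch == batch)
    ]


rows = [
    DataRowView("dr-1", "batch-a", "labeling"),
    DataRowView("dr-2", "batch-a", "review"),
    DataRowView("dr-3", "batch-b", "review"),
    DataRowView("dr-4", "batch-b", "done"),
]

print(Counter(row.stage for row in rows))  # data rows per workflow stage
print([row.id for row in filter_rows(rows, stage="review", batch="batch-b")])  # → ['dr-3']
```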

Learn more about the Data Rows tab: How to search, surface, and prioritize data within a project

On November 21st, we released the above update to Free, Education, and Starter users. On December 15th, we began rolling it out to our Pro and Enterprise customers.

In Q1 2023, Labelbox will open a migration path that will allow you to move all of your old projects into this new paradigm.

We’ve compiled a list of resources to help you better familiarize yourself with this new way to queue & review:

Guide: How to train a chatbot

You can harness the potential of powerful language models such as ChatGPT and BERT and tailor them to your unique business application. Domain-specific chatbots need to be trained on high-quality annotated data that relates to your specific use case.

Check out the guide