How to natively annotate a PDF document

PDF documents are inherently complex: they often combine text, images, charts, graphs, and more, and the same information can be interpreted in many different ways. Traditional OCR solutions are not sufficient to capture both the text and the visual information that document image understanding depends on, which can limit the accuracy of your model.

Our Document editor is a multimodal annotation platform that helps you turn stores of PDF files and documents into performant ML models. Its NER text layer lets you annotate text of interest alongside OCR bounding boxes, so text labels keep their visual context.

With our Document editor, teams can:

Natively upload whole PDF files
Easily navigate pages and zoom in & out
Create and use a custom text layer
Save and export raw text
Create entities (for NER) - including tokenization at the word level and character level
Create bounding boxes (for OCR)
Use model-assisted labeling to import bounding boxes
Create annotation relationships - including between annotations that span different pages
Classify your PDF - with radio, checklist, and free-text classification
Use hotkeys to speed up your workflow
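
The entity, bounding box, and relationship features listed above can be pictured with a small data model. The sketch below is illustrative Python only; it is not the Labelbox SDK or its export format, and every name in it (`Token`, `Entity`, `BoundingBox`, `Relationship`, `word_tokens`, `char_tokens`, `snap_to_tokens`) is an assumption made for this example. It shows how word-level and character-level tokenization change the span an entity ends up covering, and how a relationship can link annotations on different pages.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    start: int  # inclusive character offset into the page's text layer
    end: int    # exclusive character offset


@dataclass(frozen=True)
class Entity:
    """A text (NER) annotation: a labeled character span on one page."""
    label: str
    page: int
    start: int
    end: int


@dataclass(frozen=True)
class BoundingBox:
    """A visual (OCR) annotation: a labeled rectangle on one page."""
    label: str
    page: int
    left: float
    top: float
    width: float
    height: float


@dataclass(frozen=True)
class Relationship:
    """A labeled link between two annotations, possibly on different pages."""
    label: str
    source: Entity | BoundingBox
    target: Entity | BoundingBox


def word_tokens(text: str) -> list[Token]:
    # Word-level: each run of non-whitespace characters is one token.
    return [Token(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_tokens(text: str) -> list[Token]:
    # Character-level: each non-whitespace character is its own token.
    return [Token(c, i, i + 1) for i, c in enumerate(text) if not c.isspace()]


def snap_to_tokens(tokens: list[Token], start: int, end: int) -> tuple[int, int]:
    """Expand a raw selection to the boundaries of every token it overlaps."""
    hit = [t for t in tokens if t.start < end and t.end > start]
    if not hit:
        raise ValueError("selection does not overlap any token")
    return hit[0].start, hit[-1].end


# Hypothetical text layer for page 1 of an invoice.
page_text = "Invoice total: $1,250.00 due 2024-03-01"

# A rough selection over "1,250" (character offsets 16 to 21).
word_span = snap_to_tokens(word_tokens(page_text), 16, 21)
char_span = snap_to_tokens(char_tokens(page_text), 16, 21)

amount = Entity("amount", page=1, start=word_span[0], end=word_span[1])
signature = BoundingBox("signature", page=2, left=72.0, top=640.0, width=180.0, height=40.0)

# The relationship spans pages: the amount is on page 1, the signature on page 2.
approval = Relationship("approved_by", source=amount, target=signature)
```

With word-level tokens the selection grows to the whole token `$1,250.00`; with character-level tokens it stays exactly on `1,250`. Which granularity fits depends on whether your labels need to cut inside words.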

To learn more about our Document editor, please refer to our documentation.
