Labelbox · June 2, 2022

How to use Hugging Face 🤗 models to jumpstart model training

From self-driving cars to machine learning models able to generate images based on natural language inputs, there's no shortage of innovation in the artificial intelligence (AI) space.

With rapid innovation typically comes close collaboration. If there's not yet an established standard for a certain type of technology, users will often try to find the answer by asking others.

This is visible in many forums such as Reddit and GitHub, which host communities dedicated to machine learning and AI development. Hugging Face is another community that emerged to help users build better AI.

In this article, we’ll guide you through what Hugging Face is and how you can use it to efficiently improve models in development and production.

What is Hugging Face?

Hugging Face is a data science platform and community that emerged to help users build, train, and deploy machine learning models based on open source code and technology.

As a user, you’re able to reference and use pre-trained models, datasets, and documentation uploaded by other users as well as upload and contribute your own projects.

A majority of Hugging Face’s community contributions fall under the category of natural language processing (NLP) models, but there are also models related to audio and computer vision.

How do you use Hugging Face models?

Resources from a community like Hugging Face give you a starting point for your AI project. It takes a significant amount of investment to create the right dataset to train a machine learning model. This investment usually involves time, money, and human resources that might be beyond a company's means, whether they're on tight deadlines or lacking the in-house expertise to build a model from scratch.

While integrating a pre-trained model from Hugging Face into your project may require some effort, it's much easier than creating your own training data, setting up your workflows, and training the model from scratch. Integrating with a third-party model, whether one from Hugging Face or another model of your choice, allows you to pre-label data so that you can focus manual labeling effort and cost on edge cases or on fixing pre-labels.
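As a rough illustration of pre-labeling, the sketch below turns named entity predictions into pre-label records that a reviewer could then correct. It assumes the `transformers` library and the output shape of its `pipeline("ner", aggregation_strategy="simple")` call (dicts with `entity_group`, `score`, `word`, `start`, and `end`). The `to_prelabels` helper, the record layout, and the sample predictions are hypothetical and are not part of any Hugging Face or Labelbox API.

```python
from typing import Dict, List


def to_prelabels(text: str, predictions: List[Dict]) -> List[Dict]:
    """Convert aggregated NER predictions into simple pre-label records.

    Each prediction is expected to carry the keys produced by a
    token-classification pipeline with aggregation enabled:
    entity_group, score, start, end.
    """
    prelabels = []
    for pred in predictions:
        prelabels.append(
            {
                "label": pred["entity_group"],
                "start": pred["start"],
                "end": pred["end"],
                "text": text[pred["start"]:pred["end"]],
                "confidence": float(pred["score"]),
            }
        )
    return prelabels


def prelabel_with_hugging_face(text: str) -> List[Dict]:
    """Run a pre-trained NER model and return pre-labels (needs `transformers`)."""
    # Imported lazily; the first call downloads a default model.
    from transformers import pipeline

    ner = pipeline("ner", aggregation_strategy="simple")
    return to_prelabels(text, ner(text))


# Hand-written predictions in the assumed pipeline output shape,
# so the example runs without downloading a model.
sample_text = "Hugging Face is based in New York."
sample_predictions = [
    {"entity_group": "ORG", "score": 0.99, "word": "Hugging Face", "start": 0, "end": 12},
    {"entity_group": "LOC", "score": 0.98, "word": "New York", "start": 25, "end": 33},
]
for record in to_prelabels(sample_text, sample_predictions):
    print(record)
```

Keeping the conversion separate from the model call means the same records could be produced from any third-party model that reports character offsets and a score.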

That said, even if you already have data prepared for model training, it's often worthwhile to use a pre-trained model as a baseline against which to measure the accuracy of your own solution.
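One simple way to make that comparison is to score both models on the same held-out examples. The sketch below is a minimal illustration: the `accuracy` helper and all labels and predictions are made up for the example, and in practice the baseline predictions would come from the pre-trained model.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of examples where the predicted label matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one example")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical labels for five held-out examples.
ground_truth = ["pos", "neg", "pos", "neg", "pos"]
baseline_preds = ["pos", "neg", "neg", "neg", "pos"]   # pre-trained model
own_model_preds = ["pos", "pos", "pos", "neg", "neg"]  # your own model

baseline_acc = accuracy(baseline_preds, ground_truth)
own_acc = accuracy(own_model_preds, ground_truth)
print(f"baseline: {baseline_acc:.2f}, own model: {own_acc:.2f}")
```

If your own model scores below the pre-trained baseline on the same examples, that is a signal to revisit the training data before investing in further training.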

Hugging Face models provide many configuration options and support a variety of use cases, but here are the six tasks they are most commonly used for:

  • Language modeling
  • Translation
  • Question & answer
  • Sequence classification
  • Summarization
  • Named entity recognition
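The six tasks above map roughly onto task identifiers accepted by the `pipeline` function in the `transformers` library. The mapping below is a sketch based on those identifiers as documented around the time of this article; the `TASK_TO_PIPELINE` dictionary and `pipeline_task` helper are hypothetical names, and the identifiers should be checked against the library version you install.

```python
# Article task name -> `transformers` pipeline task identifier (assumed mapping).
TASK_TO_PIPELINE = {
    "Language modeling": "text-generation",  # "fill-mask" for masked language models
    "Translation": "translation",  # or a language pair such as "translation_en_to_fr"
    "Question & answer": "question-answering",
    "Sequence classification": "text-classification",
    "Summarization": "summarization",
    "Named entity recognition": "token-classification",  # also available as "ner"
}


def pipeline_task(task_name: str) -> str:
    """Look up the pipeline identifier for one of the task names above."""
    try:
        return TASK_TO_PIPELINE[task_name]
    except KeyError:
        raise ValueError(f"unknown task: {task_name!r}") from None


for name, identifier in TASK_TO_PIPELINE.items():
    print(f"{name}: pipeline({identifier!r})")
```

With `transformers` installed, an identifier from this table would be passed as the first argument to `pipeline`, optionally along with a specific model name.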

What are some examples of Hugging Face models?

Not all data impacts model performance equally. In fact, a major roadblock for many teams is using automation to speed up labeling so that they can iterate on their data faster.

Using your model to pre-label is one of the most effective labeling automation methods. Labeling effort typically scales linearly with the number of annotations, and the amount of labeled data required increases exponentially with each iteration of the model, which gives ML teams a clear incentive to increase their labeling efficiency.

In the video example below, you’ll learn how to use a pre-trained model from Hugging Face to run model-assisted labeling and active learning on named-entity recognition (NER) data.

Labelbox Model helps you identify targeted improvements in your training data to boost model performance. With workflows that allow you to inspect model predictions and confidence scores, you can then send low-confidence data back as a batch to your project to further fine-tune model performance. Finally, you can even automate this process with a one-click model inference API integration.
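The idea of sending low-confidence data back for labeling can be sketched in a few lines. This is a generic illustration of confidence-based selection, not the Labelbox workflow or API: the `select_low_confidence` function, the 0.8 threshold, and the row IDs are all assumptions made for the example.

```python
from typing import Dict, List


def select_low_confidence(
    scores_by_row: Dict[str, List[float]],
    threshold: float = 0.8,
    batch_size: int = 100,
) -> List[str]:
    """Return IDs of the least confident data rows, lowest confidence first.

    A row's confidence is taken as the minimum score over its predictions;
    rows with no predictions are treated as confidence 0.0.
    """
    row_confidence = {
        row_id: min(scores) if scores else 0.0
        for row_id, scores in scores_by_row.items()
    }
    uncertain = [rid for rid, conf in row_confidence.items() if conf < threshold]
    uncertain.sort(key=lambda rid: row_confidence[rid])
    return uncertain[:batch_size]


# Hypothetical per-row prediction confidence scores.
predictions = {
    "row-1": [0.99, 0.97],
    "row-2": [0.95, 0.42],
    "row-3": [0.61],
    "row-4": [],
}
print(select_low_confidence(predictions, threshold=0.8, batch_size=2))
# prints ['row-4', 'row-2']
```

Using the minimum score per row is one design choice among several; averaging scores or using prediction entropy would rank rows differently.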

Final thoughts on Hugging Face models

Using a pre-trained model means less training and less effort in building your model's architecture. As more innovative AI solutions appear, communities like Hugging Face will continue to grow and evolve.

Labelbox is a best-in-class AI data engine that offers data curation, AI-assisted labeling, model training & diagnostics, and labeling services, all in one platform. Download the complete guide to data engines to learn more.