Large volumes of high-quality training data are crucial to the success of any machine learning model. A labeling project is where you orchestrate and manage all of your labeling operations within Labelbox.
The first step in the labeling process is to align on the key components of the labeling task within a project. This sets the tone of the project and allows Labelbox to make labeling more efficient down the line.
Creating a project and configuring your labeling task on the Labelbox platform begins with aligning on which type of data needs to be labeled.
Labelbox supports the following data types:
Based on your chosen data modality, the Labeling Team Manager can match your project with teams of labelers who already have relevant experience.
For instance, some workforce teams have an excellent track record with a specific type of imagery, while others have extensive experience with the video editor or with specialized text projects. Curating labeling teams who are already well-versed in your use case and match your project needs lets them get started quickly with little friction.
Similarly, our labeling partners are experienced in various industries such as:
You should discuss whether your task calls for labelers with experience in your industry of interest, as this allows the Labeling Team Manager to allocate the workforce that best meets your needs.
Some teams have extensive experience in aerial roof tagging for insurance companies, while others have worked long-term with microscopy images in the medical field, among many other kinds of tasks. Understanding the scope and framing of the task allows Labelbox to set your team up for success with the right workforce.
A workforce team who is well-versed in your industry or use case will need less time to get calibrated on your task. This means they'll be able to label more data in less time, which results in lower labeling costs, as you only pay for active labeling screen time.
There might be some use cases where general experience in a specific industry or data type is not sufficient to meet your requirements. Labelbox also offers the option for you to onboard expert labelers for your project needs. You can learn more below under the "Specific labeling requirements" section.
Labelbox's Annotate is designed to give you complete visibility and control over every aspect of your labeling operations across data modalities.
While setting up your labeling project, you'll need to understand the supported file formats and annotation types in order to prevent issues down the line.
Similarly, it is important to understand how to use Annotate to set up your project and labeling task, how to collaborate with your internal or external teams, and how to minimize labeling time and spend.
For instance, only one labeler can work on a given data row. If you have long videos to annotate, we recommend splitting them into multiple files so more annotators can work on the data in parallel. Ultimately this depends on your own speed and time requirements; the Labeling Team Manager is available to work with you to determine what will work best for your team's use case.
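As an illustration of the splitting idea, the sketch below divides a long video's frame range into fixed-size chunks so each chunk can become its own data row. The chunk size and frame counts here are arbitrary assumptions for the example, not Labelbox recommendations:

```python
# Hypothetical sketch: split a long video's frame range into chunks so each
# chunk can be uploaded as a separate data row and labeled in parallel.
def split_video_frames(total_frames, chunk_size):
    """Return a list of (start, end) frame ranges, end exclusive."""
    chunks = []
    start = 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        chunks.append((start, end))
        start = end
    return chunks

# A 10,000-frame video split into 3,000-frame chunks yields 4 data rows,
# so up to 4 annotators can work at once instead of 1.
print(split_video_frames(10_000, 3_000))
# → [(0, 3000), (3000, 6000), (6000, 9000), (9000, 10000)]
```

The right chunk size is a trade-off: smaller chunks allow more parallelism, but annotators lose context at each cut, so scenes that span a boundary may be harder to label consistently.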
You can learn more about Annotate in our documentation.
Labelbox's Catalog is a data curation tool for you to organize, search, visualize, and explore your unstructured data.
Utilizing Catalog for data selection gives you a significant advantage in curating a quality batch of data to label according to the parameters of your task. You can leverage Catalog features, such as filters, one-click similarity search, and metadata, to ensure that the data you queue to your project is well-structured for your business requirements.
You can learn more about Catalog in our documentation.
For more specific and specialized tasks, Labelbox has the ability to onboard labelers who are qualified in particular domains:
Your Labeling Team Manager can source a specialist, based on the expertise needed, who can get started on your task. Since experts can take longer to source, it is essential to determine and request this requirement, along with any additional information needed, before labeling starts.
As Labelbox partners are spread across different countries, it's important to discuss geographical requirements so your business needs are met. Based on your compliance and project needs, the Labeling Team Manager will ensure that the right workforce is onboarded.
Labelbox partners are compliant with the following certifications:
Defining your data volume is a key element to consider when outlining your labeling task. This allows early expectations to be set in context of throughput and subsequently helps the Labeling Team Manager and the workforce to organize the labeling in the most efficient manner.
For instance, the following aspects should be considered and discussed when defining your task:
Based on the above, the Labeling Team Manager can allocate the task to the team most appropriate to meet the volume demands in terms of resources and availability.
Understanding the timeline of your task is also crucial in effectively ramping up and scaling labeling activity. A rough outline of when you want the project to start and the target completion date helps define a structured labeling process and makes resource management easier.
A task that is set up for success will consider the following:
Quick turnarounds on high volumes of data and tight deadlines can be delicate to navigate, so planning ahead and understanding timelines in advance helps maximize the resources available and use the workforce's time efficiently.
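As a back-of-the-envelope illustration of why volume and deadline matter together, the sketch below estimates how many labelers a given throughput target implies. All the numbers (seconds per label, productive hours per day) are assumptions for this example, not Labelbox benchmarks:

```python
import math

# Hypothetical planning sketch: estimate how many labelers are needed to
# finish a given volume of assets before a deadline.
def labelers_needed(total_assets, seconds_per_label, hours_per_day, days_available):
    """Labelers required, assuming each works hours_per_day productive hours."""
    labels_per_labeler = (hours_per_day * 3600 / seconds_per_label) * days_available
    return math.ceil(total_assets / labels_per_labeler)

# Example: 50,000 images at ~45 s each, 6 productive hours/day, 20 working days.
print(labelers_needed(50_000, 45, 6, 20))  # → 6
```

Even a rough estimate like this makes the conversation with the Labeling Team Manager concrete: halving the deadline roughly doubles the team size, which is much easier to arrange when requested in advance.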
Another important aspect of a well-defined labeling task is the ontology. It needs to be built in a way that will work for the task at hand, and that follows the most logical workflow for a labeler. Ontologies and features should be created and managed with the goals of proper labeling, efficiency, and reusability in mind.
Within an ontology, the three kinds of features are:
A good ontology should define and answer the following:
Reusing ontologies can be useful if you're planning on having multiple projects for the same or a very similar use case. Elements can be added to an existing ontology without affecting labels, so an ontology is not set in stone and you're encouraged to test and fine-tune your ontology.
You can learn more about how to create and manage your ontologies in this guide.
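To make the structure concrete, here is an illustrative sketch in plain Python (not the Labelbox SDK schema itself) of how an ontology for a hypothetical vehicle-detection task might combine object tools, nested classifications, and global classifications; all feature names are invented for this example:

```python
# Hypothetical ontology sketch for a vehicle-detection task. The feature
# names and dict layout are illustrative, not a Labelbox SDK schema.
ontology = {
    "tools": [  # objects drawn on the asset
        {"tool": "bounding_box", "name": "vehicle",
         "classifications": [  # nested question asked per drawn object
             {"type": "radio", "name": "vehicle_type",
              "options": ["car", "truck", "motorcycle"]},
         ]},
    ],
    "classifications": [  # global question asked once per asset
        {"type": "radio", "name": "time_of_day",
         "options": ["day", "night"]},
    ],
}

# Keeping feature names unique is one simple check that helps an ontology
# stay reusable across projects.
names = [t["name"] for t in ontology["tools"]] + \
        [c["name"] for c in ontology["classifications"]]
assert len(names) == len(set(names))
print(names)
```

Thinking through the ontology at this level of detail, before opening the editor, surfaces ambiguities early: for instance, whether "vehicle_type" belongs on each object or once per asset changes both labeler workflow and the structure of your exported labels.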
You should choose tools that will allow labelers to label as fast as possible while maintaining the output needed for your model.
Sample questions to consider when selecting tools would be:
High-quality training data is critical to the success of your model. Ensure quality by designing a clear and efficient ontology that makes it easier for you to organize the output.
Sample questions to consider include:
Once all the components of the task have been properly defined, you need to provide labeling instructions to the workforce. Even with an extremely simple ontology, it is necessary to offer additional information related to the labeling task.
Labeling instructions complement the ontology and can take the form of a document or a video demo. You can include anything that you deem useful and relevant to explaining the rules of your labeling task in a way that is easy for labelers to follow.
Good instructions will go into detail with specific examples and clearly lay out major labeling rules. Labeling instructions should provide context to the task, explain what the task entails, describe the labeling steps, and serve as a "living document".
Instructions can be altered depending on task progress and any changes in your requirements. Since changes can be made, it is advisable to keep track of what makes a label meet your success criteria, so you can tailor the instructions to help the team understand what constitutes a "good label".
Make sure to list and define the items that you want labeled. As mentioned in the "Creating an ontology" section above, you should be clear on the features and rules behind each expected annotation:
For example, if the project contains several entities/objects to be labeled with multiple classifications to choose from, explain each entity/object and each classification in sufficient detail:
When defining your labeling rules, you should aim to:
Make sure to describe each step in the labeling workflow so that your labelers are not lost in the ontology:
The best way to convey the results you want is by providing clear examples of the data to the labelers in the form of screenshots in your instructions. There are several approaches to this:
All of these key components contribute to defining a task that is set for success on the Labelbox platform. These aspects also help the Labeling Team Manager ensure that your project outcomes are successful.
After the initial task setup, the next step in your labeling journey is to define your success criteria. Along with your volumes and deadlines, learn how to estimate the average time per label, describe how a "good label" is measured, and learn about SLAs in our next guide: How to define your data labeling project's success criteria.