AI glossary

Learn all terms related to AI development and machine learning.


Asset

Assets (or data assets) are individual files to be labeled, such as an image, a video, or a text file. Assets can be hosted in a cloud bucket, uploaded from a local file location, or copied from a remote data source.

JavaScript

JavaScript is a high-level programming language commonly used for creating interactive and dynamic content on web pages, with applications ranging from front-end web development to server-side programming.

Few-shot learning

A technique whereby we prompt an LLM with several concrete examples of task performance.
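
A minimal sketch of how a few-shot prompt can be assembled, assuming a plain-text prompt format. The helper `build_prompt` and the `Input:`/`Output:` layout are hypothetical illustrations, not a required format for any particular LLM. Passing an empty example list produces a zero-shot prompt for the same task.

```python
def build_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Hypothetical helper: joins an instruction, worked examples, and a new query."""
    parts = [instruction, ""]
    # Each concrete example shows the model one input and its desired output.
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    # The final input is left open for the model to complete.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]
instruction = "Classify the sentiment of each review."
few_shot = build_prompt(instruction, examples, "The screen is gorgeous.")
zero_shot = build_prompt(instruction, [], "The screen is gorgeous.")
print(few_shot)
```

The resulting string would be sent to the model as-is; the examples teach the expected output format and label set without changing any model weights.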

Zero-shot learning

A technique whereby we prompt an LLM without any examples, relying on the general knowledge and reasoning patterns it acquired during training (that is, using it as a generalist LLM).

Fine tuning

A technique whereby we take an off-the-shelf open-source or proprietary model, continue training it on a set of concrete, task-specific examples, and save the updated weights as a new model checkpoint.

Word embedding

A word embedding, trained on word co-occurrence in text corpora, represents each word (or common phrase) $w$ as a $d$-dimensional word vector $\tilde{w} \in \mathbb{R}^d$. It serves as a dictionary of sorts for computer programs that need to use word meaning. Word embeddings have two useful properties. First, words with similar semantic meanings tend to have vectors that are close together. Second, the vector differences between words in an embedding have been shown to represent relationships between words.
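
Both properties can be illustrated with a small sketch. The 4-dimensional vectors below are hand-picked toy values, not a trained embedding (real embeddings are learned from co-occurrence statistics and usually have hundreds of dimensions). Closeness is measured with cosine similarity, and the relationship property is shown with the classic analogy king minus man plus woman.

```python
import math

# Toy, hand-picked vectors for illustration only; they were not trained.
emb = {
    "king":  [0.8, 0.7, 0.1, 0.0],
    "queen": [0.8, 0.1, 0.7, 0.0],
    "man":   [0.2, 0.8, 0.1, 0.0],
    "woman": [0.2, 0.2, 0.7, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(vec, exclude):
    """Word whose vector is most similar to vec, skipping the excluded words."""
    return max((w for w in emb if w not in exclude), key=lambda w: cosine(vec, emb[w]))


# Property 1: related words are closer than unrelated ones.
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))

# Property 2: the vector difference (woman - man) encodes a relationship.
analogy = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
print(nearest(analogy, exclude={"king", "man", "woman"}))
```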

Variable

A variable is a characteristic of a unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned.

Variance

The variance is the mean squared deviation of a variable around its average value. It reflects how widely the empirical values are dispersed around their mean.
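
For $n$ values with mean $\bar{x}$, the definition above is $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. A short sketch computing it directly and checking it against Python's standard `statistics` module; note that `statistics.variance` computes the sample variance, which divides by $n - 1$ instead of $n$.

```python
import statistics


def variance(values: list[float]) -> float:
    """Population variance: the mean squared deviation from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))              # 4.0
print(statistics.pvariance(data))  # same population definition
print(statistics.variance(data))   # sample variance, divides by n - 1
```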

Quality

The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.

The sum of all information derived from diagnostic, descriptive, predictive, and prescriptive analytics embedded in or available to or from a cognitive computing system.

Embedding

An embedding is a representation of a topological object, manifold, graph, field, etc. in a certain space in such a way that its connectivity or algebraic properties are preserved. For example, a field embedding preserves the algebraic structure of plus and times, an embedding of a topological space preserves open sets, and a graph embedding preserves connectivity. One space X is embedded in another space Y when the properties of Y restricted to X are the same as the properties of X.

Model-assisted labeling

A process in machine learning and data annotation in which human annotators label data with the assistance of pre-trained machine learning models. Instead of relying solely on manual annotation, which can be time-consuming and expensive, model-assisted labeling uses the predictions of machine learning models to accelerate the annotation process, letting labeling teams spend their time refining, accepting, or rejecting model predictions rather than creating every label from scratch.

GPT (Generative Pre-trained Transformer)

A system built on a transformer, a type of neural network that works well for natural language processing tasks. The name describes the model: (1) it can generate responses to questions (Generative); (2) it was trained in advance on a large amount of the written material available on the web (Pre-trained); and (3) it uses the transformer architecture, which relates all the words in a sentence to each other through attention rather than reading them strictly one at a time (Transformer).

Foundation models

Foundation models are large models trained on broad data that can be used as a foundation for developing other models. For example, generative AI systems are built on large language foundation models. Foundation models can speed up the development of new systems, but their use is controversial: depending on where their training data comes from, they raise different issues of trustworthiness and bias.

AGI (Artificial General Intelligence)

Algorithms that can perform a wide variety of tasks and switch fluidly from one activity to another in the manner that humans do.

AI (Artificial Intelligence)

AI is a branch of computer science. AI systems use hardware, algorithms, and data to create “intelligence” that can make decisions, discover patterns, and perform actions. AI is a general term, and the field uses more specific terms for particular approaches. AI systems can be built in different ways; two of the primary ways are (1) through rules provided by a human (rule-based systems) or (2) with machine learning algorithms.

Labelbox Workspace

Enables admins at large organizations to manage multiple instances of Labelbox with the same subscription account.

Labelbox Workflow

A workflow is a queue for labeling and reviewing assets within a project. Workflows provide granular control over data row reviews. They are highly customizable and help define a step-by-step pipeline that makes the labeling process more efficient and accurate.

Unsupervised learning

Algorithms that take a set of data consisting only of inputs (without labeled outputs) and attempt to cluster the data objects based on their similarities or dissimilarities.
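
As one illustration, here is a minimal sketch of k-means, a common clustering algorithm chosen here as an example (the glossary does not name a specific one). It assumes 1-D numeric inputs and uses absolute distance as the measure of (dis)similarity; `kmeans_1d` is a hypothetical helper, and no labels are ever supplied.

```python
import random


def kmeans_1d(points, k, iters=20, seed=0):
    """Cluster 1-D points into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (a small distance stands in for "similar").
        assignments = [
            min(range(k), key=lambda c: abs(p - centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]  # inputs only, no labels
centroids, labels = kmeans_1d(data, k=2)
print(labels)
```

The cluster IDs themselves are arbitrary; what the algorithm discovers is which inputs belong together.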

Unstructured data

Unstructured data is information that is not arranged according to a preset data model or schema, and therefore cannot easily be stored in a traditional relational database.

Underfitting

Underfitting occurs when a statistical model cannot adequately capture the underlying structure of the data.

A procedure that modifies a dataset.

Transfer learning

A technique in machine learning in which an algorithm learns to perform one task, such as recognizing cars, and builds on that knowledge when learning a different but related task, such as recognizing cats.

Training data

A dataset from which a model is learned.

Labelbox Template

If a data row needs to be relabeled, you can delete the annotations and then select existing annotations to use as a template for the next data row displayed in the editor. This allows you to curate a set of annotations, rather than start from scratch for each data row.

Taxonomy

Taxonomy refers to classification according to presumed natural relationships among types and their subtypes.

Supervised learning

A type of machine learning in which the algorithm compares its outputs with the correct outputs during training. In unsupervised learning, the algorithm merely looks for patterns in a set of data.

Labelbox Schema

The schema is the master blueprint for your training data and includes ontologies, features, and metadata.

Robotic process automation

A preconfigured software instance that uses business rules and predefined activity choreography to complete the autonomous execution of a combination of processes, activities, transactions, and tasks in one or more unrelated software systems to deliver a result or service with human exception management.

RLHF (Reinforcement learning from human feedback)

RLHF is an extension of Reinforcement Learning (RL), a reward and punishment-based training technique for AI models. It involves training a model through iterative interactions where humans provide guidance or evaluations to improve the model's decision-making process.

Reinforcement learning

A type of machine learning in which the algorithm learns by acting toward an abstract goal, such as “earn a high video game score” or “manage a factory efficiently.” During training, each effort is evaluated based on its contribution toward the goal.

Labelbox Queue

Labelbox has three queues to help move data rows through the labeling and review workflow: the batches queue, the labeling queue, and the review tasks queue.

Labelbox Project

The labeling environment in Labelbox, like a factory assembly line for producing labels. The initial state of the project can start with raw data, pre-existing ground truth, or pre-labeled data.

Preprocessing algorithm

A bias mitigation algorithm that is applied to training data.

Output from your machine learning model that you can add to a data row to serve as a template for faster labeling.

Precision

A metric for classification models. Precision measures how often the model was correct when it predicted the positive class, that is, the fraction of positive predictions that were actually positive.
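
In terms of true positives (TP) and false positives (FP), precision $= \frac{TP}{TP + FP}$. A minimal sketch for binary labels; returning 0.0 when the model never predicts the positive class is an assumption here, since libraries handle that edge case differently.

```python
def precision(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Fraction of positive predictions that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumed convention: no positive predictions at all
    return tp / (tp + fp)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(precision(y_true, y_pred))  # 0.75
```

Here the model predicted "positive" four times and was right three times, so precision is 3 / 4. The missed positive at index 3 does not affect precision (it lowers recall instead).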

Overfitting

In statistics and machine learning, overfitting occurs when a model fits the noise in its training data instead of the underlying trend. It typically results from an overly complex model, for example one with too many parameters relative to the amount of training data. An overfitted model reflects the errors in the data it was trained with, so it can perform well on that data while predicting unseen data poorly.

Ontology

A collection of features and their relationships (also known as a taxonomy). Ontologies can be reused across different projects. Ontologies are essential for data labeling, model training, and evaluation. When you label or review a data asset, the ontology appears in the Tools panel.

Neural network

A highly abstracted and simplified model of the human brain used in machine learning. A set of units receives pieces of an input (pixels in a photo, say), performs simple computations on them, and passes them on to the next layer of units. The final layer represents the answer.
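
A minimal sketch of the layer-by-layer computation described above: a forward pass through one hidden layer and one output layer. The weights are hand-picked for illustration (a trained network would learn them), and the ReLU and sigmoid activations are assumed choices.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """One fully connected layer: each unit takes a weighted sum of all inputs,
    adds its bias, and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked weights for illustration; training would learn these values.
hidden_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 hidden units, 3 inputs each
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]                          # 1 output unit, 2 inputs
output_b = [0.2]

x = [0.9, 0.1, 0.4]                      # e.g. three pixel intensities
h = layer(x, hidden_w, hidden_b, relu)   # hidden layer passes results on
y = layer(h, output_w, output_b, sigmoid)  # final layer gives the answer
print(h, y)
```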

Nested classification

A classification-type annotation that is nested within an object-type annotation (as opposed to a global classification).

NLP (Natural language processing)

A computer's attempt to “understand” spoken or written language. It must parse vocabulary, grammar, and intent, and allow for variation in language use. The process often involves machine learning.

Model run

A model run is a model training experiment within a model directory. Each model run versions its own data snapshot (data rows, annotations, and data splits). You can upload predictions to a model run and compare its results and performance against other model runs in the model directory.

Labelbox Model

A Model is a directory where you can create, manage, and compare a set of Model Runs related to the same machine learning task. Each Model is defined by an ontology, which specifies the machine learning task of the Model Runs inside the directory.

Machine learning algorithms and data processing designed, developed, trained and implemented to achieve set outputs, inclusive of datasets used for said purposes unless otherwise stated.

Metadata

Data employed to annotate other data with descriptive information, possibly including their data descriptions, data about data ownership, access paths, access rights, and data volatility.

Labelbox Metadata

Metadata is non-annotation information about the asset to be labeled. There are two types of metadata: reserved keys (which cannot be changed) and custom (user-defined). Metadata helps search and filter data rows.

Media attributes

When you upload data assets, Labelbox automatically computes media attributes appropriate for the data type and stores their values as part of the data row. Examples include mimeType, width, height, codec, and more.

MLOps (Machine learning operations)

MLOps (machine learning operations) refers to the collection of techniques and tools for deploying ML models in production.

Machine learning

The study or application of computer algorithms that improve automatically through experience. Machine learning algorithms build a model based on training data in order to perform a specific task, such as aiding in prediction or decision-making processes, without necessarily being explicitly programmed to do so.

LLM (Large Language Model)

A class of language models that use deep learning algorithms and are trained on extremely large text datasets, which can be multiple terabytes in size. LLMs can be classed into two types: generative or discriminative.

Generative LLMs are models that output text, such as the answer to a question or an essay on a specific topic. They are typically trained with unsupervised or semi-supervised learning to predict the response for a given task. Discriminative LLMs are supervised learning models that usually focus on classifying text, such as determining whether a text was written by a human or an AI.

Inference

The stage of machine learning in which a model is applied to a task. For example, a classifier model produces the classification of a test sample.

Image segmentation

Image segmentation is the process of separating an image into multiple parts to simplify its representation and facilitate analysis for training a computer vision model. Image segmentation is one of the most labor-intensive annotation tasks because it requires pixel-level accuracy; labeling a single image can take up to 30 minutes. With image segmentation, each annotated pixel in an image belongs to a single class. The output is a mask that outlines the shape of the object in the image.
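
A minimal sketch of what a segmentation mask can look like, assuming semantic segmentation stored as a 2-D grid of per-pixel class IDs (an export format such as Labelbox's may differ). The class names and IDs are hypothetical. Because each pixel holds exactly one class ID, per-class areas add up to the image size, and a binary mask for one class outlines that object's shape.

```python
from collections import Counter

CLASSES = {0: "background", 1: "car", 2: "road"}  # hypothetical class IDs

# A tiny 4 x 5 mask: every pixel holds exactly one class ID.
mask = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [2, 2, 2, 2, 2],
    [2, 2, 2, 2, 2],
]


def class_areas(mask):
    """Number of pixels assigned to each class."""
    counts = Counter(pixel for row in mask for pixel in row)
    return {CLASSES[c]: n for c, n in sorted(counts.items())}


def binary_mask(mask, class_id):
    """1 where the pixel belongs to class_id, else 0: the object's shape."""
    return [[1 if p == class_id else 0 for p in row] for row in mask]


print(class_areas(mask))
for row in binary_mask(mask, 1):
    print(row)
```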

Hyperparameters

The parameters that are used to either configure a machine learning model (e.g., the penalty parameter C in a support vector machine, and the learning rate to train a neural network) or to specify the algorithm used to minimize the loss function (e.g., the activation function and optimizer types in a neural network, and the kernel type in a support vector machine).

Ground truth

A ground truth is information that is known to be real or true, as supported by direct observation and measurement. Labels made by humans are considered to be empirical ground truths, as opposed to labels added through model inference.

GPU (graphics processing unit)

A specialized chip capable of highly parallel processing. GPUs were first developed for efficient parallel processing of the arrays of values used in computer graphics, and that parallelism makes them well suited to running machine learning and deep learning algorithms. Many modern GPUs are also designed and optimized specifically for machine learning.

GAN (Generative Adversarial Network)

Generative Adversarial Networks, or GANs for short, are an approach to generative modeling using deep learning methods, such as convolutional neural networks. Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.

Feature extraction

A method that develops a transformation of the input space onto a lower-dimensional subspace that preserves most of the relevant information. It is more general than feature selection, which only keeps a subset of the original input features.

Feature

A feature is the master definition of what you want the model to predict. It is also the blueprint for your ground truth. Ontologies consist of features, which include objects (for example, bounding boxes) and classifications (for example, radio buttons). Features can have multiple nested classifications.

Labelbox Editor

The labeling interface you can use to create, review, and edit annotations. When creating a new project, you're prompted to configure the editor, which defines the data type and the interface used while labeling.

Deep learning

An approach to AI that allows computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined through its relation to simpler concepts.

By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all the knowledge that the computer needs. The hierarchy of concepts enables the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers.

Dataset

Datasets are containers for data rows; they collect a set of related data assets.

Data type

The type of a data row, such as image (JPG/PNG), video (MP4), or text (.txt files).

Data split

You can split the selected data rows into train, validation, and test splits to prepare for model training and evaluation.
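
A minimal sketch of what a train/validation/test split does, assuming an 80/10/10 ratio and made-up data row IDs. `split_data_rows` is a hypothetical helper, not part of the Labelbox SDK. Shuffling with a fixed seed keeps the split reproducible, and slicing one shuffled list guarantees that no data row lands in two splits.

```python
import random


def split_data_rows(row_ids, train=0.8, val=0.1, seed=42):
    """Shuffle IDs reproducibly, then cut them into train/validation/test lists.
    Whatever is left after train and validation becomes the test split."""
    if train < 0 or val < 0 or train + val > 1:
        raise ValueError("train and val must be non-negative and sum to at most 1")
    ids = list(row_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


splits = split_data_rows([f"row-{i}" for i in range(100)])
print({name: len(rows) for name, rows in splits.items()})
```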

Data row

Represents an individual data asset, along with associated attributes (such as global ID) and annotations, which can include:

  1. URL to your cloud-hosted file

  2. Metadata

  3. Media attributes (e.g., data type, size, etc.)

  4. Attachments (files that provide context for your labelers)

  5. Predictions

Labelbox Consensus

The Consensus tool lets you compare labelers against each other by comparing annotations on a given asset. Consensus works in real-time so you can take immediate and corrective actions toward boosting team and model performance.
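
The exact scoring that the Consensus tool uses is not described here, so the following is only a sketch under stated assumptions: each labeler gives one radio-style classification answer per asset, and agreement is measured as simple pairwise matching, one common way to quantify inter-labeler agreement. The labeler names and both helpers are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(answers):
    """Share of labeler pairs that gave the same answer on one asset."""
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        raise ValueError("need answers from at least two labelers")
    return sum(a == b for a, b in pairs) / len(pairs)


def labeler_scores(answers):
    """For each labeler, the share of other labelers who agree with them."""
    if len(answers) < 2:
        raise ValueError("need answers from at least two labelers")
    scores = {}
    for name, answer in answers.items():
        others = [a for other, a in answers.items() if other != name]
        scores[name] = sum(a == answer for a in others) / len(others)
    return scores


# Hypothetical answers from four labelers on the same asset.
answers = {"ana": "cat", "ben": "cat", "chen": "dog", "dee": "cat"}
print(pairwise_agreement(answers))  # 0.5
print(labeler_scores(answers))
```

A low per-labeler score (here, "chen") flags where a reviewer might step in.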

Computer vision

Computer vision is a field of Artificial Intelligence (AI) technology that enables computer systems to perform tasks that require visual perception.

Confusion matrix

A matrix that tabulates predicted classifications against actual classifications. A confusion matrix has size $L \times L$, where $L$ is the number of distinct label values.
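
A minimal sketch that builds an $L \times L$ confusion matrix from lists of actual and predicted labels. It uses the convention rows = actual label, columns = predicted label (scikit-learn uses the same orientation; some texts transpose it). The example labels are made up.

```python
def confusion_matrix(actual, predicted, labels):
    """L x L counts; matrix[i][j] = items whose actual label is labels[i]
    and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "bird", "dog", "cat"]
predicted = ["cat", "dog", "dog", "bird", "cat", "cat"]
for label, row in zip(labels, confusion_matrix(actual, predicted, labels)):
    print(label, row)
```

Correct predictions sit on the diagonal; every off-diagonal cell shows which class was confused with which.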

Chatbot

A chatbot is a computer program that responds like an intelligent entity when conversed with, through text or voice. Chatbots use natural language processing to understand one or more human languages.

Labelbox Catalog

An organization-wide platform for curating and exploring your unstructured data. Catalog enables you to easily browse, curate, and develop insights across all labeled and unlabeled data rows in your organization.

Labelbox Boost

Boost is a service that helps enterprise customers scale up their machine learning (ML) operations. Boost includes a variety of professional services and software assistance, including a labeling workforce.

BERT (Bidirectional Encoder Representations from Transformers)

A language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.

Batch

A method for selecting data rows from Catalog and sending them to a labeling project. Sending batches of data rows to a labeling project is an alternative to attaching an entire dataset to a project.

Labelbox Benchmark

The Benchmark tool lets you designate a labeled asset as a “gold standard” and automatically compare all other labels on that asset to the benchmark label.
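
The scoring metric behind the Benchmark tool is not specified here; as an assumed example, the sketch below compares bounding boxes to a gold-standard box using intersection over union (IoU), a standard overlap measure for object annotations. Boxes are `(x_min, y_min, x_max, y_max)` tuples, and the labeler names and coordinates are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


benchmark = (10, 10, 50, 50)  # the "gold standard" label for this asset
submitted = {
    "ana": (10, 10, 50, 50),     # identical to the benchmark
    "ben": (20, 20, 60, 60),     # partial overlap
    "chen": (100, 100, 120, 120),  # no overlap
}
scores = {name: iou(box, benchmark) for name, box in submitted.items()}
print(scores)
```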

Attachments

Supplementary information you can attach to an asset in order to provide contextual information for your labeling team. When viewing data rows in detail view, attachments appear on a separate side panel.

Annotation

A human-made or computer-generated label on an asset. Annotations can be imported (as ground truth or pre-labels) or created manually in the Labelbox editor. Annotations are categorized as objects (such as bounding boxes or polygons) or classifications (such as radio or checklist).

Active learning

A method for modifying machine learning algorithms by allowing them to specify the test regions (data points) they learn from in order to improve their accuracy. At any point, the algorithm can choose a new point x, observe the output y (for example, by having a human label it), and incorporate the new (x, y) pair into its training base. It has been applied to neural networks, prediction functions, and clustering functions.
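
A minimal sketch of the query loop described above, using uncertainty sampling (one common active learning strategy, chosen here for illustration). Everything is a hypothetical stand-in: a 1-D threshold classifier plays the model, `oracle` plays the human labeler (its true rule is x > 5), and each round the algorithm queries the unlabeled point whose predicted probability is closest to 0.5.

```python
import math


def predict_proba(x, threshold):
    """Hypothetical model: probability that x is positive (logistic around threshold)."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) * 4))


def oracle(x):
    """Stand-in for a human labeler; the hidden true rule is x > 5."""
    return int(x > 5.0)


def fit_threshold(labeled):
    """Place the threshold midway between the largest negative and smallest positive."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2


labeled = [(0.0, 0), (10.0, 1)]                  # small starting training base
pool = [1.0, 2.5, 4.0, 4.8, 5.3, 6.0, 7.5, 9.0]  # unlabeled candidates
for _ in range(4):
    threshold = fit_threshold(labeled)
    # Uncertainty sampling: choose the x the current model is least sure about.
    x = min(pool, key=lambda p: abs(predict_proba(p, threshold) - 0.5))
    pool.remove(x)
    labeled.append((x, oracle(x)))  # observe y, add the (x, y) pair
print(fit_threshold(labeled))
```

With four queries the learned threshold closes in on the true boundary, while easy points far from it (such as 1.0 or 9.0) are never sent for labeling.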
