Embeddings transform complex data into meaningful vector representations, enabling powerful applications across various domains.
This post covers the theoretical foundations of embeddings, walks through practical examples, and demonstrates how embeddings enhance key use cases at Labelbox.
Whether you’re new to the concept or looking to deepen your understanding, this guide will provide valuable insights into the world of embeddings and their practical uses.
At its core, an embedding is a vector representation of information. This information can be of any modality—video, audio, text, and more.
The process of generating embeddings involves a deep learning model trained on specific data.
Each position in the vector holds a numerical value, typically a floating-point number; in practice these values are often normalized.
The number of dimensions determines how much detail a vector can capture. Higher dimensionality means the embedding can encode finer distinctions, which generally yields more accurate similarity scores.
For example, a vector with only two dimensions is unlikely to store all relevant information, leading to simpler but less accurate representations.
For instance, consider an image. The embedding model converts this image into a vector representation, capturing its most relevant features. These embeddings can then be used to determine the similarity between different pieces of data. For example, images of a cat and a dog would have different embeddings, resulting in a low similarity score.
Another example: consider a model trained on images of helicopters, planes, and humans. If we input an image of a helicopter, the model generates a vector representation reflecting this. If the image contains both a helicopter and humans, the representation shifts to reflect both. Tracing examples like these helps build intuition for how embeddings are generated and used.
As we’ve described in the prior sections, understanding what embeddings are and how they work is crucial for several reasons:
Embeddings are extremely useful in search problems, especially within recommender systems.
Here are a few examples:
Deep learning models, trained on either generalized or specific datasets, learn patterns from data to generate embeddings. Models trained on specific datasets, like cat images, are good at identifying different breeds of cats but not dogs. Generalized models, trained on diverse data, can identify various entities.
Modern models produce embeddings of over 1024 dimensions, increasing accuracy and detail. However, this also raises challenges related to the cost of embedding generation, storage, and computational resources.
Some of the earliest work on embedding models included word2vec for NLP and convolutional neural networks for computer vision.
The introduction of word2vec by Mikolov et al. in 2013 marked a significant milestone in creating word embeddings. Word2vec enabled the capture of semantic relationships between words, such as the famous king-queen and man-woman analogy, demonstrating how vectors can represent complex relationships.
This breakthrough revolutionized natural language processing (NLP), enhancing tasks like machine translation, sentiment analysis, and information retrieval.
Word2vec laid the foundation for subsequent embedding techniques and deep learning models in NLP.
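The king-queen analogy can be illustrated with toy vectors. The values below are hand-picked 2-dimensional stand-ins, not output from a trained word2vec model, but they show the vector arithmetic the analogy relies on:

```python
import numpy as np

# Hand-picked toy "embeddings": dimension 0 ~ royalty, dimension 1 ~ gender.
# These are illustrative values, not vectors from a trained model.
words = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

# The famous analogy: king - man + woman should land near queen
target = words["king"] - words["man"] + words["woman"]

def cos(u, v):
    # Cosine similarity between two vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

best = max(words, key=lambda w: cos(words[w], target))
print(best)  # -> queen
```

Real word2vec embeddings have hundreds of dimensions and are learned from co-occurrence statistics, but the same arithmetic applies.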
For images, convolutional neural networks (CNNs) serve as the equivalent of word2vec. Models like AlexNet, VGG, and ResNet revolutionized image processing by creating effective image embeddings.
These models convert images into high-dimensional vectors, preserving spatial hierarchies and semantic information.
Just as word2vec transformed text data into meaningful vectors, CNNs transform images into embeddings that capture essential features, enabling tasks like object detection, image classification, and visual similarity search.
Labelbox now supports importing and exporting custom embeddings seamlessly, allowing for better integration into workflows. This new feature provides flexibility in how embeddings are utilized, making it easier to leverage embeddings for various applications.
To compute your own embeddings and send them to Labelbox, you can use models from Hugging Face. Here’s a brief example using ResNet-50:
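A minimal sketch of the embedding-computation step, using the `microsoft/resnet-50` checkpoint from the Hugging Face `transformers` library (a synthetic image is used here so the example is self-contained; uploading the resulting vector is done separately via the Labelbox SDK, as described in the Labelbox documentation):

```python
from transformers import AutoImageProcessor, ResNetModel
from PIL import Image
import numpy as np
import torch

# Load a pretrained ResNet-50 backbone from Hugging Face
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetModel.from_pretrained("microsoft/resnet-50")
model.eval()

# Any PIL image works here; we synthesize one to keep the example runnable
image = Image.fromarray(np.uint8(np.random.rand(224, 224, 3) * 255))

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The pooled output is a 2048-dimensional embedding for the whole image
embedding = outputs.pooler_output.flatten().tolist()
print(len(embedding))  # 2048
```

In practice you would replace the synthetic image with your own assets and attach each vector to its corresponding data row through the Labelbox SDK.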
This code snippet demonstrates how to create and upload custom embeddings, which can then be used for similarity searches within Labelbox.
Custom embeddings provide several advantages:
Earlier we mentioned that search (a core activity of search engines, recommendation systems, data curation, etc.) is a common and important application of embeddings. How? By comparing how similar two (or many more) items are to each other.
The most popular measure of similarity between two vectors is cosine similarity. It is based on the angle between the vectors: the smaller the angle, the more similar they are. Scores range from -1 (pointing in opposite directions) through 0 (orthogonal, i.e., unrelated) to 1 (pointing in the same direction).
For example, if vectors W and V point in nearly the same direction, their cosine similarity might be 0.79. If they point in opposite directions, the score approaches -1, indicating dissimilarity. This method helps in visually and computationally understanding the similarity between different assets.
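The computation can be sketched directly with NumPy (the vectors below are made-up toy values, not real embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors:
    1 = same direction, 0 = orthogonal, -1 = opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

w = np.array([0.9, 0.2, 0.4])
v = np.array([0.8, 0.4, 0.5])

print(cosine_similarity(w, v))   # close vectors -> score near 1
print(cosine_similarity(w, -w))  # opposite vectors -> -1.0
```

Because the score depends only on direction, not magnitude, embeddings are often L2-normalized before comparison.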
Because embeddings are essentially vectors, we can apply many of the same operations used to analyze vectors to embeddings.
Additional strategies for performing similarity computation include:
Tree-based methods (KD-trees, R-trees, and VP-trees) and graph-based methods (like Hierarchical Navigable Small World, also known as HNSW) are also options.
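As an illustration of the tree-based approach, SciPy's `cKDTree` can answer nearest-neighbor queries over a set of embeddings. The corpus here is random toy vectors; note that KD-trees use Euclidean distance, so for cosine similarity you would L2-normalize the vectors first (on unit vectors, Euclidean ranking matches cosine ranking):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Pretend corpus: 1,000 embeddings of 64 dimensions each
embeddings = rng.normal(size=(1000, 64))
tree = cKDTree(embeddings)

# Query with a slightly perturbed copy of item 42
query = embeddings[42] + 0.001 * rng.normal(size=64)

# Retrieve the three nearest neighbors by Euclidean distance
dist, idx = tree.query(query, k=3)
print(idx[0])  # -> 42, the item we perturbed
```

Exact tree search works well at moderate scale; graph-based approximate methods like HNSW trade a small amount of recall for much faster queries on large, high-dimensional corpora.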
Through search, embeddings improve the efficiency and accuracy of data curation, management, and analysis in several ways:
And these are exactly the ways we use embeddings in our products at Labelbox to improve search for data.
For example, in Catalog, users can find images similar to an input image, such as searching for basketball players on a court. Our natural language search allows users to input a textual query, like "dog," and find relevant images.
Labelbox offers several features to streamline embedding workflows, including:
These features ensure that embeddings are utilized effectively, making your workflows more efficient and your models more accurate.
Embeddings are a powerful tool in machine learning, enabling efficient and accurate similarity searches and recommendations. We hope this post has clarified what embeddings are and how they can be utilized.
For more on uploading custom embeddings, check out our documentation and developer guides.
Additional resources on embeddings can be found here:
Thanks to our contributors Tomislav Peharda, Paul Tancre, and Mikiko Bazeley!