Labelbox | May 15, 2023

How to write more effective prompts for natural language search

When refining a model to address areas of weakness, surfacing high-impact data for labeling is critical to saving time and costs. Yet, finding the right assets amidst oceans of unstructured data presents one of the most difficult challenges in the machine learning development lifecycle.

Labelbox’s natural language search can give you an edge during this tricky process, allowing you to unearth relevant data instantly and accurately from within your entire Catalog. To maximize your efficiency with natural language search, however, you’ll need to create effective prompts.

With the emergence of large language models over the past year, prompt engineering has evolved into a valuable – and often overlooked – skill. In this blog post, we’ll explore best practices for crafting successful prompts specifically for natural language searches.

Use natural language

The first tip for prompt writing lives right in the name of the search style you’ll be performing: use natural language! Keep your prompts simple and to the point rather than opting for complex terminology or technical jargon.

Be specific

While simplicity will steer you toward greater accuracy, don’t misinterpret this advice as a suggestion to be general. Specificity is critical, especially when the goal is to surface hard-to-find data associated with rare edge cases. Including keywords unique to the content or context you’re after will yield more relevant results.

Incorporate visual cues

When searching for imagery, incorporate any relevant visual cues. Think along the lines of colors, shapes, patterns, and even where objects sit within the image. As emphasized in the tip above, the more detail the better, and this applies to visuals as much as anything else.

Add more context

Don’t forget about context. The time of day, background details, and generic indicators of location are all constructive elements of prompt engineering.


Iterate, iterate, iterate!

A foundational element of prompt engineering is continuously testing and refining your prompts until you find one that works well. Often, your prompt may be just one keyword or word-order adjustment away from surfacing exactly the data you’re looking to gather.

For example, if the prompt, “man in a field at dusk,” doesn’t yield what you’re after, try “human in a field at dusk.” Then, if that’s closer but not quite right, try “human in a field in the evening,” and then maybe “human in the evening in a field.” The data rows you wish to label, and thus the improvements to your model, may be just one search iteration away.
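The loop above can be sketched in code. This is only an illustration: `nl_search` is a hypothetical stand-in for whatever search call you use, stubbed here with simple keyword overlap against toy captions so the example runs on its own.

```python
# Hypothetical sketch of iterating on prompt variants.
# `nl_search` stands in for a real natural language search call;
# here it just counts shared words between the prompt and each caption.

def nl_search(prompt, captions, min_overlap=4):
    words = set(prompt.lower().split())
    return [c for c in captions if len(words & set(c.lower().split())) >= min_overlap]

captions = [
    "human walking in a field in the evening",
    "dog in a park at noon",
]

variants = [
    "man in a field at dusk",
    "human in a field at dusk",
    "human in a field in the evening",
]

# Try each variant until one surfaces results, then stop.
for prompt in variants:
    hits = nl_search(prompt, captions)
    if hits:
        print(prompt, "->", hits)
        break
```

Here the first variant misses and the second one hits, mirroring the "man" to "human" adjustment described above.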

Incorporating these tips into your prompt engineering will not only lead to more accurate results but also faster search performance. Put simply, rendering hundreds of images requires a fair amount of memory. If your queries are accurate and specific, Labelbox can source fewer assets, and you can keep your data discovery pipeline rolling.

Using natural language search in Labelbox

Labelbox offers a straightforward, efficient structure for creating multi-faceted prompts for natural language search. You can refine your prompt by adding positive biases and negative biases using this structure: [my prompt] / [more of this positive bias] / [less of this negative bias].

Additionally, you can set the score range to further specify the similarity of the data rows surfaced by a given prompt. Similarity is measured as the cosine similarity, a score between 0 and 1, between the embedding of the prompt and the embedding of each asset: the more similar the embeddings, the higher the score.
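To make the scoring concrete, here is a minimal cosine similarity computation over two toy embedding vectors. The vectors are made up for illustration; real prompt and asset embeddings are high-dimensional and produced by the search model.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 for identical directions,
    # 0.0 for orthogonal (unrelated) vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

prompt_vec = [0.6, 0.8, 0.0]   # toy prompt embedding
asset_vec = [0.6, 0.8, 0.0]    # toy asset embedding, identical direction
print(cosine_similarity(prompt_vec, asset_vec))  # -> 1.0
```

Setting a score range then simply keeps only the assets whose score against the prompt falls inside that interval.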

You can adjust the similarity requirements using the provided slider. Additionally, you can customize the biases in your prompt by supplementing each statement with a weight from 0 to 1. For example, a prompt may appear as follows: [my prompt] / [more of this positive bias] / [0.7] / [less of this negative bias] / [0.2].
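Labelbox does not publish the internals of how weighted biases are applied, but a common approach in embedding-based search is to add the positive bias embedding and subtract the negative bias embedding, each scaled by its weight. The sketch below shows that idea with toy vectors; the weights 0.7 and 0.2 match the example prompt above.

```python
# Assumption: a weighted combination of embeddings, a standard technique
# in embedding search. Not Labelbox's documented implementation.

def combine(prompt_vec, pos_vec, neg_vec, w_pos=0.7, w_neg=0.2):
    # Pull the query toward the positive bias, push it away from the negative.
    return [p + w_pos * a - w_neg * b
            for p, a, b in zip(prompt_vec, pos_vec, neg_vec)]

query = combine([1.0, 1.0], [1.0, 0.0], [0.0, 1.0])
print(query)  # -> [1.7, 0.8]
```

The combined query vector is then scored against asset embeddings exactly as an unbiased prompt would be.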

Natural language search in Labelbox can also be combined with all other search filters in Catalog and integrates seamlessly with the entire suite of product offerings. You can use the resulting data rows as anchors for a similarity search, or narrow the results to data rows with certain structured data, such as metadata, annotations, datasets, or projects.

You may wish to save your search as a slice so you can share the results of your perfect prompt with your teammates. As you add more data to Catalog, the slice populates dynamically with any data rows that match the filter criteria, thus saving you from having to perform the same searches every time you upload more unstructured data.

Once you’re confident in your prompt engineering, it may be time to perform zero-shot learning with bulk classification. This lets you send data rows to a project already annotated with your desired classification values, saving your labeling team time at every step of the way.