Named entity recognition

Updated by Alex Cota

Named entity recognition (NER) is an information extraction technique that classifies words or phrases in unstructured text into predefined entity categories. NER is often used in search algorithms, recommendation systems, and applications that require automatic categorization of text. Many real-world applications even require computer vision and NER to work in conjunction.

With the new NER labeling tool, you can now import text data, label it in the Labelbox Editor, and easily export your text labels.

The early version of this tool comes with the following caveats:

  • Nested classifications are not supported.
  • Text cannot be imported via URL.
  • Benchmark and consensus are not yet supported.
  • The objects panel is hidden when a labeler opens the text asset in the labeling interface.
  • The submit button is always active; it is not disabled before the labeler makes a change in the labeling interface.

How to access the NER tool

The NER tool is currently nested within our existing Editor. You can access the tool by creating a project, importing your data, and choosing “Editor” as the labeling interface. If you have access to NER, you will see an “Entity” tool when you are in the “Configure editor” step. Follow the steps below for importing your text data and choosing the “Entity” tool to configure your text labeling project.

Import text data

Once you have access to the NER labeling tool, you will be able to import non-image datasets.

Step 1: Format your import file

Each data row should contain only a "data" field, which specifies the text string to label.

See an example here: text-ner.json

[
  {
    "data": "Lucy has a set of medical conditions that are summarized as HERNS.\nIn 1996, Lucy experienced a minor stroke, which caused temporary paralysis in her left arm.\n3 years ago, Lucy was diagnosed as lupus carrier. Since the diagnosis, Lucy has been taking Warfarin and she expects to maintain Warfarin therapy for life."
  },
  {
    "data": "Lucy has a set of medical conditions that are summarized as HERNS.\nIn 1996, Lucy experienced a minor stroke, which caused temporary paralysis in her left arm.\n3 years ago, Lucy was diagnosed as lupus carrier. Since the diagnosis, Lucy has been taking Warfarin and she expects to maintain Warfarin therapy for life."
  }
]
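
If your text lives in a Python list rather than a file, an import file in the format above could be generated along these lines. This is a minimal sketch: the helper names build_import_rows and write_import_file are hypothetical, and dropping whitespace-only strings is a choice made here, not a Labelbox requirement.

```python
import json


def build_import_rows(texts):
    """Wrap each text string in a data row that has only a "data" field."""
    # Assumption: whitespace-only strings are dropped because they contain
    # nothing to label.
    return [{"data": text} for text in texts if text.strip()]


def write_import_file(texts, path):
    """Write the data rows to `path` as a JSON array and return them."""
    rows = build_import_rows(texts)
    with open(path, "w", encoding="utf-8") as f:
        # json.dump escapes newlines inside strings as \n, as in the example.
        json.dump(rows, f, ensure_ascii=False, indent=2)
    return rows
```

For example, write_import_file(["First note.", "Second note."], "text-ner.json") would produce a two-row import file.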

Step 2: Import your data

You have three options for importing your data:

a. Directly import your JSON

To import your JSON file directly to Labelbox, drop your file here.

b. Import your data via the Python API

First, create a project and a dataset. Then, use the create_data_rows method to create each data row.

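from labelbox import Client, DataRow
client = Client(api_key="<YOUR_API_KEY>")  # authenticate with your API key
dataset = client.get_dataset("<dataset_uid>")
task = dataset.create_data_rows([{DataRow.row_data: "Lorem..."}])  # one dict per data row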
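
To reuse the import file from Step 1 with the Python API, the file's entries can be converted into the list of dictionaries that create_data_rows receives. The helper below is a hypothetical sketch, not part of the Labelbox SDK; it takes the row-data key as a parameter so that it runs without the SDK installed.

```python
import json


def rows_from_import_file(path, row_data_key):
    """Map each {"data": ...} entry in the import file to {row_data_key: text}."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [{row_data_key: entry["data"]} for entry in entries]


# Intended use with the SDK objects from the snippet above (not executed here):
#   rows = rows_from_import_file("text-ner.json", DataRow.row_data)
#   task = dataset.create_data_rows(rows)
```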

c. Pass your JSON via the GraphQL API

To pass a URL to your cloud-hosted JSON file, first create an empty dataset.

mutation CreateDataset {
  createDataset(
    data: {
      name: "<DATASET_NAME>"
    }
  ) {
    id
  }
}

Then, append the text data to your dataset.

mutation AppendRowsToDataset {
  appendRowsToDataset(
    data: {
      datasetId: "<DATASET-ID>",
      jsonFileUrl: "<JSON-URL>"
    }
  ) {
    accepted
  }
}
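
If you want to send the append mutation from a script, the request could be assembled roughly as follows. This is a sketch under assumptions: the endpoint URL and the bearer-token Authorization header are not stated on this page, so confirm them against the GraphQL API documentation, and build_append_request is a hypothetical helper name. The function only builds the request; nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumption: endpoint URL is not given on this page; verify before use.
GRAPHQL_URL = "https://api.labelbox.com/graphql"


def build_append_request(dataset_id, json_file_url, api_key):
    """Build (but do not send) a POST request for the AppendRowsToDataset mutation."""
    # json.dumps quotes and escapes the values as string literals.
    query = (
        "mutation AppendRowsToDataset { appendRowsToDataset(data: {"
        f"datasetId: {json.dumps(dataset_id)}, "
        f"jsonFileUrl: {json.dumps(json_file_url)}"
        "}) { accepted } }"
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The same pattern would apply to the CreateDataset mutation, whose response contains the dataset id to pass in as dataset_id.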

Configure the Editor

After you have imported your text data, follow these steps:

  1. Select “Editor” as the labeling interface.
  2. In “Configure editor”, select “Entity” as the object type. Like Objects and Classifications, Entities are a kind of Feature in Labelbox, but they are specific to labeling text data. Note: nested classifications are not yet supported.
  3. Complete the setup of the project.

Label the text data

  1. Select the tool from the left sidebar.
  2. Highlight the text to assign an Entity. The tool must be selected before you highlight, so follow these steps in order.
  3. To delete, click on the entity and select the “Delete” menu item.
  4. Click “Skip” or “Submit” to go to the next task.

Export labels

To export from the web app, see our docs on exporting labels. To export via the Python API, see Export labels. To export your labels via the GraphQL API, see our docs on Bulk export.

Label export format

[
  {
    "ID": "ck8kums7er4xk075485dkygdy",
    "DataRow ID": "ck8kuhgip000g09mv6z7h40qv",
    "Labeled Data": "Lucy has a set of medical...",
    "Label": {
      "objects": [
        {
          "featureId": "ck8kulppv000x0yf8pqpqqin4",
          "schemaId": "ck8kukafkqx1a0880iczbrqym",
          "title": "Entity type A",
          "value": "entity_type_a",
          "color": "#8000FF",
          "version": 1,
          "format": "text.location",
          "data": {
            "location": {
              "start": 67,
              "end": 128,
              "text": "Hereditary Endotheliopathy..."
            }
          }
        }
      ]
    }
  }
]
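
An export in this shape can be flattened into one record per entity with a few lines of Python. The sketch below is illustrative only: extract_entities is a hypothetical helper, it reads just the fields shown in the example above, and it skips any entry whose "Label" value is not an object as a defensive measure. It copies the start and end offsets as exported and makes no assumption about whether "end" is inclusive.

```python
import json


def extract_entities(export):
    """Return one flat record per text entity found in a parsed label export."""
    entities = []
    for label in export:
        label_body = label.get("Label")
        if not isinstance(label_body, dict):
            continue  # defensive: nothing to read if "Label" is not an object
        for obj in label_body.get("objects", []):
            if obj.get("format") != "text.location":
                continue  # keep only text entity annotations
            location = obj["data"]["location"]
            entities.append({
                "data_row_id": label["DataRow ID"],
                "entity": obj["value"],
                "start": location["start"],
                "end": location["end"],
                "text": location["text"],
            })
    return entities


def load_entities(path):
    """Read an exported JSON file from `path` and extract its entities."""
    with open(path, encoding="utf-8") as f:
        return extract_entities(json.load(f))
```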
