WikiNEuRal is a high-quality, automatically generated dataset for Multilingual Named Entity Recognition (NER).
In a nutshell, WikiNEuRal is built with a novel technique that combines a multilingual lexical knowledge base (BabelNet) with transformer-based architectures (BERT) to produce high-quality annotations for multilingual NER. On common NER benchmarks, it yields consistent improvements of up to 6 span-based F1 points over state-of-the-art alternative data production methods. We used this methodology to automatically generate NER training data in 9 languages. Learn more about the dataset here.
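
To make the "span-based F1" figure concrete, here is a minimal sketch of how such a score is commonly computed. It is not the evaluation script used for WikiNEuRal. It assumes BIO-style tags (e.g. `B-PER`, `I-PER`, `O`) and exact-match scoring, where a predicted entity counts as correct only if its type and both boundaries match the gold annotation. The helper names `extract_spans` and `span_f1` are hypothetical and introduced only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed tag scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if not continues_span:
            if label is not None:
                spans.add((label, start, i))
            start, label = None, None
            # Lenient choice: an "I-" tag with no preceding "B-" opens a span.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match span F1 over a list of sentences."""
    true_pos = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the PER span is predicted with the wrong right boundary,
# so only the LOC span counts as a match.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]
score = span_f1(gold, pred)
```

On this 0 to 1 scale, an improvement of 6 F1 points corresponds to a difference of 0.06. Because a partially overlapping prediction earns no credit, span-based F1 is stricter than token-level accuracy.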