NeuroNER: An Easy-to-Use Named-Entity Recognition Program Based on Neural Networks
What is named-entity recognition (NER)?
Named-entity recognition (NER) aims at identifying entities of interest in text, such as locations, organizations, and temporal expressions. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also be used as features for machine learning systems for other natural language processing tasks.
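As a concrete illustration (the sentence and labels below are invented for this example), a NER system maps each token of a sentence to an entity label, commonly written in the BIO scheme:

```python
# Toy illustration of NER output (sentence and labels made up for this example).
# In the BIO scheme, B- marks the beginning of an entity, I- its continuation,
# and O a token that is outside any entity.
sentence = ['John', 'Smith', 'visited', 'Paris', 'in', 'May', '2010', '.']
labels   = ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'B-DATE', 'I-DATE', 'O']

for token, label in zip(sentence, labels):
    print('{:10}{}'.format(token, label))
```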
What does NeuroNER do?
NeuroNER is a program that performs NER.
NeuroNER presents the following advantages over existing NER systems:
Leverages the state-of-the-art prediction capabilities of neural networks (a.k.a. "deep learning")
Enables the users to create or modify annotations for a new or existing corpus
Is cross-platform, open source, freely available, and straightforward to use
Where can NeuroNER be downloaded?
NeuroNER runs on Linux, Mac OS X, and Microsoft Windows. It requires Python 3.5, TensorFlow 1.0, and scikit-learn. NeuroNER's code can be found here: https://github.com/Franck-Dernoncourt/NeuroNER
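As a small convenience (this snippet is not part of NeuroNER), one can check that the required dependencies are importable and see which versions are installed:

```python
# Sanity check for NeuroNER's dependencies (illustrative helper, not part of NeuroNER).
import sys

import sklearn
import tensorflow as tf

print('Python      :', sys.version.split()[0])   # expected: 3.5.x
print('TensorFlow  :', tf.__version__)           # expected: 1.0.x
print('scikit-learn:', sklearn.__version__)
```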
How to use NeuroNER?
The diagram below presents an overview of NeuroNER. NeuroNER can be used as follows:
Train the neural network that performs the NER. During training, NeuroNER allows the user to monitor the network.
Evaluate the quality of the predictions made by NeuroNER. The performance metrics can be calculated and plotted by comparing the predicted labels with the gold labels. The evaluation can be done at the same time as the training (if the test set is provided along with the training and validation sets), separately after the training, or using a pre-trained model; a minimal scoring sketch follows this list.
Deploy NeuroNER for production use: NeuroNER labels the deployment set, i.e., any new text without gold labels.
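As a minimal sketch of the evaluation step (this is not NeuroNER's own evaluation code, and the label sequences are made up), token-level predictions can be scored against gold labels with scikit-learn:

```python
# Minimal token-level scoring sketch (illustrative; not NeuroNER's evaluation code).
from sklearn.metrics import precision_recall_fscore_support

gold_labels      = ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'B-ORG', 'O']
predicted_labels = ['B-PER', 'I-PER', 'O', 'O',     'O', 'B-ORG', 'O']

# Score only the entity labels, ignoring the 'O' (outside) class.
entity_labels = sorted(set(gold_labels) - {'O'})
precision, recall, f1, _ = precision_recall_fscore_support(
    gold_labels, predicted_labels, labels=entity_labels, average='micro')
print('precision={:.2f} recall={:.2f} f1={:.2f}'.format(precision, recall, f1))
```

Entity-level (CoNLL-style) scoring is stricter than this token-level view, since an entity only counts as correct if every one of its tokens is labeled correctly.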
How does the NeuroNER engine work?
The NeuroNER engine is based on artificial neural networks (ANNs). Specifically, it relies on a variant of recurrent neural networks (RNNs) called long short-term memory (LSTM).
The NER engine's ANN contains three layers:
Character-enhanced token-embedding layer,
Label prediction layer,
Label sequence optimization layer.
The following diagram presents the architecture of the ANN used in the NeuroNER engine.
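For readers who prefer code to diagrams, the sketch below shows one way the three layers listed above could be wired together in TensorFlow 1.0. It is illustrative only: the dimensions are made up and it is not NeuroNER's actual implementation.

```python
# Simplified sketch of the three-layer architecture (illustrative only; not
# NeuroNER's actual code, and all dimensions below are made up).
import tensorflow as tf

vocab_size, char_vocab_size = 10000, 100
token_dim, char_dim, char_lstm_dim, lstm_dim, num_labels = 100, 25, 25, 100, 9

token_ids = tf.placeholder(tf.int32, [None, None])        # batch x tokens
char_ids  = tf.placeholder(tf.int32, [None, None, None])  # batch x tokens x characters
labels    = tf.placeholder(tf.int32, [None, None])        # gold label indices
seq_len   = tf.placeholder(tf.int32, [None])              # tokens per sentence

# 1) Character-enhanced token-embedding layer: concatenate a token embedding with
#    a character-level LSTM summary of the token's spelling.
token_emb = tf.nn.embedding_lookup(
    tf.get_variable('W_token', [vocab_size, token_dim]), token_ids)
char_emb = tf.nn.embedding_lookup(
    tf.get_variable('W_char', [char_vocab_size, char_dim]), char_ids)
s = tf.shape(char_emb)
char_emb_flat = tf.reshape(char_emb, [s[0] * s[1], s[2], char_dim])
with tf.variable_scope('char_lstm'):
    _, (_, char_h) = tf.nn.dynamic_rnn(
        tf.contrib.rnn.LSTMCell(char_lstm_dim), char_emb_flat, dtype=tf.float32)
char_feat = tf.reshape(char_h, [s[0], s[1], char_lstm_dim])
x = tf.concat([token_emb, char_feat], axis=-1)

# 2) Label prediction layer: a bidirectional LSTM over the token sequence,
#    followed by a per-token projection to label scores.
with tf.variable_scope('token_bilstm'):
    (fw, bw), _ = tf.nn.bidirectional_dynamic_rnn(
        tf.contrib.rnn.LSTMCell(lstm_dim), tf.contrib.rnn.LSTMCell(lstm_dim),
        x, sequence_length=seq_len, dtype=tf.float32)
scores = tf.layers.dense(tf.concat([fw, bw], axis=-1), num_labels)

# 3) Label sequence optimization layer: a linear-chain CRF scores entire label
#    sequences, which discourages invalid transitions such as 'O' -> 'I-PER'.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    scores, labels, seq_len)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer().minimize(loss)
```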
How to install NeuroNER?
The GitHub repository contains the installation instructions. Here is a demo showing how easy it is when using the installation script on Ubuntu: the script installs everything you need and starts training on the CoNLL-2003 dataset. After a few training epochs, one obtains state-of-the-art results.
Using NeuroNER with BRAT
NeuroNER integrates with BRAT so that the user may easily view, amend, or create annotations.
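BRAT stores annotations in a simple plain-text standoff format (a .ann file next to each .txt file). As a small illustration (the entity below is made up), a text-bound annotation line can be parsed with a few lines of Python:

```python
# Tiny illustration of BRAT's standoff annotation format (example entity made up).
# A text-bound annotation line looks like:  T1<TAB>PER 0 10<TAB>John Smith
ann_line = 'T1\tPER 0 10\tJohn Smith'

ann_id, type_and_span, surface_text = ann_line.split('\t')
entity_type, start, end = type_and_span.split(' ')
print(ann_id, entity_type, int(start), int(end), repr(surface_text))
```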
Using NeuroNER with TensorBoard
In addition to the plots generated by NeuroNER, one can use TensorBoard to analyze the NeuroNER network and its results, either in real time or retrospectively.
The user may also view the NeuroNER engine interactively.
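As a generic TensorFlow 1.x sketch of how such monitoring works (this is not NeuroNER's own logging code; the log directory and loss values are made up), scalar summaries are written to a log directory, which TensorBoard can then read:

```python
# Generic TensorFlow 1.x example of writing scalar summaries for TensorBoard
# (illustrative only; the log directory name and loss values are made up).
import tensorflow as tf

loss_value = tf.placeholder(tf.float32, name='loss_value')
tf.summary.scalar('train_loss', loss_value)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('output/tensorboard_logs', sess.graph)
    for step, value in enumerate([0.9, 0.5, 0.3, 0.2]):
        summary = sess.run(merged, feed_dict={loss_value: value})
        writer.add_summary(summary, step)
    writer.close()
# View the results with:  tensorboard --logdir=output/tensorboard_logs
```

Passing sess.graph to the FileWriter is what makes the computation graph browsable in TensorBoard's graph view.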