NeuroNER: An Easy-to-Use Named-Entity Recognition Program Based on Neural Networks
What is named-entity recognition (NER)?
Named-entity recognition (NER) aims at identifying entities of interest in text, such as locations, organizations, and temporal expressions. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also be used as features for machine learning systems for other natural language processing tasks.
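As a concrete illustration (the sentence and labels below are invented for this example), a NER system maps each token of a sentence to an entity label, commonly written in the BIO scheme:

```python
# Toy illustration of NER output (sentence and labels made up for this example).
# In the BIO scheme, B- marks the beginning of an entity, I- its continuation,
# and O a token that is outside any entity.
sentence = ['John', 'Smith', 'visited', 'Paris', 'in', 'May', '2010', '.']
labels   = ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'B-DATE', 'I-DATE', 'O']

for token, label in zip(sentence, labels):
    print('{:10}{}'.format(token, label))
```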
What does NeuroNER do?
NeuroNER is a program that performs NER.
NeuroNER presents the following advantages over existing NER systems:
Leverages the state-of-the-art prediction capabilities of neural networks (a.k.a. "deep learning")
Enables the users to create or modify annotations for a new or existing corpus
Is cross-platform, open source, freely available, and straightforward to use
Where can NeuroNER be downloaded?
NeuroNER runs on Linux, Mac OS X, and Microsoft Windows. It requires Python 3.5, TensorFlow 1.0, and scikit-learn. NeuroNER's code can be found here: https://github.com/Franck-Dernoncourt/NeuroNER
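As a small convenience (this snippet is not part of NeuroNER), one can check that the required dependencies are importable and see which versions are installed:

```python
# Sanity check for NeuroNER's dependencies (illustrative helper, not part of NeuroNER).
import sys

import sklearn
import tensorflow as tf

print('Python      :', sys.version.split()[0])   # expected: 3.5.x
print('TensorFlow  :', tf.__version__)           # expected: 1.0.x
print('scikit-learn:', sklearn.__version__)
```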
How to use NeuroNER?
The diagram below presents an overview of NeuroNER. NeuroNER can be used as follows:
Train the neural network that performs the NER. During training, NeuroNER allows the user to monitor the network.
Evaluate the quality of the predictions made by NeuroNER. The performance metrics can be calculated and plotted by comparing the predicted labels with the gold labels. The evaluation can be done at the same time as the training (if the test set is provided along with the training and validation sets), separately after the training, or using a pre-trained model; a minimal scoring sketch follows this list.
Deploy NeuroNER for production use: NeuroNER labels the deployment set, i.e., any new text without gold labels.
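As a minimal sketch of the evaluation step (this is not NeuroNER's own evaluation code, and the label sequences are made up), token-level predictions can be scored against gold labels with scikit-learn:

```python
# Minimal token-level scoring sketch (illustrative; not NeuroNER's evaluation code).
from sklearn.metrics import precision_recall_fscore_support

gold_labels      = ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'B-ORG', 'O']
predicted_labels = ['B-PER', 'I-PER', 'O', 'O',     'O', 'B-ORG', 'O']

# Score only the entity labels, ignoring the 'O' (outside) class.
entity_labels = sorted(set(gold_labels) - {'O'})
precision, recall, f1, _ = precision_recall_fscore_support(
    gold_labels, predicted_labels, labels=entity_labels, average='micro')
print('precision={:.2f} recall={:.2f} f1={:.2f}'.format(precision, recall, f1))
```

Entity-level (CoNLL-style) scoring is stricter than this token-level view, since an entity only counts as correct if every one of its tokens is labeled correctly.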
How does the NeuroNER engine work?
The NeuroNER engine is based on artificial neural networks (ANNs). Specifically, it relies on a variant of recurrent neural networks (RNNs) called long short-term memory (LSTM).
The NER engine's ANN contains three layers:
Character-enhanced token-embedding layer,
Label prediction layer,
Label sequence optimization layer.
The following diagram presents the architecture of the ANN used in the NeuroNER engine.
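For readers who prefer code to diagrams, the sketch below shows one way the three layers listed above could be wired together in TensorFlow 1.0. It is illustrative only: the dimensions are made up and it is not NeuroNER's actual implementation.

```python
# Simplified sketch of the three-layer architecture (illustrative only; not
# NeuroNER's actual code, and all dimensions below are made up).
import tensorflow as tf

vocab_size, char_vocab_size = 10000, 100
token_dim, char_dim, char_lstm_dim, lstm_dim, num_labels = 100, 25, 25, 100, 9

token_ids = tf.placeholder(tf.int32, [None, None])        # batch x tokens
char_ids  = tf.placeholder(tf.int32, [None, None, None])  # batch x tokens x characters
labels    = tf.placeholder(tf.int32, [None, None])        # gold label indices
seq_len   = tf.placeholder(tf.int32, [None])              # tokens per sentence

# 1) Character-enhanced token-embedding layer: concatenate a token embedding with
#    a character-level LSTM summary of the token's spelling.
token_emb = tf.nn.embedding_lookup(
    tf.get_variable('W_token', [vocab_size, token_dim]), token_ids)
char_emb = tf.nn.embedding_lookup(
    tf.get_variable('W_char', [char_vocab_size, char_dim]), char_ids)
s = tf.shape(char_emb)
char_emb_flat = tf.reshape(char_emb, [s[0] * s[1], s[2], char_dim])
with tf.variable_scope('char_lstm'):
    _, (_, char_h) = tf.nn.dynamic_rnn(
        tf.contrib.rnn.LSTMCell(char_lstm_dim), char_emb_flat, dtype=tf.float32)
char_feat = tf.reshape(char_h, [s[0], s[1], char_lstm_dim])
x = tf.concat([token_emb, char_feat], axis=-1)

# 2) Label prediction layer: a bidirectional LSTM over the token sequence,
#    followed by a per-token projection to label scores.
with tf.variable_scope('token_bilstm'):
    (fw, bw), _ = tf.nn.bidirectional_dynamic_rnn(
        tf.contrib.rnn.LSTMCell(lstm_dim), tf.contrib.rnn.LSTMCell(lstm_dim),
        x, sequence_length=seq_len, dtype=tf.float32)
scores = tf.layers.dense(tf.concat([fw, bw], axis=-1), num_labels)

# 3) Label sequence optimization layer: a linear-chain CRF scores entire label
#    sequences, which discourages invalid transitions such as 'O' -> 'I-PER'.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    scores, labels, seq_len)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer().minimize(loss)
```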
How to install NeuroNER?
The GitHub repository contains the installation instructions. Here is a demo showing how easy it is when using the installation script on Ubuntu: the script installs everything you need and starts training on the CoNLL-2003 dataset. After a few training epochs, one obtains state-of-the-art results.
Using NeuroNER with BRAT
NeuroNER integrates with BRAT so that the user may easily view, amend, or create annotations.
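BRAT stores annotations in a simple plain-text standoff format (a .ann file next to each .txt file). As a small illustration (the entity below is made up), a text-bound annotation line can be parsed with a few lines of Python:

```python
# Tiny illustration of BRAT's standoff annotation format (example entity made up).
# A text-bound annotation line looks like:  T1<TAB>PER 0 10<TAB>John Smith
ann_line = 'T1\tPER 0 10\tJohn Smith'

ann_id, type_and_span, surface_text = ann_line.split('\t')
entity_type, start, end = type_and_span.split(' ')
print(ann_id, entity_type, int(start), int(end), repr(surface_text))
```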
Using NeuroNER with TensorBoard
In addition to the plots generated by NeuroNER, one can use TensorBoard to analyze the NeuroNER network and its results, either in real time or retrospectively.
The user may also view the NeuroNER engine interactively.
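As a generic TensorFlow 1.x sketch of how such monitoring works (this is not NeuroNER's own logging code; the log directory and loss values are made up), scalar summaries are written to a log directory, which TensorBoard can then read:

```python
# Generic TensorFlow 1.x example of writing scalar summaries for TensorBoard
# (illustrative only; the log directory name and loss values are made up).
import tensorflow as tf

loss_value = tf.placeholder(tf.float32, name='loss_value')
tf.summary.scalar('train_loss', loss_value)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('output/tensorboard_logs', sess.graph)
    for step, value in enumerate([0.9, 0.5, 0.3, 0.2]):
        summary = sess.run(merged, feed_dict={loss_value: value})
        writer.add_summary(summary, step)
    writer.close()
# View the results with:  tensorboard --logdir=output/tensorboard_logs
```

Passing sess.graph to the FileWriter is what makes the computation graph browsable in TensorBoard's graph view.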