Questions tagged [named-entity-recognition]

167 questions
22 votes, 2 answers

NLP - Is Gazetteer a cheat?

In NLP, there is the concept of Gazetteer which can be quite useful for creating annotations. As far as I understand: A gazetteer consists of a set of lists containing names of entities such as cities, organisations, days of the week, etc. These…
AbtPst
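To make the idea in the excerpt concrete, here is a minimal sketch of a gazetteer lookup done as a longest-match search over tokens. The `GAZETTEER` lists and the `annotate` helper are hypothetical and invented for illustration; they are not taken from any particular NLP toolkit.

```python
# Toy gazetteer: entity type -> set of known names, stored as lowercased
# token tuples. All entries are illustrative assumptions.
GAZETTEER = {
    "CITY": {("paris",), ("new", "york"), ("los", "angeles")},
    "DAY": {("monday",), ("friday",)},
}

# Longest entry (in tokens) across all lists.
MAX_LEN = max(len(name) for names in GAZETTEER.values() for name in names)


def annotate(tokens):
    """Return (start, end, type) spans, taking the longest match at each position."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-token names win.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(lowered[i:i + length])
            for etype, names in GAZETTEER.items():
                if candidate in names:
                    match = (i, i + length, etype)
                    break
            if match:
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


tokens = "She flew to New York on Friday".split()
print(annotate(tokens))  # [(3, 5, 'CITY'), (6, 7, 'DAY')]
```

A lookup like this only finds names already on the lists, which is why gazetteers are usually treated as one feature among several rather than a complete NER system.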
10 votes, 2 answers

What are BIO tags for creating custom NER (named entity recognition)?

I would like to create a custom Named Entity Recognition (NER) model, but I am confused about what BIO tags are. Could anyone please explain the steps for creating an NER model and what the B, I, and O tags mean?
star
8 votes, 2 answers

What should be the labels for subword tokens in BERT for NER task?

For any NER task, we need a sequence of words and their corresponding labels. To extract features for these words from BERT, they need to be tokenized into subwords. For example, the word 'infrequent' (with label B-count) will be tokenized into…
PinkBanter
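One common convention for this, sketched below under stated assumptions, is to give the word's label to its first subword and mask the remaining subwords so the loss ignores them. The `toy_subwords` splitter is a hypothetical stand-in and does not reproduce how BERT's WordPiece vocabulary would split any word; `-100` is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def toy_subwords(word, size=4):
    """Hypothetical stand-in for a subword tokenizer: fixed-size chunks.

    A real WordPiece vocabulary would split words differently.
    """
    pieces = [word[i:i + size] for i in range(0, len(word), size)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def align_labels(words, label_ids, tokenize=toy_subwords):
    """Give each word's label to its first subword and mask the rest."""
    tokens, aligned = [], []
    for word, label_id in zip(words, label_ids):
        pieces = tokenize(word)
        tokens.extend(pieces)
        aligned.extend([label_id] + [IGNORE_INDEX] * (len(pieces) - 1))
    return tokens, aligned


label2id = {"O": 0, "B-count": 1}
words = ["infrequent", "visits"]
label_ids = [label2id["B-count"], label2id["O"]]

tokens, aligned = align_labels(words, label_ids)
print(list(zip(tokens, aligned)))
```

Another option some people use is to repeat the word's label on every subword; which works better tends to depend on the dataset and on how predictions are mapped back to words.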
8 votes, 1 answer

Text extraction from documents using NLP or Deep Learning

I am looking for references (papers/GitHub projects) on how to use deep learning in a text extraction task. Recently I was given a task to extract important information from documents of a similar type, for example legal merger documents. I have…
7 votes, 1 answer

Named entity recognition (NER) features

I'm new to Named Entity Recognition and I'm having some trouble understanding what/how features are used for this task. Some papers I've read so far mention features used, but don't really explain them, for example in Introduction to the…
6 votes, 1 answer

Named Entity Recognition: NLTK using Regular Expression

Named Entity Recognition (NER) often fails to tag consecutive NNPs as one NE. I think editing the NER to use RegexpTagger could also improve it. For example, consider the following input: "Barack Obama is a great person." And the…
pg2455
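The idea of treating a run of consecutive NNPs as one entity can be sketched in plain Python as below. This is a stand-in for illustration, not the NLTK `RegexpTagger` approach the excerpt mentions; the POS tags are written by hand (assumed, not produced by a tagger), and `merge_consecutive_nnps` is a hypothetical helper name.

```python
from itertools import groupby


def merge_consecutive_nnps(tagged):
    """Join each run of adjacent NNP tokens into one candidate entity string."""
    entities = []
    # groupby splits the sequence into maximal runs of NNP / non-NNP tokens.
    for is_nnp, run in groupby(tagged, key=lambda pair: pair[1] == "NNP"):
        if is_nnp:
            entities.append(" ".join(word for word, _ in run))
    return entities


# Hand-written (word, POS) pairs for the sentence in the excerpt.
tagged = [
    ("Barack", "NNP"), ("Obama", "NNP"), ("is", "VBZ"), ("a", "DT"),
    ("great", "JJ"), ("person", "NN"), (".", "."),
]
print(merge_consecutive_nnps(tagged))  # ['Barack Obama']
```

A rule like this will also merge adjacent names that are really separate entities, so it is probably best used as a post-processing heuristic rather than a replacement for the tagger.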
6 votes, 1 answer

Improve NER label results on Non-English text

I am working on some Medieval Latin text and have been trying various NER tools, such as CLTK (Latin model), spaCy (multilingual, Italian, and Spanish models), and Stanford NER (Spanish model). When I used the non-Latin models I used the original Latin text…
6 votes, 1 answer

How does MITIE perform named entity recognition?

I'm trying to use MITIE to extract named entities from short text. I'm interested in entities such as dates, times, names, and locations. Out of the box, MITIE only recognises names, locations, and organisations. I'd like to train it to recognise…
5 votes, 2 answers

Inter-Annotator Agreement score for NLP?

I have several annotators who annotated strings of text for me, in order to train an NER model. The annotation is done in JSON format, and it consists of a string followed by the start and end index of named entities, along with their respective…
Adnos
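For span annotations like the ones described, one possible agreement measure is Cohen's kappa computed over per-character labels for a pair of annotators. The sketch below assumes each annotation is a `(start, end, label)` tuple with an exclusive end index; that tuple layout, the label names, and the sample text are assumptions for illustration, not the format from the question.

```python
from collections import Counter


def char_labels(text, spans):
    """Expand (start, end, label) spans into one label per character ('O' outside)."""
    labels = ["O"] * len(text)
    for start, end, label in spans:
        labels[start:end] = [label] * (end - start)
    return labels


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length, non-empty label sequences."""
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


text = "Alice moved to Paris"
annotator_1 = [(0, 5, "PER"), (15, 20, "LOC")]
annotator_2 = [(0, 5, "PER"), (9, 20, "LOC")]  # disagrees on the LOC boundary

kappa = cohen_kappa(char_labels(text, annotator_1), char_labels(text, annotator_2))
print(round(kappa, 3))
```

Character-level kappa is dominated by the large number of 'O' positions, so an entity-level measure such as pairwise exact-match F1 may be worth reporting alongside it.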
5 votes, 1 answer

Is a BiLSTM layer required if we use BERT?

I am new to deep-learning-based NLP and I have a question. I am trying to build an NER model, and I found some journals where people rely on a BERT-BiLSTM-CRF model for it. As far as I know, BERT is a language model that scans the contexts in both…
5 votes, 1 answer

Is there a way to rank the extracted named entities based on their importance/occurrence in a document?

I am looking for a way to rank the tens or hundreds of named entities present in a document in order of their importance/relevance in context. Any thoughts? Thanks in advance!
Neelam
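As a simple baseline, assuming the entity mentions have already been extracted, ranking by occurrence can be sketched with a frequency count. The `rank_entities` helper and the sample mentions are hypothetical, and raw frequency is only a rough proxy for importance.

```python
from collections import Counter


def rank_entities(mentions):
    """Rank entity mentions by how often they occur, ignoring case."""
    counts = Counter(m.lower() for m in mentions)
    return counts.most_common()  # most frequent first


# Hypothetical output of an NER step over one document.
mentions = ["Acme", "Berlin", "Acme", "Jane Doe", "acme", "Berlin"]
print(rank_entities(mentions))  # [('acme', 3), ('berlin', 2), ('jane doe', 1)]
```

Weighting counts by position in the document or by how rare the entity is across a corpus (in the spirit of TF-IDF) are natural refinements.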
4 votes, 2 answers

NER on Twitter data

What are the best methods, libraries, or datasets for extracting named entities (names and locations) from Twitter data, other than dictionary lookup? I tried Python-Stanford NER, but it seems to fail when named entities are not capitalized. I also…
Sreejithc321
4 votes, 2 answers

Difference between IOB and IOB2 format?

I have to tag a dataset for NER. I came across conll2002/esp. What I understand so far, in IOB2 format if I want to tag 'Alex Larson is going to Los Angeles for a job interview with Candace Patrick' it'll be like: Alex B-PER Larson I-PER is O going…
Mahmudul Haque
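The difference between the two schemes can be sketched in code. In IOB2 every entity starts with a `B-` tag; in the original IOB scheme (often called IOB1) `B-` is used only when an entity directly follows another entity of the same type, and `I-` is used otherwise. The helper names below are hypothetical, and the `LOC` label for "Los Angeles" is an assumption, since the excerpt is cut off before that token.

```python
def spans_to_iob2(tokens, spans):
    """Convert token-level (start, end, type) spans (end exclusive) to IOB2 tags."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def iob2_to_iob1(tags):
    """Keep B- only where an entity directly follows one of the same type."""
    out = []
    prev_type = None  # entity type of the previous token, None if it was 'O'
    for tag in tags:
        if tag == "O":
            out.append(tag)
            prev_type = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" and prev_type != etype:
            out.append(f"I-{etype}")
        else:
            out.append(tag)
        prev_type = etype
    return out


sentence = ("Alex Larson is going to Los Angeles for a job interview "
            "with Candace Patrick")
tokens = sentence.split()
spans = [(0, 2, "PER"), (5, 7, "LOC"), (12, 14, "PER")]  # LOC is assumed

iob2 = spans_to_iob2(tokens, spans)
iob1 = iob2_to_iob1(iob2)
for token, t2, t1 in zip(tokens, iob2, iob1):
    print(f"{token}\t{t2}\t{t1}")
```

In this sentence no two entities are adjacent, so the IOB1 column contains no `B-` tags at all, while the IOB2 column marks the start of each entity.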
4 votes, 1 answer

Is it a red flag that increasing the number of parameters makes the model less able to overfit small amounts of data?

I'm training a deep network (CNN-LSTM-CRF) for Named Entity Recognition. Is there a reason that increasing the number of parameters would make the network less able to overfit a small training set (~20 sentences), or does this indicate a serious bug…
Solveit
4 votes, 2 answers

How many examples needed for named entity disambiguation?

If I want to build a named entity linking system for resumes using an ontology of occupations and skills, about how many annotations would I need? The ontology has about 20,000 entities. As a lower bound, I'm guessing I would need about 10 examples…
2daaa