Some languages attach word endings to their nouns (like Finnish, e.g. "in Berlin" -> "Berliinissä"). I have tried to annotate the relevant characters in the training data as entities, but when I run the model, it doesn't detect those characters inside the word. They are detected only when they form a separate word. I can't think of an implementation that effectively detects named entities within a word. Any suggestions would be helpful.

  • Can you add a couple of samples of such sentences in German? – Shamit Verma Feb 22 '19 at 09:52
  • I was trying to detect "London" from "Londonschlüssel". But since my German is not so good, I later realized that it would be more appropriate as "London-schüssel", which can be easily tokenized. – Hasan Shaukat Feb 22 '19 at 10:45

1 Answer

I would recommend looking into character-level named entity recognition. For example: Kuru et al., CharNER: Character-Level Named Entity Recognition, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (2016).

The authors evaluate on many highly inflected languages, including Turkish, so this should be adequate for your Finnish use case.
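To make the idea concrete, here is a minimal sketch of what character-level labeling looks like. It is not the CharNER code or its input format: the helper names `spans_to_char_tags` and `char_tags_to_spans`, the `LOC`/`O` label set, and the example sentence are all assumptions made for illustration. The point is that each character gets its own tag, so the entity boundary can fall inside a word instead of at a token boundary.

```python
from typing import List, Tuple

# (start, end_exclusive, label) in character offsets; a hypothetical format.
Span = Tuple[int, int, str]


def spans_to_char_tags(text: str, spans: List[Span]) -> List[str]:
    """Give every character a tag: the span's label, or "O" if outside any span."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        for i in range(start, end):
            tags[i] = label
    return tags


def char_tags_to_spans(tags: List[str]) -> List[Span]:
    """Merge runs of identical non-"O" tags back into character spans."""
    spans: List[Span] = []
    start = 0
    current = "O"
    # The trailing "O" sentinel closes a span that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        if tag != current:
            if current != "O":
                spans.append((start, i, current))
            start, current = i, tag
    return spans


# Assumed example: only the stem "Berliini" is the entity, the case ending is not.
text = "Asun Berliinissä"
entity = "Berliini"
begin = text.index(entity)
gold_spans = [(begin, begin + len(entity), "LOC")]

tags = spans_to_char_tags(text, gold_spans)
for ch, tag in zip(text, tags):
    print(repr(ch), tag)

# Decoding per-character predictions recovers the sub-word entity.
recovered = char_tags_to_spans(tags)
print([(text[s:e], label) for s, e, label in recovered])
```

A model trained on tags like these predicts one label per character, and the decoding step turns those predictions back into entity spans, so the suffix "ssä" can stay outside the entity without any tokenization tricks.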

The code is here: https://github.com/ozanarkancan/char-ner

You should hopefully be able to download it and get it running out of the box for training. Of course, I am assuming you have a tagged NER corpus in Finnish, which you would need to preprocess into the same format as the CSV file that they use for Czech in the repo.

Tom