Is there any way to define custom entities in spaCy?

Question

1) I have just started working on NLP. The basic idea is to extract meaningful information from text. For this I am using spaCy.

As far as I have studied, spaCy has the following entities:

ORG
PERSON
DATE
MONEY
CARDINAL

etc. But I want to add custom entities, for example:

Nokia-3310 should be labeled as Mobile, and XBOX should be labeled as Games.

2) Can I find some already trained models in spaCy to work with?

score 6 · Accepted Answer · answered Aug 16 '19 at 14:44

For pretrained models, spaCy has a few in different languages. You can find them in the official documentation: https://spacy.io/models

The available models are:

English
German
French
Spanish
Portuguese
Italian
Dutch
Greek
Multi-language

If you want support for extra labels in NER, you can train a model on your own dataset. This is also possible in spaCy; here is an example from the official documentation (https://spacy.io/usage/training#ner):

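# Note: this snippet uses the spaCy v2 training API (create_pipe, begin_training).
import random

import spacy
from spacy.util import minibatch, compounding

LABEL = "ANIMAL"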

TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]


nlp = spacy.blank("en")  # create blank Language class
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)

ner.add_label(LABEL)  # add new entity label to entity recognizer

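optimizer = nlp.begin_training()
n_iter = 30  # number of training iterations (undefined in the snippet as posted; 30 is an assumed value)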

move_names = list(ner.move_names)
# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

with nlp.disable_pipes(*other_pipes):  # only train NER
    sizes = compounding(1.0, 4.0, 1.001)
    # batch up the examples using spaCy's minibatch
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        batches = minibatch(TRAIN_DATA, size=sizes)
        losses = {}
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
        print("Losses", losses)

If you want to use an existing model and also add a new custom label, read the linked article in the documentation, which describes the process in detail. It is quite similar to the code above.
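
The training data in the example above marks each entity with character offsets (start, end, label), which are tedious to count by hand. As a rough sketch for the Nokia-3310 / XBOX case from the question, the offsets could be generated from a small term list using only the standard library. The `GAZETTEER` dict, the `make_example` helper, and the upper-case label names `MOBILE` and `GAMES` are hypothetical names chosen for illustration, not part of spaCy:

```python
import re

# Hypothetical term list: surface form -> custom label (labels adapted from the question).
GAZETTEER = {"Nokia-3310": "MOBILE", "XBOX": "GAMES"}


def make_example(text, gazetteer):
    """Build one training example in the (text, {"entities": [...]}) format used above."""
    entities = []
    for term, label in gazetteer.items():
        # re.escape keeps characters such as "-" from being read as regex syntax.
        for match in re.finditer(re.escape(term), text):
            entities.append((match.start(), match.end(), label))
    entities.sort()  # order spans by start offset
    return text, {"entities": entities}


TRAIN_DATA = [
    make_example("I traded my Nokia-3310 for an XBOX", GAZETTEER),
    make_example("Do they still sell it?", GAZETTEER),  # example without entities
]

for text, annotations in TRAIN_DATA:
    print(text, annotations)
```

This does plain, case-sensitive substring matching, so it assumes the terms do not overlap each other and that a match never falls in the middle of a longer word. The resulting list can be used in place of the hand-written `TRAIN_DATA`, after calling `ner.add_label` once for each custom label.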

Thanks for the reply. A quick question: it creates a blank 'en' class for entity recognition, but I am using "en_core_web_sm". Does this piece of code train "en_core_web_sm"? — AddyProg, Aug 19 '19 at 07:19
No. As I mentioned, this creates an empty model that you will train. If you want to take the model `en_core_web_sm` and add your own entities on top of it, that is also quite easy: you just need to add a few extra lines to the code above. It is described in the documentation I linked in the answer. — Tasos, Aug 19 '19 at 07:40
