spaCy ships pretrained models for several languages. You can find the full list in the official documentation: https://spacy.io/models
The available models are:
- English
- German
- French
- Spanish
- Portuguese
- Italian
- Dutch
- Greek
- Multi-language
If you need NER labels that the pretrained models do not cover, you can train a model on your own dataset. spaCy supports this; here is an example from the official documentation (https://spacy.io/usage/training#ner):
import random

import spacy
from spacy.util import minibatch, compounding

# new entity label
LABEL = "ANIMAL"

# each example is (text, {"entities": [(start_char, end_char, label), ...]})
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]
# note: this snippet uses the spaCy v2 API (create_pipe / begin_training)
n_iter = 30  # number of training iterations (undefined in the snippet; value assumed)

nlp = spacy.blank("en")  # create blank Language class
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label(LABEL)  # add new entity label to entity recognizer
optimizer = nlp.begin_training()
move_names = list(ner.move_names)
# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    sizes = compounding(1.0, 4.0, 1.001)
    # batch up the examples using spaCy's minibatch
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        batches = minibatch(TRAIN_DATA, size=sizes)
        losses = {}
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
        print("Losses", losses)
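The training loop above leans on two spaCy helpers, compounding and minibatch, and it may not be obvious what batch sizes they produce. Here is a rough pure-Python sketch of how I understand them to behave: compounding yields a series that grows by a constant factor up to a cap, and minibatch cuts the data into batches whose lengths are the integer part of that series. The names compounding_sizes and batches_of are made up for illustration and are not part of spaCy.

```python
from itertools import islice


def compounding_sizes(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Illustrative stand-in for spacy.util.compounding (growing case only).
    """
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def batches_of(items, sizes):
    """Split items into consecutive batches.

    Illustrative stand-in for spacy.util.minibatch: each batch takes
    int(next(sizes)) items, and iteration stops when the items run out.
    """
    items = iter(items)
    while True:
        batch = list(islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A growth factor of 1.5 is used here only to make the growth visible.
    sizes = compounding_sizes(1.0, 4.0, 1.5)
    print([len(b) for b in batches_of(range(10), sizes)])
```

With the factor 1.5 the ten items are split into batches of length 1, 1, 2, 3 and 3. With the factor 1.001 from the snippet the growth is much slower, so on a dataset this small almost every batch holds a single example.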
If you want to start from an existing model and add a new custom label to it, the linked documentation page describes the process in detail. It is quite similar to the code above.
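Whichever route you take, the entity annotations are character offsets into the text, with the end offset exclusive, and an off-by-one there is an easy mistake to make. Below is a small sanity check you could run on your data before training. It is plain Python; entity_spans is a hypothetical helper, not a spaCy function, and SAMPLE reuses a few of the examples from above.

```python
def entity_spans(train_data):
    """Return (surface_text, label) for every annotated entity.

    Hypothetical helper: raises ValueError if an offset pair does not
    describe a non-empty slice inside its text.
    """
    spans = []
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            if not 0 <= start < end <= len(text):
                raise ValueError(f"bad offsets ({start}, {end}) for {text!r}")
            spans.append((text[start:end], label))
    return spans


LABEL = "ANIMAL"
SAMPLE = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
]

if __name__ == "__main__":
    # Print each labelled span so you can eyeball that it is the word you meant.
    for surface, label in entity_spans(SAMPLE):
        print(f"{label}: {surface!r}")
```

If a printed span is cut off or includes a stray space, the offsets for that example need fixing.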