How to train a spaCy model for text classification?

Question

What are the steps to train a spaCy model for text classification? (Binary classification in my case.)

Please help me with the process and the approach to take.

Alexis Pister · Answer 1 · 2019-07-18T14:28:52.217

There are several good tutorials on the web, for example:

https://www.kaggle.com/poonaml/text-classification-using-spacy

Basically, you have to:

Import the data in Python. Here POSITIVE is the variable to predict, and 0 and 1 are the two encoded classes.

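# Each item is a (text, annotations) pair; the strings stand in for your documents
TRAIN_DATA = [
    ("Text of the first document", {'cats': {'POSITIVE': 1}}),
    ("Text of the second document", {'cats': {'POSITIVE': 0}}),
]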
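
If the labelled documents start out as plain (text, label) pairs, they first need converting to the format above. A minimal sketch, assuming a hypothetical helper `to_spacy_format` and made-up sample texts (neither comes from the original answer):

```python
def to_spacy_format(pairs, label="POSITIVE"):
    """Convert (text, is_positive) pairs into (text, {'cats': {...}}) tuples."""
    return [
        (text, {"cats": {label: 1 if is_positive else 0}})
        for text, is_positive in pairs
    ]


# Hypothetical sample data: True means the document counts as "tagged".
raw_pairs = [
    ("invoice overdue, payment required", True),
    ("see you at lunch tomorrow", False),
]

TRAIN_DATA = to_spacy_format(raw_pairs)
print(TRAIN_DATA[0])
```

The single POSITIVE label carries both classes: 1 for documents in the category and 0 for the rest.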

Initialize a textcat pipe in a spaCy pipeline object (nlp) and add the label to it.

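import spacy

# Written for the spaCy v2 API (create_pipe / add_pipe with a component object)
nlp = spacy.load('en_core_web_sm')
if 'textcat' not in nlp.pipe_names:
    # Create the text classifier and append it to the end of the pipeline
    textcat = nlp.create_pipe("textcat")
    nlp.add_pipe(textcat, last=True)
else:
    textcat = nlp.get_pipe("textcat")

textcat.add_label('POSITIVE')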

Iterate over the training examples to optimize the model.

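import random
from spacy.util import minibatch, compounding

# Names of all pipes except textcat, so they can be disabled during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']

n_iter = 1  # number of passes over the training data

# Only train the textcat pipe
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    print("Training model...")
    for i in range(n_iter):
        losses = {}
        random.shuffle(TRAIN_DATA)
        # Batch size grows from 4 towards 32 as training proceeds
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer,
                       drop=0.2, losses=losses)
        print(i, losses)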
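
Once training has finished, calling `nlp(text)` returns a document whose `doc.cats` attribute is a dict mapping each label to a score between 0 and 1. A minimal sketch of turning those scores into a binary decision and measuring accuracy on held-out examples; the helper names, the 0.5 threshold and the sample scores are assumptions for illustration, not part of the original answer:

```python
def is_positive(cats, label="POSITIVE", threshold=0.5):
    """Return True when the score for `label` reaches the threshold."""
    return cats.get(label, 0.0) >= threshold


def accuracy(predicted_cats, gold_labels, label="POSITIVE", threshold=0.5):
    """Fraction of documents whose thresholded prediction matches the gold label."""
    correct = sum(
        is_positive(cats, label, threshold) == bool(gold)
        for cats, gold in zip(predicted_cats, gold_labels)
    )
    return correct / len(gold_labels)


# Made-up scores standing in for the doc.cats dicts of three held-out documents.
predicted = [{"POSITIVE": 0.91}, {"POSITIVE": 0.12}, {"POSITIVE": 0.40}]
gold = [1, 0, 1]

print(accuracy(predicted, gold))
```

With a trained pipeline, `predicted` would come from something like `[nlp(text).cats for text in dev_texts]`, where `dev_texts` is a hypothetical list of held-out documents. The threshold can be raised or lowered depending on whether false positives or false negatives matter more.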

Thanks, I had already gone through the Kaggle link, but I did not find it useful. In my case all I need is binary classification, i.e. deciding which of two categories a document falls into (tagged or not tagged, where tagged means it contains a set of keywords). I just want to know how to implement this. — krishna rao gadde, Jul 18 '19 at 13:53
The implementation is the same for binary and multiclass classification; spaCy uses only one type of model for text classification. — Alexis Pister, Jul 18 '19 at 14:12
