I am trying to use Logistic Lasso to classify documents as 1 or 0.

I've tried using both the TF matrix and the TF-IDF matrix representations of the documents as my predictors. I've found that if I apply scikit-learn's StandardScaler in Python (which standardizes features by removing the mean and scaling to unit variance) to the matrices before fitting the Lasso, model performance improves in both cases.
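The setup described above can be sketched roughly as follows. This is a minimal, hypothetical illustration assuming scikit-learn. The toy corpus and the names `to_dense`, `build_model`, and `compare` are invented for the example, and "Logistic Lasso" is taken to mean `LogisticRegression(penalty="l1")` with the `liblinear` solver. Because `StandardScaler` cannot subtract the mean from a sparse matrix, the sketch densifies the matrix first to match the "removing the mean" step.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy corpus for illustration only: 1 = sports, 0 = cooking.
DOCS = [
    "the team won the football match",
    "the striker scored a late goal",
    "fans cheered as the team won",
    "the coach praised the defense after the match",
    "a penalty goal decided the final",
    "the league match ended in a draw",
    "simmer the sauce and add garlic",
    "bake the bread in a hot oven",
    "chop the onions and fry them in butter",
    "season the soup with salt and pepper",
    "whisk the eggs and fold in the flour",
    "roast the vegetables with olive oil",
]
LABELS = np.array([1] * 6 + [0] * 6)


def to_dense(X):
    """Convert the sparse document-term matrix to a dense array so it can be mean-centered."""
    return X.toarray()


def build_model(representation, scale, C=1.0):
    """Pipeline: TF or TF-IDF vectorizer -> optional standardization -> L1 logistic regression."""
    vectorizer = CountVectorizer() if representation == "tf" else TfidfVectorizer()
    steps = [vectorizer]
    if scale:
        # StandardScaler refuses to center a sparse matrix, so densify first.
        # For large vocabularies, StandardScaler(with_mean=False) applied directly
        # to the sparse matrix (scaling only, no centering) avoids the memory cost.
        steps += [FunctionTransformer(to_dense), StandardScaler()]
    # penalty="l1" gives the Lasso-style penalty; the liblinear solver supports it.
    steps.append(LogisticRegression(penalty="l1", solver="liblinear", C=C))
    return make_pipeline(*steps)


def compare(cv=3):
    """Mean cross-validated accuracy for each (representation, scaled) combination."""
    results = {}
    for representation in ("tf", "tfidf"):
        for scale in (False, True):
            model = build_model(representation, scale)
            # The vectorizer and scaler are inside the pipeline, so the vocabulary,
            # means and variances are learned from the training folds only.
            scores = cross_val_score(model, DOCS, LABELS, cv=cv, scoring="accuracy")
            results[(representation, scale)] = scores.mean()
    return results


if __name__ == "__main__":
    for (representation, scale), acc in compare().items():
        print(f"{representation:6s} scaled={scale!s:5s} mean CV accuracy={acc:.3f}")
```

Keeping the scaler inside the pipeline, rather than standardizing the whole matrix up front, avoids leaking statistics from held-out documents into training. On a toy corpus this small the accuracy numbers carry no meaning; the sketch only shows where the scaling step sits relative to the L1 penalty.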

Is it acceptable to rescale the TF or TF-IDF matrix using StandardScaler prior to Logistic Lasso? Why or why not?

    In general I'd say that you can do whatever works. TFIDF is not a very principled representation, it's more an experimental method. – Erwan Oct 20 '21 at 22:05
