Questions tagged [model-evaluations]

This tag is meant for questions about how to evaluate a model's performance, not only with standard metrics but also in the context of real use-case applications. What makes a model good can depend on many factors, all of which must be taken into account to build genuinely useful data science applications.

354 questions
284
votes
8 answers

Micro Average vs Macro average Performance in a Multiclass classification setting

I am trying out a multiclass classification setting with 3 classes. The class distribution is skewed with most of the data falling in 1 of the 3 classes. (class labels being 1,2,3, with 67.28% of the data falling in class label 1, 11.99% data in…
SHASHANK GUPTA
  • 3,745
  • 4
  • 18
  • 26
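As a quick illustration of the distinction the question asks about, here is a minimal scikit-learn sketch for a skewed 3-class problem similar to the one described; the labels are made up for the example:

```python
from sklearn.metrics import f1_score

# Hypothetical 3-class labels with a skewed distribution, as in the question.
y_true = [1, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 1, 2, 3, 2, 1, 3]

# Micro-average: pool the TP/FP/FN counts of all classes before computing the
# score, so the dominant class drives the result.
print(f1_score(y_true, y_pred, average='micro'))

# Macro-average: compute F1 per class and take the unweighted mean, so each
# class counts equally regardless of how rare it is.
print(f1_score(y_true, y_pred, average='macro'))
```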
52
votes
3 answers

What is the difference between bootstrapping and cross-validation?

I used to apply K-fold cross-validation for robust evaluation of my machine learning models. But I'm aware of the existence of the bootstrapping method for this purpose as well. However, I cannot see the main difference between them in terms of…
Fredrik
  • 967
  • 2
  • 9
  • 11
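A minimal sketch of the practical difference, assuming scikit-learn and an arbitrary classifier; the dataset and the number of bootstrap rounds are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.utils import resample

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# K-fold cross-validation: every sample is tested exactly once, in the one
# fold where it is held out.
cv_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Bootstrap: draw n rows with replacement for training; the rows that never
# got drawn (roughly 36.8% of them) serve as the out-of-bag test set.
boot_scores = []
for i in range(20):
    train_idx = resample(np.arange(len(X)), replace=True, random_state=i)
    oob_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    model.fit(X[train_idx], y[train_idx])
    boot_scores.append(accuracy_score(y[oob_idx], model.predict(X[oob_idx])))

print(cv_scores.mean(), np.mean(boot_scores))
```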
44
votes
10 answers

When is precision more important over recall?

Can anyone give me some examples where precision is important and some examples where recall is important?
Rajat
  • 1,017
  • 2
  • 9
  • 10
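The trade-off behind the answers comes down to the two definitions; a tiny worked example with made-up confusion counts:

```python
# Hypothetical confusion counts for a spam filter (positive class = spam).
tp, fp, fn = 80, 5, 40

precision = tp / (tp + fp)   # of the mails flagged as spam, how many really are
recall = tp / (tp + fn)      # of all real spam, how much was caught

# Precision matters most when false positives are costly (a legitimate e-mail
# lost to the spam folder); recall matters most when false negatives are
# costly (a missed disease diagnosis).
print(round(precision, 3), round(recall, 3))   # 0.941 0.667
```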
20
votes
4 answers

Train/Test Split after performing SMOTE

I am dealing with a highly unbalanced dataset, so I used SMOTE to resample it. After SMOTE resampling, I split the resampled dataset into training/test sets, using the training set to build a model and the test set to evaluate it. However, I am…
Edamame
  • 2,705
  • 5
  • 23
  • 32
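For reference, a common leakage-free ordering is to split first and oversample only the training fold; a minimal sketch assuming imbalanced-learn's SMOTE and a synthetic dataset:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Split first, so no synthetic point derived from a test sample can leak
# into the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Oversample only the training portion; the test set keeps the original,
# imbalanced distribution the model will face in practice.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
print(Counter(y_train), Counter(y_res), Counter(y_test))
```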
19
votes
4 answers

Macro- or micro-average for imbalanced class problems

The question of whether to use macro- or micro-averages when the data is imbalanced comes up all the time. Some googling shows that many bloggers tend to say that micro-average is the preferred way to go, e.g.: Micro-average is preferable if there…
Krrr
  • 293
  • 1
  • 2
  • 6
16
votes
1 answer

How many features to sample using Random Forests

The Wikipedia page which quotes "The Elements of Statistical Learning" says: Typically, for a classification problem with $p$ features, $\lfloor \sqrt{p}\rfloor$ features are used in each split. I understand that this is a fairly good educated…
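In scikit-learn that heuristic corresponds to max_features='sqrt'; a minimal sketch that treats it as a tunable hyperparameter rather than a fixed rule (dataset and grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# 'sqrt' reproduces the floor(sqrt(p)) default quoted from the book; searching
# over alternatives lets the data decide whether another value works better.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid={'max_features': ['sqrt', 'log2', 0.5, None]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```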
16
votes
1 answer

How to define a custom performance metric in Keras?

I tried to define a custom metric function (F1-Score) in Keras (Tensorflow backend) according to the following:
def f1_score(tags, predicted):
    tags = set(tags)
    predicted = set(predicted)
    tp = len(tags & predicted)
    fp =…
Hendrik
  • 8,377
  • 17
  • 40
  • 55
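The snippet in the question works on Python sets, but a Keras metric has to be written with tensor operations on y_true and y_pred. A minimal sketch of a batch-wise F1 metric using the Keras backend; note it is only an approximation, since it averages per-batch values rather than computing F1 over the whole epoch:

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def f1_metric(y_true, y_pred):
    # Assumes binary or multi-label targets; threshold the predictions at 0.5.
    y_true = K.cast(y_true, 'float32')
    y_pred = K.round(K.clip(y_pred, 0, 1))
    tp = K.sum(y_true * y_pred)
    fp = K.sum((1 - y_true) * y_pred)
    fn = K.sum(y_true * (1 - y_pred))
    precision = tp / (tp + fp + K.epsilon())
    recall = tp / (tp + fn + K.epsilon())
    return 2 * precision * recall / (precision + recall + K.epsilon())

# model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[f1_metric])
```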
13
votes
1 answer

Irregular Precision-Recall Curve

I'd expect that for a precision-recall curve, precision decreases while recall increases monotonically. I have a plot that is not smooth and looks funny. I used scikit-learn to compute the values for plotting the curve. Is the curve below abnormal? If yes, why…
Anderlecht
  • 251
  • 2
  • 7
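Precision is in fact not guaranteed to fall monotonically as recall rises: each additional false positive pulls it down and each additional true positive pushes it back up, so a jagged curve is expected. A minimal plotting sketch with scikit-learn, using made-up labels and scores:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Hypothetical binary labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.45, 0.55, 0.9, 0.5, 0.3, 0.2]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# A step plot makes the threshold-by-threshold jumps visible.
plt.step(recall, precision, where='post')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()
```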
12
votes
3 answers

Why is the F-measure preferred for classification tasks?

Why is the F-measure usually used for (supervised) classification tasks, whereas the G-measure (or Fowlkes–Mallows index) is generally used for (unsupervised) clustering tasks? The F-measure is the harmonic mean of the precision and recall. The…
Bruno Lubascher
  • 3,488
  • 1
  • 11
  • 35
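For a concrete comparison: the F-measure is the harmonic mean of precision and recall, while the G-measure (Fowlkes–Mallows) is their geometric mean. A tiny numeric illustration with made-up values:

```python
from math import sqrt

precision, recall = 0.9, 0.5   # illustrative values only

f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
g_measure = sqrt(precision * recall)                       # geometric mean

# The harmonic mean penalises an imbalance between precision and recall more
# strongly than the geometric mean does.
print(round(f_measure, 3), round(g_measure, 3))   # 0.643 0.671
```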
12
votes
2 answers

Neural Networks - Loss and Accuracy correlation

I'm a bit confused by the coexistence of Loss and Accuracy metrics in Neural Networks. Both are supposed to render the "exactness" of the comparison of $y$ and $\hat{y}$, aren't they? So isn't the application of the two redundant in the training…
Hendrik
  • 8,377
  • 17
  • 40
  • 55
12
votes
3 answers

What are the disadvantages of accuracy?

I have been reading about evaluating a model with accuracy only, and I have found some disadvantages. Among them, I read that it treats all errors as equal. How could this problem be solved? Maybe by assigning a cost to each type of failure? Thank you very much…
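A small illustration of the "all errors are equal" problem and of the cost-based remedy the question suggests; the cost values are made up:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical imbalanced test set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100          # a model that always predicts the majority class

print(accuracy_score(y_true, y_pred))   # 0.95, despite missing every positive

# One remedy: weight the confusion matrix by the cost of each error type.
# Here a false negative is assumed to be 10x as costly as a false positive.
cost_matrix = np.array([[0, 1],     # row 0: true negatives, false positives
                        [10, 0]])   # row 1: false negatives, true positives
cm = confusion_matrix(y_true, y_pred)
print((cm * cost_matrix).sum())     # total cost of 50 for this model
```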
9
votes
2 answers

Difference between using RMSE and nDCG to evaluate Recommender Systems

What kind of error measures do RMSE and nDCG give while evaluating a recommender system, and how do I know when to use one over the other? If you could give an example of when to use each, that would be great as well!
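Roughly, RMSE scores how accurately individual ratings are predicted, while nDCG scores how well the most relevant items are ranked near the top of a recommendation list. A minimal sketch with scikit-learn and made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, ndcg_score

# Rating prediction: RMSE measures the gap between predicted and true ratings.
true_ratings = [5, 3, 4, 1]
pred_ratings = [4.5, 3.5, 4.0, 2.0]
rmse = np.sqrt(mean_squared_error(true_ratings, pred_ratings))

# Top-N recommendation: nDCG rewards placing highly relevant items near the
# top of the ranked list produced by the model's scores.
true_relevance = [[3, 2, 3, 0, 1]]              # graded relevance per item
predicted_scores = [[0.9, 0.8, 0.1, 0.4, 0.7]]  # model's ranking scores
ndcg = ndcg_score(true_relevance, predicted_scores)

print(round(rmse, 3), round(ndcg, 3))
```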
9
votes
3 answers

How do you evaluate ML model already deployed in production?

To be more clear, let's consider the problem of loan default prediction. Let's say I have trained and tested multiple classifiers offline and ensembled them. Then I deployed this model to production. But because people change, data and many other…
tomtom
  • 247
  • 3
  • 5
8
votes
1 answer

When do I have to use aucPR instead of auROC? (and vice versa)

I'm wondering whether, in some cases, it isn't better to use aucPR instead of aucROC to validate a model. Do these cases depend only on "domain & business understanding"? In particular, I'm thinking about the "unbalanced class problem", where, it seems…
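A minimal sketch of comparing the two on a heavily imbalanced problem, assuming scikit-learn; average_precision_score is used here as the usual summary of the area under the PR curve:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Heavily imbalanced binary problem, as in the question.
X, y = make_classification(n_samples=5000, weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# ROC AUC is based on the false-positive *rate*, which stays small when
# negatives are plentiful; PR AUC is driven by precision, so it reacts far
# more strongly to false positives among a rare positive class.
print(roc_auc_score(y_test, scores), average_precision_score(y_test, scores))
```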
8
votes
2 answers

Do I need validation data if my train and test accuracy/loss is consistent?

I am trying to understand the purpose of a 3rd split in the form of a validation dataset. I am not necessarily talking about cross-validation here. In the scenario below, it would appear that the model is overfit to the training dataset. Train…