Questions tagged [validation]

61 questions
3
votes
1 answer

Does validation data have any effect on training, or does it act passively without affecting the training?

When using the Keras library in Python, we pass validation data along with the training data while training our model. At the end of every epoch, we get a validation accuracy. Does this validation accuracy have any effect on training in the next epoch?
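For reference, a minimal sketch of how validation data is typically passed in Keras; the data and architecture here are made-up toys:

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data: 1000 training and 200 validation samples
x_train, y_train = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)
x_val, y_val = np.random.rand(200, 20), np.random.randint(0, 2, 200)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_data is only evaluated at the end of each epoch; gradients
# come from (x_train, y_train) alone, so the validation metrics do not
# influence the weight updates by themselves.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=10)
```

By default the validation set is only monitored; it feeds back into training only if you attach a callback such as EarlyStopping or ReduceLROnPlateau that acts on val_loss.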
3
votes
1 answer

Why do machine learning engineers insist on training with more data than they validate on?

Among my colleagues I have noticed a curious insistence on training with, say, 70% or 80% of the data and validating on the remainder. It is curious to me because of the lack of any theoretical reasoning, and it smacks of influence from a five-fold…
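The convention being questioned is usually implemented as a plain random hold-out; a sketch with scikit-learn, where the 80/20 ratio is exactly the unexamined default under discussion:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(500, 10), np.random.randint(0, 2, 500)  # toy data

# Hold out 20% for validation; random_state fixes the shuffle
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)
```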
2
votes
0 answers

ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). Any advice?

The error below presented itself when attempting to assemble a PCA. My code:
is_float = X.dtype.kind in 'fc'
if is_float and (np.isfinite(_safe_accumulator_op(np.sum, X))):
    pass
elif is_float:
    msg_err = "Input contains {}…
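A common way to get past this error is to locate the non-finite values before fitting; a sketch assuming X is a NumPy float array (the injected NaN is just for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)
X[3, 2] = np.nan  # inject a bad value for illustration

# np.isfinite is False for NaN, +inf and -inf alike
bad_rows = ~np.isfinite(X).all(axis=1)
print(f"{bad_rows.sum()} rows contain NaN or infinity")

# One simple remedy: drop the offending rows (imputation is another option)
PCA(n_components=2).fit(X[~bad_rows])
```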
2
votes
0 answers

Validation set after hyperparameter tuning

Let's say I'm comparing a few models, and for my dataset I'm using a train/validation/test split rather than cross-validation. Let's say I'm completely done with hyperparameter tuning for one of them and want to evaluate it on the test set. Will I train a new…
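One common convention, which this question is probing, is to refit the chosen model on train + validation with the tuned hyperparameters and then touch the test set exactly once; a sketch with stand-in data and a stand-in tuned value:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = np.random.rand(600, 8), np.random.randint(0, 2, 600)  # toy data
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# ... hyperparameter tuning on (X_val, y_val) happens here ...
best_C = 1.0  # stand-in for the tuned value

# Refit on train + validation, then evaluate once on the test set
final_model = LogisticRegression(C=best_C).fit(
    np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print("test accuracy:", final_model.score(X_test, y_test))
```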
2
votes
1 answer

Using the whole dataset for testing (not validation) in the case of small datasets

For an object detection task I created a small dataset to train an object detector. The class frequencies are more or less balanced; however, I defined some additional attributes with environmental conditions for each image, which results in a rather…
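When the dataset is this small, k-fold cross-validation is a common way to use every sample for both training and evaluation; a generic sketch (not object-detection-specific, with stand-in data):

```python
import numpy as np
from sklearn.model_selection import KFold

X, y = np.random.rand(100, 16), np.random.randint(0, 2, 100)  # stand-in data
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Train on X[train_idx] and evaluate on X[test_idx] in each fold
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test samples")
```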
2
votes
1 answer

Validation and training loss of a model are not stable

Below I have a trained model, and the losses on both the training dataset (blue) and the validation dataset (orange) are shown. From my understanding, the ideal case is that both validation and training loss should converge and stabilize in order to tell…
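Curves like the ones described are usually inspected by plotting the per-epoch losses; a sketch in which the history dict stands in for the log returned by model.fit:

```python
import matplotlib.pyplot as plt

# Stand-in for history.history from model.fit(..., validation_data=...)
history = {"loss": [0.90, 0.60, 0.50, 0.45, 0.44],
           "val_loss": [0.95, 0.70, 0.65, 0.66, 0.68]}

plt.plot(history["loss"], label="training loss")        # blue by default
plt.plot(history["val_loss"], label="validation loss")  # orange by default
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```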
2
votes
1 answer

How to build a model when we have three separate train, validation, and test sets?

I have a data set which should be divided into train, test, and validation sets.
set.seed(98274)  # Creating example data
y <- sample(c(0,1), replace=TRUE, size=500)
x1 <- rnorm(500) + 0.2 * y
x2 <- rnorm(500) + 0.2 * x1 +…
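The snippet above is R; for comparison, a Python sketch of one way to cut a dataset into the three sets (the 60/20/20 ratios are illustrative):

```python
import numpy as np

rng = np.random.default_rng(98274)
n = 500
X, y = rng.normal(size=(n, 2)), rng.integers(0, 2, size=n)  # toy data

# Shuffle once, then cut into 60% train / 20% validation / 20% test
idx = rng.permutation(n)
train_idx, val_idx, test_idx = np.split(idx, [int(0.6 * n), int(0.8 * n)])
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
```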
2
votes
3 answers

What exactly is the difference between validation data and testing data?

I asked this question on Stack Overflow and was told that this is a better place for it. I am confused by the terms validation and testing: is validating the model the same as testing it? Is it possible to use testing data for validation? What even…
2
votes
2 answers

Dataset splits and why use evaluate()?

I am starting out in machine learning, and I have doubts about some concepts. I've read that we need to split our dataset into training, validation, and test sets. I'll ask four questions related to them. 1 - Training set: it is used in .fit() for our model…
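A minimal Keras sketch of where each set enters the workflow (data shapes and the architecture are hypothetical): .fit() consumes the training set and monitors the validation set, while .evaluate() gives the final estimate on held-out test data:

```python
import numpy as np
from tensorflow import keras

x_train, y_train = np.random.rand(800, 10), np.random.randint(0, 2, 800)
x_val, y_val = np.random.rand(100, 10), np.random.randint(0, 2, 100)
x_test, y_test = np.random.rand(100, 10), np.random.randint(0, 2, 100)

model = keras.Sequential([keras.Input(shape=(10,)),
                          keras.layers.Dense(8, activation="relu"),
                          keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
loss, acc = model.evaluate(x_test, y_test)  # unbiased estimate on unseen data
```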
1
vote
3 answers

Splitting Training, Test, and Validation Sets for an Image Dataset

I have 600 images in the training folder, 200 images in the validation folder, and 200 images in the test folder. Suppose I fit the training data generator and validation data generator for some epochs for learning purposes -…
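A sketch of the typical generator setup for that folder layout; the paths are assumptions, and each folder is expected to contain one subdirectory per class:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = gen.flow_from_directory("data/train", target_size=(128, 128), batch_size=32)
val_gen = gen.flow_from_directory("data/validation", target_size=(128, 128), batch_size=32)
test_gen = gen.flow_from_directory("data/test", target_size=(128, 128), shuffle=False)

# model.fit(train_gen, validation_data=val_gen, epochs=10)
# model.evaluate(test_gen)  # the test generator is touched only once, at the end
```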
1
vote
1 answer

Measure performance of classification model for training on different snapshots

I am trying to do binary classification on some chronological data. Let's assume we have weekly data from the first week of 2017 through the last week of 2020. Now we have found out that 26 weeks of training data might be sufficient for doing…
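For chronological data like this, the usual pattern is a rolling (walk-forward) evaluation: train on a fixed-length window, test on the weeks that follow, then slide forward. A generic sketch using the question's 26-week window, with stand-in data and an assumed 4-week test horizon:

```python
import numpy as np

n_weeks = 208            # roughly 4 years of weekly snapshots
window, horizon = 26, 4  # train on 26 weeks, test on the next 4

X = np.random.rand(n_weeks, 5)        # stand-in features
y = np.random.randint(0, 2, n_weeks)  # stand-in binary labels

for start in range(0, n_weeks - window - horizon + 1, horizon):
    train = slice(start, start + window)
    test = slice(start + window, start + window + horizon)
    # Fit a classifier on X[train], y[train]; score it on X[test], y[test]
    print(f"train weeks {train.start}-{train.stop - 1}, "
          f"test weeks {test.start}-{test.stop - 1}")
```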
1
vote
1 answer

Using Z-test score to evaluate model performance

I think I know the answer to this question, but I am looking for a sanity check here: is it appropriate to use z-test scores to evaluate the performance of my model? I have a binary model that I developed with a neural network in Keras. I know the…
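One way a z-test shows up in this setting is checking whether the model's test-set accuracy differs significantly from a fixed baseline proportion; a sketch of a one-sample proportion z-test in which the counts and the baseline are made up:

```python
import math
from scipy.stats import norm

n, correct = 1000, 745         # hypothetical test-set size and correct predictions
p_hat, p0 = correct / n, 0.70  # observed accuracy vs. baseline to beat

# Standard one-sample z-test for a proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided
print(f"z = {z:.2f}, p = {p_value:.4f}")
```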
1
vote
2 answers

Dataset split for image classification

I am trying to do image classification with 14 categories (around 1000 images for each category). I initially created two folders for training and validation. In this case, do I still need to set a validation split or a subset in the code, or can I use…
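With the images already sorted into separate training and validation folders, two generators pointed at the two paths suffice and no validation_split is needed; validation_split is the alternative for when everything lives in one folder. A sketch of that single-folder variant (the path is an assumption):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Single-folder variant: carve 20% out of "data/all" for validation
gen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = gen.flow_from_directory("data/all", target_size=(128, 128),
                                    subset="training")
val_gen = gen.flow_from_directory("data/all", target_size=(128, 128),
                                  subset="validation")
```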
1
vote
1 answer

Does overfitting depend only on validation loss or on both training and validation loss?

There are several scenarios that can occur while training and validating: Both training loss and validation loss are decreasing, with the training loss lower than the validation loss. Both training loss and validation loss are decreasing, with the…
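The scenario usually labelled overfitting is the one where validation loss turns upward while training loss keeps falling; a common guard is an early-stopping callback keyed to validation loss, sketched here with Keras (x_train, x_val, etc. are assumed to exist):

```python
from tensorflow import keras

# Stop when validation loss has not improved for 5 epochs,
# and roll back to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=5,
                                           restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])
```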
1
vote
1 answer

Is it right to maintain the train distribution in the test set for unbalanced data?

If the training set is unbalanced, the chances are the model will be biased. But if the class distribution in the test set is the same as in the train set, this kind of bias is not going to affect validation accuracy. But my question is if…
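Keeping the class distribution identical across splits is exactly what stratified sampling does; a sketch with scikit-learn's stratify argument on imbalanced toy labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 90% class 0, 10% class 1
y = np.array([0] * 900 + [1] * 100)
X = np.random.rand(1000, 4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(y_train.mean(), y_test.mean())  # both close to 0.10
```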