Questions tagged [training]

Training is the part of machine learning in which a model is "trained" on a defined portion of a dataset to learn attributes and statistical features of the data. Its counterparts are testing and validation: after training, a model is tested and validated on other portions of the dataset.

675 questions
123 votes, 2 answers

Training an RNN with examples of different lengths in Keras

I am trying to get started learning about RNNs and I'm using Keras. I understand the basic premise of vanilla RNN and LSTM layers, but I'm having trouble understanding a certain technical point for training. In the Keras documentation, it says the…
Tac-Tics
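
For the question above about training on examples of different lengths, one common approach is to pad every sequence in a batch to the same length and keep a mask of the real positions. The sketch below is plain Python and deliberately avoids any Keras API; the name `pad_batch` and the padding value `0` are assumptions made for illustration, not something taken from the question or the Keras documentation.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad each sequence to the longest length in the batch.

    Returns the padded sequences plus a mask with 1 for real
    positions and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


# Three toy sequences of lengths 2, 4 and 1 (hypothetical data).
padded, mask = pad_batch([[5, 3], [7, 1, 4, 2], [9]])
print(padded)
print(mask)
```

The mask lets a downstream loss or recurrent layer ignore the padded positions, so the padding value itself should not influence training.
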
63 votes, 6 answers

Should a model be re-trained if new observations are available?

So, I have not been able to find any literature on this subject but it seems like something worth giving a thought: What are the best practices in model training and optimization if new observations are available? Is there any way to determine the…
yad
55 votes, 5 answers

Is it always better to use the whole dataset to train the final model?

A common technique after training, validating and testing the machine learning model of preference is to use the complete dataset, including the testing subset, to train a final model for deployment, e.g. in a product. My question is: Is it always…
pcko1
54 votes, 4 answers

What is the advantage of keeping batch size a power of 2?

While training models in machine learning, why is it sometimes advantageous to keep the batch size to a power of 2? I thought it would be best to use a size that is the largest fit in your GPU memory / RAM. This answer claims that for some packages,…
James Bond
41 votes, 8 answers

What would I prefer - an over-fitted model or a less accurate model?

Let's say we have two models trained. And let's say we are looking for good accuracy. The first has an accuracy of 100% on training set and 84% on test set. Clearly over-fitted. The second has an accuracy of 83% on training set and 83% on test set.…
37 votes, 5 answers

In the context of deep learning, what are training warmup steps?

I found the term "training warmup steps" in some of the papers. What exactly does this term mean? Has it got anything to do with "learning rate"? If so, how does it affect it?
Ashwin Geet D'Sa
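
As a rough illustration of how warmup steps can relate to the learning rate, the sketch below shows one common reading: the learning rate is ramped up linearly over the first few optimizer steps and then held constant. The function name `warmup_lr` and the values `base_lr=1e-3` and `warmup_steps=1000` are assumptions chosen for the example; papers differ in the exact schedule they use.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Linearly ramp the learning rate up to base_lr, then hold it."""
    if step < warmup_steps:
        # step is 0-based, so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    return base_lr


for step in (0, 499, 999, 5000):
    print(step, warmup_lr(step))
```

In practice the constant phase is often replaced by a decay schedule; the warmup part only controls how gently training starts.
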
35 votes, 9 answers

Why is it wrong to train and test a model on the same dataset?

What are the pitfalls of doing so and why is it a bad practice? Is it possible that the model starts to learn the images "by heart" instead of understanding the underlying logic?
karalis1
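
The worry about learning "by heart" can be made concrete with a toy sketch. The hypothetical `Memorizer` below simply stores its training examples in a lookup table; the labels are random, so there is nothing to generalize from. Evaluating it on the data it was trained on reports a perfect score, while held-out data reveals that it learned nothing. All names and data here are invented for illustration.

```python
import random


class Memorizer:
    """Toy model that stores training examples in a lookup table."""

    def fit(self, inputs, labels):
        self.table = dict(zip(inputs, labels))
        return self

    def predict(self, x):
        # Unseen inputs fall back to a fixed default class.
        return self.table.get(x, 0)


def accuracy(model, inputs, labels):
    hits = sum(model.predict(x) == y for x, y in zip(inputs, labels))
    return hits / len(labels)


rng = random.Random(0)
inputs = list(range(200))
labels = [rng.randint(0, 1) for _ in inputs]  # random labels: no pattern to learn

model = Memorizer().fit(inputs[:100], labels[:100])
train_acc = accuracy(model, inputs[:100], labels[:100])
test_acc = accuracy(model, inputs[100:], labels[100:])
print("accuracy on training data:", train_acc)
print("accuracy on held-out data:", test_acc)
```
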
21 votes, 4 answers

Train, test split of unbalanced dataset classification

I have a model that does binary classification. My dataset is highly unbalanced, so I thought that I should balance it by undersampling before I train the model. So balance the dataset and then split it randomly. Is this the right way? Or should…
lads
21 votes, 6 answers

Tool to label images for classification

Can anyone recommend a tool to quickly label several hundred images as an input for classification? I have ~500 microscopy images of cells. I would like to assign categories such as 'healthy', 'dead', 'sick' manually for a training set and save…
jlarsch
15 votes, 1 answer

Is stratified sampling necessary (random forest, Python)?

I use Python to run a random forest model on my imbalanced dataset (the target variable is a binary class). When splitting the training and testing dataset, I struggled with whether to use stratified sampling (like the code shown) or not. So far, I…
LUSAQX
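
The code referred to in the question is not shown in this excerpt, so the following is only a sketch of what stratified sampling means for a train/test split: each class is shuffled and split separately, so both parts keep roughly the original class ratio. The function `stratified_split` and the toy labels are assumptions; scikit-learn's `train_test_split` offers a `stratify` argument for the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) preserving class ratios."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy target: 100 positives, 900 negatives (hypothetical data).
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
print("positives in test:", sum(labels[i] for i in test_idx), "of", len(test_idx))
```

With a purely random split, the number of minority-class rows in the test set varies from run to run; stratifying removes that source of variance.
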
11 votes, 2 answers

Oversampling/undersampling only the train set, or both the train and validation sets

I am working on a dataset with a class imbalance problem. Now, I know one needs to oversample or undersample only the train set and not the test set. But my issue is: whether to oversample the train set and then split it into train and validation sets or…
yamini goel
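
One ordering that is often suggested for this situation is to split off the validation set first and oversample only the remaining training rows, so that no duplicated row can appear on both sides. The sketch below illustrates that ordering with simple random duplication; `split_then_oversample`, the 20% validation fraction, and the toy rows are all assumptions made for the example.

```python
import random


def split_then_oversample(rows, label_of, val_fraction=0.2, seed=0):
    """Hold out a validation set, then oversample minority classes in train only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    val, train = shuffled[:n_val], shuffled[n_val:]

    by_class = {}
    for row in train:
        by_class.setdefault(label_of(row), []).append(row)
    largest = max(len(group) for group in by_class.values())

    oversampled = []
    for group in by_class.values():
        oversampled.extend(group)
        # Draw duplicates with replacement until the class matches the largest one.
        oversampled.extend(rng.choices(group, k=largest - len(group)))
    rng.shuffle(oversampled)
    return oversampled, val


# Toy rows of (id, label) with about 10% positives (hypothetical data).
rows = [(i, int(i % 10 == 0)) for i in range(500)]
train, val = split_then_oversample(rows, label_of=lambda row: row[1])
print(len(train), len(val))
```

Because the validation rows are set aside before any duplication, the validation score still reflects the original class distribution.
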
10 votes, 3 answers

How to split train/test datasets with equal class proportions

I would like to know how I can split the following into equal numbers (Target: 0 → 1586, 1 → 318) in order to have the same proportion of 0 and 1 classes in a dataset to train, if my dataset is called df and includes 10 columns, both numerical and…
user105599
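
Using the class counts quoted in the question (1586 rows with Target 0 and 318 with Target 1), a minimal sketch of balancing by random undersampling could look like this. It uses plain Python dictionaries as a stand-in for the question's `df`; the helper `undersample_to_equal` and the `id` field are hypothetical.

```python
import random
from collections import Counter


def undersample_to_equal(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    smallest = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Stand-in data with the class counts from the question: 1586 zeros, 318 ones.
rows = [{"id": i, "Target": 0 if i < 1586 else 1} for i in range(1586 + 318)]
balanced = undersample_to_equal(rows, label_of=lambda row: row["Target"])
print(Counter(row["Target"] for row in balanced))
```

Undersampling discards most of the majority class here, so it trades data volume for balance.
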
10 votes, 2 answers

Train object detection without annotated data/bounding boxes

From what I can see most object detection NNs (Fast(er) R-CNN, YOLO etc) are trained on data including bounding boxes indicating where in the picture the objects are localised. Are there algos that simply take the full picture + label annotations,…
9 votes, 1 answer

How to train data by batch from disk?

I am working on a convolutional neural network for image classification. The training dataset is too large to be loaded into my computer's memory (4 GB); on top of that, I also need to try some augmentation to balance the classes. I am using Keras. I have…
Learning is a mess
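
The general idea behind training from disk is to read one batch at a time with a generator, so only that batch is ever held in memory. The sketch below is framework-agnostic and does not use any Keras API; `iter_batches` and the temporary file of sample names are assumptions made for the example. In a real pipeline each line might be an image path that is loaded and augmented before being yielded.

```python
import os
import tempfile


def iter_batches(path, batch_size):
    """Yield lists of lines from a file, reading lazily one batch at a time."""
    batch = []
    with open(path) as handle:
        for line in handle:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Write a small stand-in "dataset" of 10 sample names to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(f"sample_{i}" for i in range(10)))
    demo_path = tmp.name

batches = list(iter_batches(demo_path, batch_size=4))
os.remove(demo_path)
print([len(batch) for batch in batches])
```
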
9 votes, 3 answers

What knowledge do I need in order to write a simple AI program to play a game?

I'm a B.Sc graduate. One of my courses was 'Introduction to Machine Learning', and I always wanted to do a personal project in this subject. I recently heard about different AI training to play games such as Mario, Go, etc. What knowledge do I need…