
I want to check whether my reasoning is correct.

Total Images = 120,000

In Keras's fit_generator function, I set batch_size = 24 and steps_per_epoch = 500, so each epoch covers 24 × 500 = 12,000 images, which is only one-tenth of the total dataset.

Therefore, I should set epochs = 100 if I want the equivalent of 10 real epochs (10 full passes over the data).

That is, it takes 10 Keras epochs to go through the entire 120,000 images once.

Son kun

2 Answers


Your assumption is in part correct, as your model will see 120000 images (i.e. the size of the whole dataset) in 10 epochs.

However, because Keras's default generators shuffle the dataset at the end of each epoch, it is very unlikely that the model will have seen each image exactly once after 10 epochs: some images will be drawn several times and others not at all.
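To get a feel for this effect, here is a rough pure-Python simulation. It does not call Keras; it assumes, as described above, that the order is reshuffled at the start of every shortened epoch, and it uses a scaled-down dataset (1,200 samples instead of 120,000). The names `num_samples`, `samples_per_epoch` and `coverage` are made up for this sketch.

```python
import random

# Scaled-down stand-in for the question's setup (assumed values):
# 1,200 samples, each shortened "epoch" draws one-tenth of them.
num_samples = 1_200
samples_per_epoch = 120
num_epochs = 10

rng = random.Random(0)  # fixed seed so the run is repeatable
seen = set()

for _ in range(num_epochs):
    # Assumption: the order is reshuffled for every epoch, so each epoch
    # takes the first `samples_per_epoch` items of a fresh permutation.
    seen.update(rng.sample(range(num_samples), samples_per_epoch))

coverage = len(seen) / num_samples
print(f"fraction of samples seen at least once: {coverage:.2f}")
```

Under this assumption, the expected fraction of samples seen at least once is about 1 - 0.9^10 ≈ 0.65, well short of the full dataset.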

My suggestion would be to set steps_per_epoch=5000 (that is, 120,000 / 24), so that each epoch covers the whole dataset.
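The arithmetic behind these numbers can be checked with a few lines of plain Python. This is only a sketch using the figures from the question; the variable names are my own.

```python
import math

num_samples = 120_000   # total images, from the question
batch_size = 24         # from the question

# Steps needed for one full pass over the data.
full_pass_steps = math.ceil(num_samples / batch_size)

# With the question's original setting of 500 steps per epoch:
steps_per_epoch = 500
samples_per_keras_epoch = batch_size * steps_per_epoch
keras_epochs_per_full_pass = num_samples / samples_per_keras_epoch

print(full_pass_steps)             # 5000
print(samples_per_keras_epoch)     # 12000
print(keras_epochs_per_full_pass)  # 10.0
```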

Jerome

From the relevant Keras documentation:

steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of samples of your dataset divided by the batch size. Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.

So your assumptions are correct.

If you want to use all your data in each epoch, you should choose a batch_size and steps_per_epoch that multiply together to give your total number of samples.

Usually your resources will decide this for you. If memory is an issue, you have to reduce batch_size until a batch fits on your device (a GPU, for example).

In your case, I would probably set batch_size to the desired amount, then let Keras work out steps_per_epoch for you! Only change it if you really want the model to not use all data in each epoch (which actually bends the definition of the word "epoch").
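As a rough illustration of how the step count can be derived from the data itself, here is a plain-Python stand-in for the `__len__` / `__getitem__` interface that a Keras `Sequence` exposes. It does not import Keras, and `BatchSequence` is a hypothetical name; the point is only that `len(generator)` comes out as the number of samples divided by the batch size, rounded up.

```python
import math


class BatchSequence:
    """Hypothetical stand-in mimicking the Sequence interface (not Keras itself)."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch: samples / batch_size, rounded up.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, idx):
        # Return the idx-th batch as a slice of the sample list.
        start = idx * self.batch_size
        return self.samples[start:start + self.batch_size]


seq = BatchSequence(list(range(120_000)), batch_size=24)
steps_per_epoch = len(seq)  # what would be used if steps_per_epoch is left unspecified
print(steps_per_epoch)      # 5000
```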

n1k31t4
  • Just to clarify: it will see all of the data in 10 epochs, right? – Son kun Nov 15 '18 at 20:04
  • @Sonkun - that would be my expectation. Why not try experimenting? Use just 15 samples as input, set `batch_size` to 3 and number of `steps_per_epoch` to 4, then try to see if only 12 of the samples are used in total. – n1k31t4 Nov 15 '18 at 21:11
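The experiment suggested in the comment above can be approximated without Keras. The sketch below assumes a plain Python generator that cycles over the data in a fixed order with no shuffling and is not reset between epochs; `cycling_batches` is a made-up helper, and a real Keras generator may behave differently (for example, if it shuffles).

```python
def cycling_batches(samples, batch_size):
    """Yield consecutive batches forever, in a fixed order (no shuffling)."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


samples = list(range(15))   # 15 samples, as suggested in the comment
gen = cycling_batches(samples, batch_size=3)
steps_per_epoch = 4

seen = set()
seen_after_epoch = []
for epoch in range(2):
    for _ in range(steps_per_epoch):
        seen.update(next(gen))
    seen_after_epoch.append(len(seen))

print(seen_after_epoch)  # [12, 15]
```

Under these assumptions, the first epoch touches only 12 of the 15 samples, and the remaining 3 show up at the start of the second epoch because the generator carries on from where it stopped.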