I have an image dataset of size 500 GiB, and my system specs are an NVIDIA GeForce 930M, 12 GB of RAM and an Intel Core i5.

I have the following questions:

  1. Is it possible to use such a large dataset on my local machine?
  2. If yes, how much time will be required for one epoch, or equivalently one iteration? Any links or references on how to compute the required processing time would be helpful.
  3. If my system is not good enough, what other possible solutions do I have?
Green Falcon

1 Answer

A dataset of that size is not unusual for deep learning and big data projects. Your machine is not powerful, but it is usable. As long as you have enough disk space to store the whole dataset, you can train your network. The time needed for each epoch depends on many factors, including the batch size and how well your implementation is vectorized, the bottleneck between disk and RAM, the bottleneck between RAM and GPU, the size of the model, the size of the training set, the memory of your GPU alongside the size of your RAM, the size of each sample, and the load your OS already places on the GPU. The easiest way to find out is to code your network and measure it yourself.
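To make "try it yourself" concrete, here is a minimal sketch of one way to estimate the epoch time: time a handful of iterations and extrapolate. The names `estimate_epoch_seconds`, `train_step` and `batches` are placeholders of mine, not part of any framework, and the estimate assumes every iteration costs roughly the same.

```python
import math
import time


def iterations_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def estimate_epoch_seconds(train_step, batches, num_samples, batch_size,
                           warmup=2, timed=10):
    """Time a few iterations and extrapolate to one full epoch.

    train_step: hypothetical callable that runs one training iteration
                on a batch (your own forward/backward/update code).
    batches:    any iterable yielding at least warmup + timed batches.
    """
    it = iter(batches)
    # Warm-up iterations are not timed: the first steps are often slower
    # (caches, lazy initialisation) and would skew the average.
    for _ in range(warmup):
        train_step(next(it))

    start = time.perf_counter()
    for _ in range(timed):
        train_step(next(it))
    per_iteration = (time.perf_counter() - start) / timed

    return per_iteration * iterations_per_epoch(num_samples, batch_size)
```

If your framework runs GPU work asynchronously, make sure `train_step` waits for the result before returning, otherwise the measured time will be too optimistic. The batches should also come from the real disk pipeline, so that loading time is included in the estimate.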

As I've mentioned, with your current setup you can train your network, but the computation may not be very fast. However, you can use some techniques to speed up your training phase as much as possible. You have two main bottlenecks. The first, between disk and RAM, can be dealt with using generators: they stream the data from disk batch by batch, which reduces the number of disk calls and means the dataset never has to fit in RAM all at once. The second, between RAM and GPU, can be handled with a vectorized implementation of your neural network. After loading your network onto the GPU, you can find the batch size that uses all of the remaining GPU memory.
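As an illustration of the generator idea, a minimal sketch in plain Python could look like the following. `batch_generator` and `load_fn` are hypothetical names; `load_fn` stands for whatever function you use to read and decode one image from disk.

```python
import random


def batch_generator(paths, batch_size, load_fn, shuffle=True, seed=None):
    """Yield one batch of loaded samples at a time.

    Only the current batch is held in memory, so the full dataset
    never has to fit in RAM.

    paths:   sequence of file paths (cheap to keep in memory).
    load_fn: hypothetical loader that turns one path into one sample.
    """
    order = list(paths)
    if shuffle:
        # Shuffle the paths, not the data, so nothing is read up front.
        random.Random(seed).shuffle(order)

    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        yield [load_fn(path) for path in chunk]
```

Most deep learning frameworks ship their own data-loading utilities built on the same idea, often with background workers that prefetch the next batch while the GPU is busy, so it is usually worth using those instead of writing your own.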

I also want to point out that your current GPU may have memory limitations. This can cause difficulties when your network is very large: in such cases, you won't be able to load your entire network onto the GPU.
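A rough back-of-the-envelope check for whether a network fits can be sketched as below. The assumptions are mine: 32-bit floats (4 bytes per value), one gradient per parameter, and an Adam-style optimizer that keeps two extra values per parameter. Activations are not counted, and they grow with the batch size, so treat the result as a lower bound.

```python
def training_memory_bytes(num_params, bytes_per_value=4, optimizer_states=2):
    """Lower bound on memory needed to train a model.

    Counts the weights, one gradient per weight, and the optimizer's
    per-weight state (assumed: 2 extra values, as in Adam-style optimizers).
    Activations and framework overhead are NOT included.
    """
    copies = 1 + 1 + optimizer_states  # weights + gradients + optimizer state
    return num_params * bytes_per_value * copies


def fits_on_gpu(num_params, gpu_bytes, **kwargs):
    """True if the lower-bound estimate is within the given GPU memory."""
    return training_memory_bytes(num_params, **kwargs) <= gpu_bytes
```

For example, `fits_on_gpu(25_000_000, gpu_bytes)` asks whether a 25-million-parameter model could fit, where `gpu_bytes` is the memory size your own card reports.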

Green Falcon