On this page it is mentioned that when `trainable=False`, the layer's weights won't be updated by the optimizer during training. But I still do not understand how this can be useful. For example, if I want to find the best number of neurons or the best dropout rate, can it help?
2 Answers
One common application is to freeze an embedding layer. Freezing prevents the embedding from updating its weights, which can be desirable, especially for a text embedding layer initialized from pretrained vectors.
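A minimal sketch of that idea, assuming you have a pretrained `embedding_matrix` (the vocabulary size, embedding dimension, and random matrix below are just placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim = 10000, 100
# Stand-in for real pretrained vectors (e.g. built from GloVe).
embedding_matrix = np.random.rand(vocab_size, embed_dim)

model = keras.Sequential([
    layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False),                      # frozen: never updated by the optimizer
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```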
There are also designs where updating certain weights during a given batch is not wanted. For example, some GAN implementations only want to train one of the sub-models during a given phase, and therefore freeze the generator's or discriminator's layers for those batches (for instance, freezing the discriminator while the combined model updates the generator).
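A rough sketch of that GAN pattern (the layer sizes and names are illustrative, and the exact effect of toggling `trainable` around `compile` can differ slightly between Keras versions):

```python
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 32

generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
])

discriminator = keras.Sequential([
    keras.Input(shape=(28 * 28,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
# Compiled on its own, so it can be trained directly on real/fake batches.
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Freeze the discriminator inside the combined model: gradients from `gan`
# only update the generator's weights.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")
```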
You can also see this used for stacked autoencoders, where layers are trained one at a time. Here is a quick link.
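A possible layer-wise training sketch for a stacked autoencoder (dimensions and layer names are made up for illustration):

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
enc1 = layers.Dense(256, activation="relu", name="enc1")
dec1 = layers.Dense(784, activation="sigmoid")

# Stage 1: train the first encoder/decoder pair (fit on real data in practice).
ae1 = keras.Model(inputs, dec1(enc1(inputs)))
ae1.compile(optimizer="adam", loss="mse")

# Stage 2: freeze the trained first encoder and train a second one on top.
# When fitting ae2, the target would be the first encoder's activations.
enc1.trainable = False
enc2 = layers.Dense(64, activation="relu", name="enc2")
dec2 = layers.Dense(256, activation="sigmoid")
ae2 = keras.Model(inputs, dec2(enc2(enc1(inputs))))
ae2.compile(optimizer="adam", loss="mse")
```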
We use freezing for transfer learning. Deep learning is very data-hungry, and in some tasks you may not have much data, but a pre-trained network may already exist that can help. In such cases you reuse that model and its weights, freeze most of its layers, and replace the softmax layer to customize the network for your specific task despite having only a small amount of data. If you have more data, more of the layers can be left trainable. Take a look here.
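A minimal sketch of that setup, using VGG16 as one example of a pretrained base and assuming a hypothetical 10-class target task:

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # your task's number of classes

# Pretrained convolutional base, without its original classification head.
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # freeze all pretrained layers

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),  # new task-specific head
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# With more data, you could unfreeze the last few blocks of `base` and
# fine-tune them with a small learning rate.
```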