How do I train an RBM on color images?

Question

I am having a hard time understanding the strategy for inputting color. Most tutorials on RBMs only train on grayscale images.

If the image is grayscale, the input units can be binary: I can normalize the grayscale values to [0,1] and then treat them as probabilities in the input layer. Alternatively, I can whiten the dataset and use Gaussian units in the input layer.

How do I treat color images? Obviously, the input units cannot be binary - unless I replicate the units for each of the three color channels? Or what is the popular strategy?

score 1 · Answer 1 · edited Oct 10 '21 at 16:32

I am working on a similar project. The idea is still experimental, so I cannot guarantee results. You could convert your images to grayscale with 8 levels and use these 8 levels as the visible layer, connected to a hidden layer with (width * height) units, where width and height are the dimensions of your images. Then use this hidden layer as the visible layer of a second RBM whose hidden layer has as many units as you have classes.
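One possible reading of the 8-level idea, sketched in NumPy. The luma weights, the function name `quantize_gray`, and the one-hot encoding are assumptions made for illustration and are not spelled out in the answer itself.

```python
import numpy as np

def quantize_gray(rgb, levels=8):
    """Convert an RGB image (H, W, 3) with values in [0, 255] to `levels` gray bins.

    Returns integer bin indices of shape (H, W) and a one-hot encoding of
    shape (H, W, levels) that could feed binary visible units.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    # ITU-R BT.601 luma weights (an assumed choice; a plain channel mean also works).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Map [0, 255] onto bins 0 .. levels-1.
    bins = np.minimum((gray / 256.0 * levels).astype(np.int64), levels - 1)
    # One binary unit per level and pixel.
    one_hot = np.eye(levels, dtype=np.float64)[bins]
    return bins, one_hot
```

Note that this throws the color information away, so it sidesteps the question rather than modelling the three channels.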

answered Oct 10 '21 at 11:22 by moumenShobakey · edited Oct 10 '21 at 16:32 by Ethan

score 0 · Answer 2 · answered Sep 07 '22 at 05:20

If your visible units are real-valued then your conditionals change a bit.

As shown in (Lee, Honglak and Ekanadham, Chaitanya and Ng, Andrew, 2007):

P(v_i | h) = N(c_i + Σ_j W_ij h_j, σ²), where N(mean, variance) is a Gaussian distribution.

P(h_j = 1 | v) = sigmoid((b_j + Σ_i W_ij v_i) / σ²), where sigmoid(x) = 1 / (1 + exp(-x)).

σ² is a hyperparameter: the variance of the visible units (σ is the standard deviation).

PS: It is not shown here, but h should be sampled from a Bernoulli distribution (since it is a binary layer) during Gibbs sampling.
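A minimal NumPy sketch of one Gibbs step under these conditionals. The function and variable names are made up for illustration, and a single shared variance `sigma2` is assumed for all visible units.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, sigma2, rng):
    """One v -> h -> v' step for Gaussian visible / Bernoulli hidden units.

    v: (batch, n_visible), W: (n_visible, n_hidden),
    b: (n_hidden,) hidden bias, c: (n_visible,) visible bias.
    """
    # P(h_j = 1 | v) = sigmoid((b_j + sum_i W_ij v_i) / sigma^2)
    p_h = sigmoid((b + v @ W) / sigma2)
    # Hidden units are binary, so sample them from a Bernoulli distribution.
    h = (rng.random(p_h.shape) < p_h).astype(v.dtype)
    # P(v_i | h) = N(c_i + sum_j W_ij h_j, sigma^2)
    mean_v = c + h @ W.T
    v_new = mean_v + np.sqrt(sigma2) * rng.standard_normal(mean_v.shape)
    return p_h, h, mean_v, v_new
```

For a color image, `v` would simply be the flattened pixels, so `n_visible = height * width * 3` and each channel value gets its own real-valued unit.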

With this you keep your visible units real-valued, but be careful: sampling from a Gaussian distribution after a reconstruction can give you values below 0 or above 1.

So the visible units need to be normalized to get reasonable results.
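The answer does not say which normalization to use. One option, shown here as an assumption rather than a recommendation from the cited paper, is to standardize each color channel to zero mean and unit variance and then flatten the image. `standardize_rgb` is a hypothetical helper name.

```python
import numpy as np

def standardize_rgb(images, eps=1e-8):
    """Standardize a batch of RGB images per color channel.

    images: (N, H, W, 3) array.
    Returns the flattened data of shape (N, H*W*3) plus the per-channel
    mean and standard deviation, which are needed to map reconstructions
    back to pixel values.
    """
    x = np.asarray(images, dtype=np.float64)
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    std = x.std(axis=(0, 1, 2), keepdims=True)
    x = (x - mean) / (std + eps)
    return x.reshape(len(x), -1), mean.ravel(), std.ravel()
```

With data scaled like this, the visible units are no longer expected to stay inside [0, 1], so Gaussian samples outside that range are not a problem by themselves.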

I haven't tested RBMs much because they don't work with filters and don't preserve spatial features, since their hidden layers are basically dense layers.

It will probably be better to work with Convolutional RBMs, where you can transform an RGB image (3 channels) into a binary hidden layer with more than 3 channels/filters that try to represent your "colors".

PS: In this type of RBM your conditionals do not change much: just replace the matrix multiplication with a convolution (in the second it will be a transposed convolution).
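A rough NumPy sketch of the two operations, assuming the visible-to-hidden pass is the convolution and the hidden-to-visible pass is its transpose. `conv_up` and `conv_down` are hypothetical names, the loops are written for clarity rather than speed, and pooling is left out.

```python
import numpy as np

def conv_up(v, filters):
    """Valid cross-correlation.

    v: (H, W, C) image, filters: (K, kh, kw, C).
    Returns hidden pre-activations of shape (H-kh+1, W-kw+1, K).
    """
    K, kh, kw, _ = filters.shape
    out_h, out_w = v.shape[0] - kh + 1, v.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, K))
    for i in range(out_h):
        for j in range(out_w):
            patch = v[i:i + kh, j:j + kw, :]
            # Dot product of the patch with every filter at once.
            out[i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv_down(h, filters, visible_shape):
    """Transposed operation: scatter each hidden activation back through its filter."""
    K, kh, kw, _ = filters.shape
    out = np.zeros(visible_shape)
    for i in range(h.shape[0]):
        for j in range(h.shape[1]):
            # Sum over k of h[i, j, k] * filters[k], shape (kh, kw, C).
            out[i:i + kh, j:j + kw, :] += np.tensordot(h[i, j], filters, axes=(0, 0))
    return out
```

The hidden conditional would then use `conv_up(v, filters)` in place of the matrix product, and the visible mean would use `conv_down(h, filters, v.shape)`. In TensorFlow 2 the analogous calls would be `tf.nn.conv2d` and `tf.nn.conv2d_transpose`.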

I'm writing some code here to work with RBMs, CRBMs, and Deep Belief Networks using TensorFlow 2, so if you or someone else needs help with code and/or theory you can check the repository and ask me in the comments below.
