
In the AlexNet model, after the encoder steps are completed, you end up with a 6x6x256 tensor. This then needs to be flattened before we go to the ANN part of the network. However, the flattening results in a length of 4096. How did the size of the tensor get reduced? In the few tutorials I have read about flatten steps, there is no loss of size when you flatten a tensor, so I was expecting the length of the flattened vector to be 6 * 6 * 256, i.e. 9216. Why does the AlexNet flatten end up with a length of 4096 and not 9216?

The AlexNet paper does not go into the details of the individual layers of the network.
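To make my expectation concrete, here is a minimal sketch (assuming PyTorch; the framework shouldn't matter, since flattening is just a reshape):

```python
import torch

# A 6x6x256 tensor, like the one after AlexNet's last max-pooling layer
x = torch.zeros(256, 6, 6)

# Flattening is only a reshape: the number of elements is preserved
print(x.flatten().shape)  # torch.Size([9216]), i.e. 6 * 6 * 256
```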

Thanks

simplename
  • I googled it and surprisingly no one has asked about this. All the articles just accept the 6 * 6 * 256 turning into 1 * 4096. – Enes Kuz Dec 15 '21 at 20:37
  • In AlexNet, the input is an image of size 227x227x3. After Conv-1, the size changes to 55x55x96, which is reduced to 27x27x96 by MaxPool-1. After Conv-2, the size changes to 27x27x256, and after MaxPool-2 it becomes 13x13x256. Conv-3 transforms it to 13x13x384, Conv-4 preserves that size, and Conv-5 brings the depth back down, giving 13x13x256. Finally, MaxPool-3 reduces the size to 6x6x256. This feeds into FC-1, which transforms it into a vector of size 4096x1. The size remains unchanged through FC-2, and we finally get an output of size 1000x1 after FC-3. (These sizes are checked in the sketch after these comments.) – Enes Kuz Dec 15 '21 at 20:38
  • Yes, but this doesn't answer my question -- how does FC-1 convert it from 6x6x256 to 4096x1? 6 x 6 x 256 = 9216. – simplename Dec 16 '21 at 16:09
  • I know it doesn't answer it. I was simply providing a source showing that how this change happens is usually skipped. – Enes Kuz Dec 16 '21 at 19:31
  • I think the calculation might be off. Their pooling scheme is slightly different as far as I remember. – ashutosh singh Dec 17 '21 at 01:03
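To check the layer sizes walked through in the comments, here is a minimal sketch of the convolutional stack (assuming PyTorch; the kernel, stride, and padding values are the commonly cited AlexNet ones, not stated in this thread):

```python
import torch
import torch.nn as nn

# Re-creation of the AlexNet feature extractor described above,
# purely to verify the spatial sizes at each stage.
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),     # -> 55x55x96   (Conv-1)
    nn.MaxPool2d(kernel_size=3, stride=2),          # -> 27x27x96   (MaxPool-1)
    nn.Conv2d(96, 256, kernel_size=5, padding=2),   # -> 27x27x256  (Conv-2)
    nn.MaxPool2d(kernel_size=3, stride=2),          # -> 13x13x256  (MaxPool-2)
    nn.Conv2d(256, 384, kernel_size=3, padding=1),  # -> 13x13x384  (Conv-3)
    nn.Conv2d(384, 384, kernel_size=3, padding=1),  # -> 13x13x384  (Conv-4)
    nn.Conv2d(384, 256, kernel_size=3, padding=1),  # -> 13x13x256  (Conv-5)
    nn.MaxPool2d(kernel_size=3, stride=2),          # -> 6x6x256    (MaxPool-3)
)

x = torch.zeros(1, 3, 227, 227)
print(features(x).shape)  # torch.Size([1, 256, 6, 6])
```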

1 Answer


At first I was also confused, but after looking at some images of the model architecture the solution is quite clear:

[Diagram of the AlexNet architecture] (source: https://media5.datahacker.rs/2018/11/alexnet_ispravljeno.png)

The output of the last convolutional block (6x6x256, after the final max pooling) is unfolded into a 1D vector of length n = 9216 (as you already calculated correctly). BUT these 9216 neurons are identical to the neurons in the convolutional block; they are not new neurons! The unfolding just makes it easier to see how each of these 9216 neurons is connected to every one of the 4096 neurons in the actual fully connected layer. The reduction from 9216 to 4096 therefore happens in FC-1 itself: it is a dense layer whose 9216 x 4096 weight matrix maps the flattened vector down to 4096 values. The flatten itself loses nothing.
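To see where the 4096 comes from, here is a minimal sketch (assuming PyTorch; any framework with dense layers behaves the same way):

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 256, 6, 6)  # output of the last pooling layer
flat = torch.flatten(x, 1)     # shape (1, 9216) -- no values are lost
fc1 = nn.Linear(9216, 4096)    # all 9216 inputs fully connected to 4096 neurons

print(flat.shape)              # torch.Size([1, 9216])
print(fc1(flat).shape)         # torch.Size([1, 4096])
print(fc1.weight.shape)        # torch.Size([4096, 9216])
```

Note that fc1.weight alone holds 9216 * 4096 ≈ 37.7 million parameters, which is why most of AlexNet's parameters sit in the fully connected layers. The 4096 is simply the number of output neurons chosen for FC-1; nothing about the flatten forces it.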