I'm not an expert in autoencoders or neural networks by any means, so forgive me if this is a silly question.
For the purpose of dimension reduction or visualizing clusters in high-dimensional data, we can use an autoencoder to create a (lossy) 2-dimensional representation by inspecting the output of the network layer with 2 nodes. For example, with the following architecture, we would inspect the output of the third layer:
$[X] \rightarrow N_1=100 \rightarrow N_2=25 \rightarrow (N_3=2) \rightarrow N_4=25 \rightarrow N_5=100 \rightarrow [X]$
where $X$ is the input data and $N_l$ is the number of nodes in the $l$th layer.
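To make this concrete, here is a rough sketch of such a network in PyTorch (the ReLU activations and the 784-dimensional input are just placeholder assumptions on my part, not part of the setup above):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # Encoder: input -> 100 -> 25 -> 2 (the bottleneck we inspect)
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 100), nn.ReLU(),
            nn.Linear(100, 25), nn.ReLU(),
            nn.Linear(25, 2),
        )
        # Decoder: 2 -> 25 -> 100 -> input (a mirror of the encoder)
        self.decoder = nn.Sequential(
            nn.Linear(2, 25), nn.ReLU(),
            nn.Linear(25, 100), nn.ReLU(),
            nn.Linear(100, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)        # the 2-D code used for visualization
        return self.decoder(z), z  # reconstruction and code

# After training on reconstruction loss, we would plot the 2-D codes.
model = Autoencoder(input_dim=784)
x = torch.randn(16, 784)           # dummy batch standing in for X
x_hat, codes = model(x)            # codes has shape (16, 2)
```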
Now, my question is: why do we want a symmetrical architecture? Doesn't mirroring the deep 'compression' phase mean we get a similarly complex 'decompression' phase, so the 2-node output is not forced to be very intuitive? In other words, wouldn't a simpler decoding phase force the output of the 2-node layer to be simpler as well?
My thinking here is that the less complex the decompression phase, the simpler (more linear?) the 2D representation has to be. A more complex decompression phase would allow a more complex 2D representation.
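To illustrate what I mean by a simpler decoding phase, continuing the sketch above: keep the same deep encoder but replace the mirrored decoder with a single linear map, so the 2D code would have to be (nearly) linearly decodable back to $X$ (again just a sketch, reusing the placeholder 784-dimensional input):

```python
# Asymmetric variant: same deep encoder, but the decoder is one linear layer,
# forcing the 2-D code to carry an (almost) linear reconstruction of X.
simple_decoder = nn.Linear(2, 784)
x_hat_simple = simple_decoder(model.encoder(x))  # reconstruction through the shallow decoder
```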