Leland's answer is exactly correct regarding why an autoencoder wouldn't be useful. Let me expand upon that point:
Autoencoders and other dimensionality reduction techniques attempt to keep objects that are "close" together in your high-dimensional space close in the lower-dimensional space as well. Often the notion of closeness the autoencoder learns yields a compressed representation that is useful for downstream tasks, but sometimes it does not.
Your categories most likely do exhibit structure in the context of your game: some categories behave similarly to others in particular situations. So dimensionality reduction could certainly be both possible and useful.
However, the problem is that your current representation of the categories exhibits no such structure. Every one-hot encoded category is exactly the same distance from every other category (the Euclidean distance between any two one-hot vectors is $\sqrt{2}$), so an autoencoder cannot possibly learn anything useful about how categories behave relative to one another.
This is where an embedding layer, as mentioned by ncasas in the comments, can come to the rescue. An embedding layer takes one-hot encoded categories (or equivalently, integer category indices) and learns a lower-dimensional embedding optimized for the task at hand. In other words, it is trained at the same time as the rest of your architecture, so it learns the embedding that best minimizes the error at the output of your network, which represents your end goal.
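Here is a minimal sketch of that idea in PyTorch (the layer sizes here are placeholder assumptions, not values from your problem):

```python
import torch
import torch.nn as nn

class CategoryModel(nn.Module):
    def __init__(self, n_categories=50, embed_dim=8):
        super().__init__()
        # A learned lookup table: one embed_dim-sized vector per category.
        self.embed = nn.Embedding(n_categories, embed_dim)
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, category_ids):
        # category_ids: integer indices, the dense equivalent of one-hot input.
        return self.head(self.embed(category_ids))
```

Because `embed.weight` receives gradients from the same task loss as everything else, the geometry of the learned vectors ends up reflecting how categories behave in your game rather than some arbitrary notion of closeness.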
Unfortunately, as you mentioned, your architecture doesn't quite support this, since you need pre-existing embeddings that you'd like to concatenate. If I am interpreting this correctly, you are saying you'd like to pass multiple categories in as part of a single input to the network. In that case, you can simply use a multi-hot encoding as input to an embedding layer, as sketched below.
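Multiplying a multi-hot vector by the embedding matrix is the same as summing the embedding rows of the active categories, so you can skip the explicit multi-hot vector entirely. A sketch using PyTorch's `nn.EmbeddingBag`, which computes that pooled sum directly (the category counts and indices are made up):

```python
import torch
import torch.nn as nn

embed = nn.EmbeddingBag(num_embeddings=50, embedding_dim=8, mode="sum")

# Two examples: categories {3, 17} and {5, 9, 41}, packed flat with offsets.
flat_ids = torch.tensor([3, 17, 5, 9, 41])
offsets = torch.tensor([0, 2])          # where each example starts in flat_ids
pooled = embed(flat_ids, offsets)       # shape: (2, 8), one vector per example
```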
If I am misinterpreting this and you really do need pretrained embeddings, then consider training them yourself! If you can find a simpler prediction task that takes the same categories as input, you can learn an embedding on that task and then reuse it for your more complex task. Reusing embeddings across related tasks is a well-tested method, and you can read more about it in the recent fast.ai post on categorical embeddings.
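If you go that route, transferring the learned embedding can be as simple as copying weights between models. A sketch under the assumption that both models expose a compatible `nn.Embedding` attribute named `embed` (`aux_model` and `main_model` are hypothetical names, not from your setup):

```python
import torch

# Hypothetical: aux_model was trained on the simpler task; main_model is the
# untrained model for the real task, with an embedding of matching shape.
with torch.no_grad():
    main_model.embed.weight.copy_(aux_model.embed.weight)
main_model.embed.weight.requires_grad = False  # freeze, or leave True to fine-tune
```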
To generate training data for the simpler task, consider simulating some simple behaviors in your environment that are good differentiators of how different monster classes behave. For example, you could have a group of $k$ monsters compete at some task, where the goal is to predict who will win. Since the number of possible groups grows as $\binom{n}{k}$ for $n$ monster classes, even a moderate $k$ should let you generate a large, non-repeating data set.
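What that data generation loop might look like, assuming a hypothetical, game-specific `simulate_contest` function that runs a group through the task and reports the winner:

```python
import random

def make_dataset(n_examples, n_categories, k):
    data = []
    for _ in range(n_examples):
        # Draw k distinct monster classes for this contest.
        contestants = random.sample(range(n_categories), k)
        # Hypothetical game-specific simulation; returns the winner's
        # position within the group, which serves as the label.
        winner = simulate_contest(contestants)
        data.append((contestants, winner))
    return data
```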
Once you have learned an embedding for individual monsters, you can take the average of the relevant embeddings as the input to your network for the original task. This method is used in *Deep Neural Networks for YouTube Recommendations*, and probably elsewhere too.
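Pooling the pretrained vectors is then a one-liner; `pretrained_embed` below is assumed to be the `nn.Embedding` learned on the simpler task (this is equivalent to `nn.EmbeddingBag` with `mode="mean"`):

```python
import torch

# Monsters present in one example of the original task (made-up indices).
monster_ids = torch.tensor([3, 17, 41])
vectors = pretrained_embed(monster_ids)   # (3, embed_dim) lookup
group_vector = vectors.mean(dim=0)        # (embed_dim,) fixed-size network input
```

Averaging keeps the input size fixed no matter how many monsters appear in a group, which is exactly why the YouTube paper pools variable-length histories this way.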