I was just playing around with one-hot representations of features and thought of the following:
Say we have 4 categories for a given feature (e.g. fruit): {Apple, Orange, Pear, Melon}. In this case, one-hot encoding would yield:
Apple: [1 0 0 0]
Orange: [0 1 0 0]
Pear: [0 0 1 0]
Melon: [0 0 0 1]
The above means that we quadruple the feature space: we go from one feature to four.
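For concreteness, here's a minimal sketch of the one-hot mapping I have in mind (plain Python/NumPy; the category list and helper name are just for illustration):

```python
import numpy as np

# Illustrative sketch: map each category to its one-hot vector.
categories = ["Apple", "Orange", "Pear", "Melon"]
index = {cat: i for i, cat in enumerate(categories)}

def one_hot(cat):
    vec = np.zeros(len(categories), dtype=int)
    vec[index[cat]] = 1
    return vec

print(one_hot("Pear"))  # [0 0 1 0]
```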
One-hot encoding looks like it's wasting a few bits, as we can represent 4 values with only $\log_{2}4=2$ bits/features:
Apple: [0 0]
Orange: [0 1]
Pear: [1 0]
Melon: [1 1]
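In code, the compact representation might look something like this (again just an illustrative sketch: I pack the category index into $\lceil\log_{2}K\rceil$ bits, most-significant bit first):

```python
import math

# Illustrative sketch: encode each category index in ceil(log2(K)) bits.
categories = ["Apple", "Orange", "Pear", "Melon"]
n_bits = math.ceil(math.log2(len(categories)))  # 2 bits for 4 categories

def binary_encode(cat):
    i = categories.index(cat)
    # Most-significant bit first: Pear (index 2) -> [1, 0]
    return [(i >> b) & 1 for b in reversed(range(n_bits))]

print(binary_encode("Melon"))  # [1, 1]
```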
Would there be a problem with this representation in any of the most common machine learning models?