Questions tagged [one-hot-encoding]
136 questions
8 votes, 3 answers
What is the difference between one-hot and dummy encoding?
I am trying to understand:
- the reason behind encoding (one-hot encoding and dummy encoding)
- how one-hot and dummy encoding differ from each other
user121028
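Here is a minimal pure-Python sketch of the difference. The helper names `one_hot` and `dummy` are illustrative and do not come from any library. One-hot encoding creates one indicator column per category (k columns). Dummy encoding drops one of them (k-1 columns), and the dropped category becomes the baseline, encoded as all zeros. The usual reason for dropping a column is that the k one-hot columns always sum to 1. In a linear model with an intercept, that makes them perfectly collinear with the intercept (the "dummy variable trap").

```python
def one_hot(values):
    """One-hot: one 0/1 indicator column per distinct category (k columns)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def dummy(values):
    """Dummy (k-1): drop the first category, which becomes the all-zeros baseline."""
    categories, rows = one_hot(values)
    return categories[1:], [row[1:] for row in rows]


colors = ["red", "green", "blue", "green"]
print(one_hot(colors))  # (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]])
print(dummy(colors))    # (['green', 'red'], [[0, 1], [1, 0], [0, 0], [1, 0]])
```

In pandas, `pd.get_dummies(df, drop_first=True)` produces the dummy variant.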
8 votes, 5 answers
How do I encode the categorical columns if there are more than 15 unique values?
I'm trying to use this data to build a data-analysis report using regression. Since regression accepts only numerical inputs, I need to encode the categorical data. However, most of these columns have more than 15 unique values (country, for example).
Do I…
Cinemato
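One common option, though not the only one, is to keep the most frequent categories and lump the rest into a single "Other" level before one-hot encoding. This caps the number of new columns at `top_n + 1`. The sketch below assumes a plain list of values. `lump_rare` and the cutoff are hypothetical choices. In practice, the set of kept categories should be learned from the training data only.

```python
from collections import Counter


def lump_rare(values, top_n, other="Other"):
    """Keep the top_n most frequent categories; map every other value to `other`."""
    keep = {cat for cat, _ in Counter(values).most_common(top_n)}
    return [v if v in keep else other for v in values]


countries = ["US", "US", "US", "DE", "DE", "FR", "JP", "BR"]
print(lump_rare(countries, top_n=2))
# ['US', 'US', 'US', 'DE', 'DE', 'Other', 'Other', 'Other']
```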
7 votes, 2 answers
Possible harm in standardizing one-hot encoded features
While there may be no added value in standardizing one-hot encoded features before fitting linear models, is there any harm in doing so (i.e., does it affect model performance)?
Standardizing definition: applying (x - mean) / std to make the…
thereandhere1
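The following is a quick check of what standardization does to a single indicator column. Because (x - mean) / std is an affine map, the column still takes exactly two values, but those values now depend on how common the category is. With a fraction p of ones and population standard deviation σ = √(p(1 - p)), the 1s map to (1 - p)/σ and the 0s map to -p/σ, so rare categories get large positive values. For unregularized least squares with an intercept, this does not change the predictions. For regularized models (ridge, lasso), it changes how strongly each indicator is penalized. The helper below is a sketch that uses the population standard deviation.

```python
import math


def standardize(column):
    """Apply (x - mean) / std using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        # An all-0 or all-1 indicator column cannot be standardized.
        raise ValueError("constant column")
    return [(x - mean) / std for x in column]


rare = [1] + [0] * 9  # category present in 10% of rows, so p = 0.1
print(sorted(set(standardize(rare))))  # two values: about -0.333 and 3.0
```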
5 votes, 1 answer
When to One-Hot encode categorical data when following Crisp-DM
I have a dataset that contains 15 categorical features (2- and 3-level factors, which are non-ordinal) and 3 continuous numeric features. Seeing as most machine learning algorithms require numerical input features, and actually automatically…
kjtheron
5 votes, 1 answer
One Hot Encoding for any kind of dataset
How can I one-hot encode an unknown dataset by iterating over its columns, checking each column's dtype and number of unique values, and encoding accordingly? Also, how do I keep track of the new one-hot encoded data with…
Devansh Mishra
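This is a minimal sketch of the loop described in the question. It assumes the data is a dict of equal-length column lists. The `max_unique` cutoff of 10 and the rule "all values are strings means categorical" are both assumptions to adapt. The second return value records which indicator columns were created from each original column, which covers the bookkeeping part of the question.

```python
def auto_one_hot(table, max_unique=10):
    """One-hot encode string columns with at most `max_unique` distinct values.

    `table` maps column name -> list of values (all lists the same length).
    Returns the encoded table and a map: original column -> new column names.
    """
    encoded, created = {}, {}
    for name, values in table.items():
        is_text = all(isinstance(v, str) for v in values)
        if is_text and len(set(values)) <= max_unique:
            new_cols = []
            for cat in sorted(set(values)):
                col = f"{name}_{cat}"
                encoded[col] = [1 if v == cat else 0 for v in values]
                new_cols.append(col)
            created[name] = new_cols
        else:
            # Numeric or high-cardinality column: keep it unchanged.
            encoded[name] = list(values)
    return encoded, created


data = {"age": [31, 45, 27], "city": ["Paris", "Oslo", "Paris"]}
encoded, created = auto_one_hot(data)
print(created)  # {'city': ['city_Oslo', 'city_Paris']}
print(encoded)  # {'age': [31, 45, 27], 'city_Oslo': [0, 1, 0], 'city_Paris': [1, 0, 1]}
```

With pandas, the same idea is usually written with `df.select_dtypes(include="object")` to find candidate columns and `pd.get_dummies(df, columns=...)` to encode them.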
5 votes, 1 answer
Difference between tf.keras.backend.one_hot and keras.utils.to_categorical
I'm working on a classification project and need to do one-hot encoding on my data set. I'm just wondering what the difference is between tf.keras.backend.one_hot and keras.utils.to_categorical, and whether one of them is preferred over the other.
kimchilover123
5 votes, 2 answers
How to handle categorical variables with Random Forest using Scikit Learn?
One of the variables/features is the department id, which is like 1001, 1002, ..., 1218, etc. The ids are nominal, not ordinal, i.e., they are just ids, department 1002 is by no means higher than department 1001. I feed the feature to random forest…
Fred Chang
4 votes, 1 answer
Should I do one-hot encoding before feature selection, and how should I perform feature selection on a dataset with both categorical and numerical data?
A newbie here. I am currently teaching myself data science. I am working on a dataset that has both categorical and numerical (continuous and discrete) features (26 columns, 30,244 rows). The target is numerical (1, 2, 3). I have several questions.
I…
leahnanno
4 votes, 1 answer
On gradient boosting and types of encodings
I am having a look at this material and I have found the following statement:
For this class of models [Gradient Boosting Machine algorithms] [...] it is both safe and significantly
more computationally efficient [to] use an arbitrary integer encoding…
carlo_sguera
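For comparison with one-hot encoding: an arbitrary integer encoding replaces a k-level feature with a single column of codes instead of k indicator columns, which is where the computational saving comes from. A tree can still isolate any single integer code with two threshold splits, so the arbitrary order mostly costs extra splits rather than making the encoding unusable. The sketch below assigns codes in first-seen order. Mapping unseen categories to `-1` is an assumption of this sketch.

```python
def fit_ordinal(values):
    """Assign each distinct category an arbitrary integer code (first-seen order)."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


def transform_ordinal(values, codes, unknown=-1):
    """Encode values with a fitted mapping; unseen categories get `unknown`."""
    return [codes.get(v, unknown) for v in values]


codes = fit_ordinal(["cat", "dog", "cat", "bird"])
print(codes)                                      # {'cat': 0, 'dog': 1, 'bird': 2}
print(transform_ordinal(["dog", "fish"], codes))  # [1, -1]
```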
3 votes, 1 answer
One-hot encoding of a target variable containing classes 1 to 9 (not including zero)
While predicting the solution of a sudoku puzzle using a CNN, the model should predict a value from 1 to 9 for each of the 81 (9×9) cells in the puzzle. Hence the target shape is (81, 9).
Using keras.to_categorical to convert target variable…
Sathish Kumar SG
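`keras.utils.to_categorical` expects class indices that start at 0. If `num_classes` is not given, it is inferred as `max(y) + 1`, so labels 1 to 9 would produce 10 columns, and column 0 would never be used. One way around this is to shift the labels down by one before encoding, then add one back after taking the argmax of the predictions. A pure-Python sketch of that shift follows (the helper names are hypothetical).

```python
def one_hot_digits(labels, low=1, high=9):
    """One-hot encode labels in [low, high] by shifting them to start at 0."""
    if not all(low <= y <= high for y in labels):
        raise ValueError("label out of range")
    n_classes = high - low + 1
    return [[1 if (y - low) == k else 0 for k in range(n_classes)] for y in labels]


def decode_digits(rows, low=1):
    """Invert the shift: index of the largest entry, plus `low`."""
    return [max(range(len(r)), key=r.__getitem__) + low for r in rows]


encoded = one_hot_digits([1, 9, 5])
print(len(encoded[0]))         # 9 columns, not 10
print(decode_digits(encoded))  # [1, 9, 5]
```

With NumPy labels `y`, the corresponding Keras call would be `to_categorical(y - 1, num_classes=9)`.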
3 votes, 2 answers
Treating missing data in categorical features
I have a dataset in which one of the categorical columns has a considerable number of missing values. The interesting thing about this column is that it has values only for a particular category in "another" column.
For example:
column 1 …
Bharathi
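If the column is missing by design whenever the other column is not the relevant category, one option is to encode *why* a value is missing rather than imputing it. Rows that are structurally missing get a "NotApplicable" level, and rows where a value was expected but is absent get a "Missing" level. Both levels can then be one-hot encoded like any other category. The column contents and the `active_category` parameter below are hypothetical, since the question's example is truncated.

```python
def fill_structural_missing(parent, child, active_category,
                            not_applicable="NotApplicable", unknown="Missing"):
    """Replace None in `child` with a label that records why it is missing.

    If `parent` is not `active_category`, the child value cannot exist
    (structurally missing); otherwise it is genuinely unknown.
    """
    filled = []
    for p, c in zip(parent, child):
        if c is not None:
            filled.append(c)
        elif p != active_category:
            filled.append(not_applicable)
        else:
            filled.append(unknown)
    return filled


parent = ["A", "B", "A", "B"]
child = ["x", None, None, None]
print(fill_structural_missing(parent, child, active_category="A"))
# ['x', 'NotApplicable', 'Missing', 'NotApplicable']
```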
3 votes, 1 answer
Encoding and cross-validation
Recently I've been thinking about the proper use of encoding within a cross-validation scheme. The customarily advised way of encoding features is:
Split the data into train and test (hold-out) set
Fit the encoder (either LabelEncoder or…
jakes
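Inside cross-validation, the same rule as for a train/test split applies: refit the encoder on each training fold and only apply it to the held-out fold. The pure-Python sketch below uses contiguous, unshuffled folds and a simple one-hot encoder whose vocabulary comes from the training fold only. A category that appears only in the held-out fold is encoded as an all-zero row.

```python
def fit_categories(values):
    """Learn the one-hot vocabulary from training values only."""
    return sorted(set(values))


def encode(values, categories):
    """One-hot with a fixed vocabulary; unseen values become all-zero rows."""
    return [[1 if v == c else 0 for c in categories] for v in values]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


x = ["a", "b", "a", "c", "b", "a"]
for train_idx, test_idx in k_fold_indices(len(x), k=3):
    cats = fit_categories([x[i] for i in train_idx])  # fit on the training fold only
    x_test = encode([x[i] for i in test_idx], cats)   # apply to the held-out fold
    print(cats, x_test)
# In the second fold, "c" appears only in the test rows, so it encodes as [0, 0].
```

In scikit-learn, the usual way to get this behaviour is to put `OneHotEncoder(handle_unknown="ignore")` in a `Pipeline` with the model and pass the pipeline to `cross_val_score`, so the encoder is refit inside every fold.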
3 votes, 1 answer
Does fastText use one-hot encoding?
In the original skip-gram/CBOW, both the context word and the target word are represented as one-hot vectors.
Does fastText also use one-hot encoding for each subword when training the skip-gram/CBOW model (so the length of the one-hot encoding vector is…
malioboro
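Looking up a row of an embedding matrix is mathematically the same as multiplying a one-hot vector by that matrix. fastText's subword model represents a word as the sum of the vectors of its character n-grams (with `<` and `>` boundary markers, plus the whole bracketed word), which is the same as multiplying a multi-hot vector. Implementations index the rows directly rather than building these sparse vectors. The pure-Python sketch below shows both halves. The `minn=3, maxn=6` defaults mirror fastText's command-line defaults. The toy vocabulary and 2-dimensional matrix are invented. Real fastText hashes n-grams into a fixed number of buckets instead of keeping an explicit dictionary.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of '<word>' (fastText-style boundary markers),
    plus the full bracketed token itself."""
    token = f"<{word}>"
    grams = [token[i:i + n]
             for n in range(minn, maxn + 1)
             for i in range(len(token) - n + 1)]
    if token not in grams:
        grams.append(token)
    return grams


def lookup_sum(matrix, indices):
    """Sum the selected rows directly, as an embedding lookup does."""
    dim = len(matrix[0])
    return [sum(matrix[i][d] for i in indices) for d in range(dim)]


def multi_hot_matmul(matrix, indices):
    """Same result via an explicit multi-hot vector times the matrix."""
    hot = [indices.count(r) for r in range(len(matrix))]
    dim = len(matrix[0])
    return [sum(hot[r] * matrix[r][d] for r in range(len(matrix)))
            for d in range(dim)]


grams = char_ngrams("where", maxn=3)
print(grams)  # ['<wh', 'whe', 'her', 'ere', 're>', '<where>']

# Toy index and embedding matrix (hypothetical); fastText hashes n-grams instead.
vocab = {g: i for i, g in enumerate(grams)}
embeddings = [[float(i), -float(i)] for i in range(len(vocab))]
indices = [vocab[g] for g in grams]
print(lookup_sum(embeddings, indices) == multi_hot_matmul(embeddings, indices))  # True
```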
3 votes, 2 answers
Beyond one-hot encoding for LSTM model in Keras
I have an LSTM model in Keras for categorical classification (20 possible categories). In many cases, my data can fit multiple categories.
Obviously, my current model uses one-hot encoding and fits on that - that gives me accuracy and validation…
I_Play_With_Data
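When a sample can belong to several categories at once, the usual replacement for one-hot targets is a multi-hot vector, with a 1 for every applicable category. In Keras, this typically pairs with a `sigmoid` output layer and `binary_crossentropy` loss instead of `softmax` with `categorical_crossentropy`, so each class is scored independently. The pure-Python sketch below shows the encoding and how to turn per-class scores back into label sets. The class names and the 0.5 threshold are assumptions. scikit-learn's `MultiLabelBinarizer` performs the same encoding step.

```python
def multi_hot(label_sets, classes):
    """Encode each sample's set of labels as a 0/1 vector over `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows


def decode(probabilities, classes, threshold=0.5):
    """Turn independent per-class scores (e.g. sigmoid outputs) into label sets."""
    return [{c for c, p in zip(classes, probs) if p >= threshold}
            for probs in probabilities]


classes = ["news", "sports", "tech"]
print(multi_hot([{"sports"}, {"news", "tech"}], classes))  # [[0, 1, 0], [1, 0, 1]]
```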
3 votes, 3 answers
How to obtain original feature names after using one-hot encoding
This question is on an implementation aspect of scikit-learn's DecisionTreeClassifier().
How do I get the feature names ranked in descending order, from the feature_importances_ returned by the scikit-learn DecisionTreeClassifier()?
The problem is…
S Datta
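The key is to keep a list of encoded column names in the same order as the model's input columns, then pair that list with `feature_importances_`. In recent scikit-learn versions, `OneHotEncoder.get_feature_names_out()` returns names in a `column_category` style. The pure-Python sketch below builds such names itself, ranks them in descending order, and (as one common heuristic) sums the importances of all indicator columns that came from the same original feature. The column names, categories, and importance values are hypothetical.

```python
def one_hot_names(columns):
    """Map each original column to generated names like 'color_red'.

    `columns` maps column name -> list of categories (in encoder order).
    Returns the flat list of encoded names and a name -> original-column map.
    """
    names, origin = [], {}
    for col, cats in columns.items():
        for cat in cats:
            name = f"{col}_{cat}"
            names.append(name)
            origin[name] = col
    return names, origin


def rank_importances(names, importances):
    """Pair names with importances and sort in descending order."""
    return sorted(zip(names, importances), key=lambda p: p[1], reverse=True)


def aggregate_by_original(names, importances, origin):
    """Sum the importances of all indicator columns from one original feature."""
    totals = {}
    for name, imp in zip(names, importances):
        totals[origin[name]] = totals.get(origin[name], 0.0) + imp
    return sorted(totals.items(), key=lambda p: p[1], reverse=True)


names, origin = one_hot_names({"color": ["blue", "red"], "size": ["L", "M", "S"]})
importances = [0.10, 0.30, 0.05, 0.25, 0.30]  # hypothetical feature_importances_
print(rank_importances(names, importances))
print(aggregate_by_original(names, importances, origin))
```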