Questions tagged [encoding]

Encoding in machine learning and data science refers to the process by which non-numeric data is transformed into a numeric representation that can be fed into machine learning algorithms. An example is one-hot encoding, where categorical labels are transformed into a numeric format consisting of ones and zeros.
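
As a minimal sketch of the one-hot idea described above (pure Python, with a hypothetical label list):

```python
# One-hot encoding sketch: map each categorical label to a vector of
# zeros with a single one at the label's index.
labels = ["red", "green", "blue", "green"]  # hypothetical data
categories = sorted(set(labels))            # ['blue', 'green', 'red']

def one_hot(label, categories):
    vec = [0] * len(categories)
    vec[categories.index(label)] = 1
    return vec

encoded = [one_hot(l, categories) for l in labels]
# e.g. 'red' -> [0, 0, 1] because 'red' is the third sorted category
```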

179 questions
97 votes, 4 answers

What is the positional encoding in the transformer model?

I'm trying to read and understand the paper Attention Is All You Need, and in it there is a picture: I don't know what positional encoding is. By listening to some YouTube videos I've found out that it is an embedding having both meaning and…
Peyman
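
The positional encoding this question asks about is defined in the paper by sinusoids of different frequencies. A small pure-Python sketch (the sequence length and model width below are hypothetical):

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from 'Attention Is All You Need':
    PE(pos, 2i)   = sin(pos / 10000**(2i/d_model))
    PE(pos, 2i+1) = cos(pos / 10000**(2i/d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Hypothetical sizes, just for illustration.
pe = positional_encoding(seq_len=4, d_model=8)
# Position 0 encodes as [0, 1, 0, 1, ...] since sin(0) = 0 and cos(0) = 1.
```

Each position gets a unique pattern, and nearby positions get similar vectors, which is what lets the model reason about word order.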
66 votes, 2 answers

Sparse_categorical_crossentropy vs categorical_crossentropy (keras, accuracy)

Which is better for accuracy or are they the same? Of course, if you use categorical_crossentropy you use one hot encoding, and if you use sparse_categorical_crossentropy you encode as normal integers. Additionally, when is one better than the…
Master M
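
The difference between the two losses is only in how the target is represented, not in the loss value itself. A pure-Python sketch with hypothetical predicted probabilities shows the two formulations agree:

```python
import math

# Hypothetical predicted probabilities over 3 classes for one sample.
probs = [0.1, 0.7, 0.2]

# categorical_crossentropy expects the target as a one-hot vector.
one_hot_target = [0, 1, 0]
cce = -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

# sparse_categorical_crossentropy expects the target as a class index.
sparse_target = 1
scce = -math.log(probs[sparse_target])

# Same loss either way; the choice is about target format (and memory),
# not about accuracy.
```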
56 votes, 4 answers

Difference between OrdinalEncoder and LabelEncoder

I was going through the official documentation of scikit-learn after going through a book on ML and came across the following thing: the documentation describes sklearn.preprocessing.OrdinalEncoder(), whereas in the book it was given…
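
In short, LabelEncoder is intended for a 1-D target array, while OrdinalEncoder is intended for 2-D feature matrices and can handle several columns at once. A small sketch with hypothetical data, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# LabelEncoder: for the target y, one label per sample (1-D).
y = np.array(["cat", "dog", "cat", "bird"])
y_enc = LabelEncoder().fit_transform(y)  # classes sorted: bird=0, cat=1, dog=2

# OrdinalEncoder: for the feature matrix X (2-D), encoding each
# categorical column independently.
X = np.array([["cat", "small"], ["dog", "large"], ["cat", "large"]])
X_enc = OrdinalEncoder().fit_transform(X)
```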
44 votes, 6 answers

Encoding features like month and hour as categorical or numeric?

Is it better to encode features like month and hour as factor or numeric in a machine learning model? On the one hand, I feel numeric encoding might be reasonable, because time is a forward progressing process (the fifth month is followed by the…
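
A third option often raised for questions like this is a cyclic (sin/cos) encoding, which preserves the wrap-around from hour 23 to hour 0 or December to January. A pure-Python sketch:

```python
import math

def cyclic_encode(value, period):
    """Encode a cyclic feature (hour, month) as a point on the unit
    circle, so that e.g. 23:00 and 00:00 end up close together."""
    angle = 2 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))

hour_23 = cyclic_encode(23, 24)
hour_0 = cyclic_encode(0, 24)
# The Euclidean distance between 23:00 and 00:00 is small, unlike the
# raw numeric gap of 23.
dist = math.dist(hour_23, hour_0)
```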
32 votes, 6 answers

In a Transformer model, why does one sum positional encoding to the embedding rather than concatenate it?

While reviewing the Transformer architecture, I realized something I didn't expect, which is that the positional encoding is summed with the word embeddings rather than concatenated to…
FremyCompany
27 votes, 3 answers

How to deal with string labels in multi-class classification with keras?

I am a newbie in machine learning and Keras, and am now working on a multi-class image classification problem using Keras. The input is tagged images. After some pre-processing, the training data is represented in a Python list as: [["dog",…
Dracarys
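
A common recipe for this situation is to map each string label to an integer index (what sklearn's LabelEncoder does) and, if the loss requires it, expand to one-hot (what keras.utils.to_categorical does). A pure-Python sketch of both steps with hypothetical tags:

```python
# Map string labels to integer indices, then to one-hot targets.
labels = ["dog", "cat", "dog", "bird"]  # hypothetical tags
classes = sorted(set(labels))           # ['bird', 'cat', 'dog']
class_to_index = {c: i for i, c in enumerate(classes)}

# Integer indices: usable directly with sparse_categorical_crossentropy.
indices = [class_to_index[l] for l in labels]

# One-hot targets: required by categorical_crossentropy.
targets_one_hot = [[1 if i == idx else 0 for i in range(len(classes))]
                   for idx in indices]
```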
18 votes, 4 answers

One hot encoding alternatives for large categorical values

I have a data frame with a large categorical feature of over 1600 categories. Is there any way I can find alternatives so that I don't end up with over 1600 columns? I found this interesting link. But they are converting to class/object, which I don't want. I…
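
One frequently suggested alternative is the hashing trick: map each category into a fixed number of hash buckets, so ~1600 categories become, say, 32 columns. A pure-Python illustration (sklearn's FeatureHasher is a production-grade version of this idea):

```python
import zlib

N_BUCKETS = 32  # hypothetical bucket count; a real choice trades
                # collisions against dimensionality

def hash_encode(value, n_buckets=N_BUCKETS):
    vec = [0] * n_buckets
    # crc32 gives a hash that is stable across runs, unlike Python's
    # built-in hash() for strings.
    vec[zlib.crc32(value.encode()) % n_buckets] = 1
    return vec

v = hash_encode("category_1234")
```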
16 votes, 2 answers

One Hot Encoding vs Word Embedding - When to choose one or another?

A colleague of mine has an interesting situation: he has quite a large set of possibilities for a defined categorical feature (+/- 300 different values). The usual data science approach would be to perform a one-hot encoding. However, wouldn't…
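
Worth noting how closely the two options are related: an embedding lookup is just a one-hot vector multiplied by a (learned) embedding matrix, with the zero terms skipped. A pure-Python sketch with a hypothetical 5-category feature embedded in 3 dimensions:

```python
# Hypothetical 5 x 3 embedding matrix (in practice these weights are learned).
emb = [[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6],
       [0.7, 0.8, 0.9],
       [1.0, 1.1, 1.2],
       [1.3, 1.4, 1.5]]

category_index = 2
one_hot = [1 if i == category_index else 0 for i in range(len(emb))]

# Route 1: one-hot vector times embedding matrix.
via_matmul = [sum(o * emb[i][j] for i, o in enumerate(one_hot))
              for j in range(3)]

# Route 2: direct row lookup -- what an embedding layer actually does.
via_lookup = emb[category_index]
# Both routes produce the same dense 3-d vector.
```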
15 votes, 2 answers

Why does frequency encoding work?

Frequency encoding is a widely used technique in Kaggle competitions, and many times proves to be a very reasonable way of dealing with categorical features with high cardinality. I really don't understand why it works. Does it work in very…
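
For reference, the technique itself is simple: replace each category with its occurrence count (or relative frequency) in the training data. A pure-Python sketch with a hypothetical column:

```python
from collections import Counter

# Frequency encoding: each category becomes how often it occurs.
values = ["a", "b", "a", "c", "a", "b"]  # hypothetical column
counts = Counter(values)
encoded = [counts[v] for v in values]
```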
13 votes, 1 answer

What is the difference between global and universal compression methods?

I understand that compression methods may be split into two main sets: global and local. The first set works regardless of the data being processed, i.e., they do not rely on any characteristic of the data, and thus need not perform any…
Rubens
9 votes, 2 answers

Always drop the first column after performing One Hot Encoding?

Since one of the columns can be generated completely from the others, and hence retaining this extra column does not add any new information for the modelling process, would it be good practice to always drop the first column after performing One…
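
The two variants the question contrasts can be sketched with pandas (assuming pandas is installed; the color column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

full = pd.get_dummies(df["color"])                      # k columns
reduced = pd.get_dummies(df["color"], drop_first=True)  # k-1 columns

# The dropped column is fully determined by the others: a row of all
# zeros in `reduced` means the first (dropped) category.
```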
8 votes, 3 answers

What is the difference between one-hot and dummy encoding?

I am trying to understand the reason behind encoding (one-hot encoding and dummy encoding), and how one-hot and dummy encoding differ from each other.
user121028
8 votes, 5 answers

How do I encode the categorical columns if there are more than 15 unique values?

I'm trying to use this data to make a data analysis report using regression. Since regression only allows for numerical types, I then need to encode the categorical data. However, most of these have more than 15 unique values such as country. Do I…
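
One common tactic for a high-cardinality column such as country is to keep only the top-k most frequent categories and lump the rest into an "Other" bucket before one-hot encoding. A pure-Python sketch with hypothetical data:

```python
from collections import Counter

countries = ["US", "US", "FR", "DE", "US", "FR", "BR", "IN"]
TOP_K = 2  # hypothetical cutoff

# Keep the TOP_K most frequent categories; everything else -> "Other".
top = {c for c, _ in Counter(countries).most_common(TOP_K)}
grouped = [c if c in top else "Other" for c in countries]
```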
8 votes, 1 answer

Encoding with OrdinalEncoder : how to give levels as user input?

I am trying to do ordinal encoding using: from sklearn.preprocessing import OrdinalEncoder I will try to explain my problem with a simple dataset. X = pd.DataFrame({'animals':['low','med','low','high','low','high']}) enc =…
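
The `categories` parameter is the hook for this: pass the levels in the order you want, one list per column; otherwise OrdinalEncoder sorts them alphabetically (high < low < med). A sketch using the question's data, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X = np.array([["low"], ["med"], ["low"], ["high"], ["low"], ["high"]])

# Explicit level order: low < med < high.
enc = OrdinalEncoder(categories=[["low", "med", "high"]])
X_enc = enc.fit_transform(X)  # low -> 0.0, med -> 1.0, high -> 2.0
```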
8 votes, 2 answers

In which cases shouldn't we drop the first level of categorical variables?

Beginner in machine learning, I'm looking into the one-hot encoding concept. Unlike in statistics, where you always want to drop the first level to have k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…