Questions tagged [target-encoding]

17 questions
5
votes
1 answer

Target encoding with KFold cross-validation - how to transform test set?

Let's say I have a categorical feature (cat): import random import pandas as pd from sklearn.model_selection import train_test_split, StratifiedKFold random.seed(1234) y = random.choices([1, 0], weights=[0.2, 0.8], k=100) cat = random.choices(["A",…
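One common way to handle the situation this question describes is sketched below, under stated assumptions: training rows are encoded out-of-fold (each row uses category means fitted on the other folds only), and test rows use means fitted on the whole training set. The toy data and the helper names (`category_means`, `oof_target_encode`, `encode_test`) are hypothetical and not taken from the question. The splitter is a plain shuffled K-fold written with the standard library, not sklearn's `StratifiedKFold`.

```python
import random
from collections import defaultdict


def category_means(cats, ys):
    """Mean target per category, computed from the rows passed in."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(cats, ys):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def oof_target_encode(cats, ys, n_splits=5, seed=1234):
    """Encode each training row with means fitted on the other folds only."""
    idx = list(range(len(cats)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    prior = sum(ys) / len(ys)  # fallback for categories unseen in the fit folds
    encoded = [prior] * len(cats)
    for held_out in folds:
        held = set(held_out)
        fit_idx = [i for i in idx if i not in held]
        means = category_means([cats[i] for i in fit_idx],
                               [ys[i] for i in fit_idx])
        for i in held_out:
            encoded[i] = means.get(cats[i], prior)
    return encoded


def encode_test(train_cats, train_ys, test_cats):
    """Encode test rows with means fitted on the full training set."""
    prior = sum(train_ys) / len(train_ys)
    means = category_means(train_cats, train_ys)
    return [means.get(c, prior) for c in test_cats]


# Hypothetical toy data, only for illustration.
train_cats = ["A", "A", "B", "B", "A", "B", "C", "A"]
train_ys = [1, 0, 0, 0, 1, 1, 0, 0]

oof = oof_target_encode(train_cats, train_ys)
test_encoded = encode_test(train_cats, train_ys, ["A", "C", "D"])
print(oof)
print(test_encoded)
```

The unseen category `"D"` falls back to the global training mean here; smoothing towards that prior is another frequent choice that this sketch leaves out.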
2
votes
1 answer

Categories with the same mean in target encoding

While doing target encoding it can happen that two categories have the same target mean. This is bad because the new feature will not distinguish between them and we will lose some information. Also, this is potentially harmful to the model,…
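The collision described in this question can be reproduced with a minimal sketch. The data and the helper `mean_encode` are hypothetical; the only point is that categories with equal target means receive the same encoded value.

```python
from collections import defaultdict


def mean_encode(cats, ys):
    """Plain (unsmoothed) target mean per category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(cats, ys):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Hypothetical toy data: "A" and "B" have different sizes but the same mean.
cats = ["A", "A", "B", "B", "B", "B", "C", "C"]
ys = [1, 0, 1, 1, 0, 0, 1, 1]

encoding = mean_encode(cats, ys)

# Group categories by encoded value to find the ones that collapse together.
# Exact float comparison is good enough for this toy case only.
by_value = defaultdict(list)
for category, value in encoding.items():
    by_value[value].append(category)
tied = [sorted(group) for group in by_value.values() if len(group) > 1]

print(encoding)
print(tied)
```

Whether such a tie actually hurts a given model is the open part of the question; the sketch only shows that the tie occurs.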
2
votes
1 answer

Does it make sense to impute the target variable when there are a few missing target values in my dataset?

I'm relatively new to machine learning and just trying to have a solid understanding of the basics. I scraped a real estate/house prices dataset off a website, which I am in the process of cleaning before modelling. I noticed that out of 1,998…
Baker Hans
  • 21
  • 2
2
votes
1 answer

How do we target-encode categorical features in multi class classification problems?

Say I have a multiclass problem with a dataset as this:

user_id | price | target
--------+-------+--------
1       | 30    | apple
1       | 20    | samsung
2       | 32    | samsung
2       | 40    | huawei
.
.

where I have a lot of users i.e One Hot…
CutePoison
  • 450
  • 2
  • 8
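One approach that is often suggested for the multiclass case is to produce one encoded column per class, holding the share of that class among the rows of each category value. The sketch below applies this to the four rows shown in the question; the function name `multiclass_target_encode` is hypothetical, and this is not necessarily what the answer to the question recommends.

```python
from collections import Counter, defaultdict

# The four rows shown in the question: (user_id, price, target).
rows = [
    (1, 30, "apple"),
    (1, 20, "samsung"),
    (2, 32, "samsung"),
    (2, 40, "huawei"),
]


def multiclass_target_encode(keys, targets):
    """Return the class order and, per key, the share of each class."""
    classes = sorted(set(targets))
    per_key = defaultdict(Counter)
    for key, target in zip(keys, targets):
        per_key[key][target] += 1
    encoding = {}
    for key, counts in per_key.items():
        total = sum(counts.values())
        # A Counter returns 0 for classes this key never had.
        encoding[key] = [counts[c] / total for c in classes]
    return classes, encoding


user_ids = [r[0] for r in rows]
targets = [r[2] for r in rows]
classes, encoding = multiclass_target_encode(user_ids, targets)

print(classes)
print(encoding)
```

The shares for each user sum to 1, so one of the columns is redundant. On real data the shares would normally be computed out-of-fold or with smoothing to limit target leakage.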
2
votes
1 answer

Predict apartment prices with two sources of prices

I am asking for help with the following problem. There are two subsamples in the dataset: one where the target is real (valid), and the other where it is approximate (I do not know how it differs yet; on one sample the real price of an apartment,…
1
vote
1 answer

Interpreting decision tree results after target encoding

I am not sure how to interpret the results of my decision tree after using target encoding. Could someone clarify? The example below doesn't need target encoding; it is just to explain my confusion. For instance, I am trying to classify if…
bob2
  • 13
  • 2
1
vote
0 answers

Does it make sense to use target encoding together with tree-based models?

I'm working on a regression problem with a few high-cardinality categorical features (forecasting different items with a single model). Someone suggested using target encoding (mean/median of the target of each item) together with xgboost. While I…
1
vote
1 answer

One-Hot-Encoding Target variable

I have a dataset whose target variable takes 4 values. I applied ordinal encoding to it, which worked for me, but my question is: can I solve this problem if I apply one-hot encoding? As it would be 4 new columns that are…
0
votes
1 answer

Target encoding with multiple columns

I'm attempting to do target encoding with multiple columns on a dataframe and I'm getting an error message I don't understand. Here is a fragment of the code. X['District Code Encoded'] = encoder.fit_transform(X['District Code'], y) X['Property id…
Rupert
  • 101
  • 2
0
votes
1 answer

Why should I do target encoding within the CV loop?

I wish to use target encoding, using the category encoders library for sklearn. I don't really understand why it is necessary to include this as a step in a sklearn pipeline WITHIN the cross-validation loop. E.g. this example here does so Target…
0
votes
1 answer

Logistic Regression Multi-level Independent variables

I'm trying to study logistic regression. When I regressed the target variable on all features, the summary showed the p-values as usual, but one of the features has 60 levels and another has 13 levels, so how can I proceed with this kind of…
0
votes
0 answers

Using transformer to predict permutation of tokens based on encoded input

I am trying to implement a transformer-based agent for a tile-placing board game, and I was thinking about this kind of architecture: Src input is embedded board, with an empty block (for example 10x10 with a 3x3 block missing, so 3*3=9 tiles to be…
0
votes
0 answers

Do you need to normalize labels for models other than neural nets?

As mentioned here, normalizing the target variable often helps a neural network converge faster. Does it help in convergence, or is there otherwise a reason to use it, for any type of model other than neural nets?
0
votes
0 answers

Multi-class classification model unable to return desired outcome

I have a multi-class dataset with around 10 distinct target classes. There are 3 categorical features, each with multiple labels. If we check the data, each unique combination of feature values will always return a single class of…
0
votes
1 answer

Dealing with observations with an arbitrary number of categories, each with an arbitrary number of values

Suppose we have a set of elements $X = \{x_1, x_2, ..., x_n\}$. Each element is characterised by a set of features. The features characterising a particular element $x_i$ can belong to one of $q$ different categories. Each different category $f_q$…