I have a dataset whose target variable takes 4 values. I applied ordinal encoding to it, which worked, but my question is: can I solve this problem with one-hot encoding instead? That would generate 4 new columns from the single target variable.

|classes|classes_a|classes_b|classes_c|classes_d|
|-------|---------|---------|---------|---------|
|a      |1        |0        |0        |0        |
|b      |0        |1        |0        |0        |
|c      |0        |0        |1        |0        |
|d      |0        |0        |0        |1        |

Now I have these 4 columns: classes_a, classes_b, classes_c, and classes_d. How should I deal with them?

Shayan Shafiq
Adnan Khan
  • Unclear what you want to do: Train a model on the four target columns? – Peter Nov 15 '21 at 11:09
  • Well, I have 24 columns, one of which is the target column, and that target column contains 4 classes as shown in the table above. Can I perform one-hot encoding on the target column? The actual question is this: would it still be possible to train the KNN model if you one-hot encoded the response data? – Adnan Khan Nov 15 '21 at 15:25

1 Answer

As pointed out in the comments, the actual question is:

Would it still be possible to train the KNN model if you one-hot encoded the response data?

The answer is yes:

If you have one target (one column) with four classes, you have a multiclass setting.

If you have four targets (four columns), each of them binary (1 or 0), you have a multilabel setting.
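
To make the distinction concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The toy data and all variable names are made up for illustration. `type_of_target` reports how scikit-learn interprets each target layout, and `KNeighborsClassifier` is then fit on both.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.multiclass import type_of_target

# Toy data (hypothetical): 8 samples, 2 features, 4 classes
X = np.array([[0.0, 0.0], [0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0],
              [0.0, 5.0], [0.1, 5.0],
              [5.0, 0.0], [5.1, 0.0]])
y = np.array(["a", "a", "b", "b", "c", "c", "d", "d"])

# One-hot version of the same target: one binary column per class
classes = np.unique(y)  # sorted unique labels: a, b, c, d
y_onehot = (y[:, None] == classes[None, :]).astype(int)

# How scikit-learn interprets each layout
kind_single = type_of_target(y)         # one column, four classes
kind_onehot = type_of_target(y_onehot)  # four binary columns

# KNN can be trained on either layout
knn_single = KNeighborsClassifier(n_neighbors=1).fit(X, y)
knn_onehot = KNeighborsClassifier(n_neighbors=1).fit(X, y_onehot)

X_new = np.array([[4.9, 5.2]])
pred_single = knn_single.predict(X_new)  # one label per sample
pred_onehot = knn_onehot.predict(X_new)  # one row of four 0/1 values per sample
```

Note that with a one-hot target each column is voted on independently, so with more than one neighbor a predicted row is not guaranteed to contain exactly one 1.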

See sklearn's overview of different approaches.

With Keras you can use the "functional API" to model a multi-label (multi-output) case using neural nets. You would write the model like this:

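# Imports (assuming the Keras bundled with TensorFlow)
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# Model: define the input layer Input_1 and the hidden layers producing x
...

# Outputs: one output head per target
out1 = Dense(1)(x)
out2 = Dense(1)(x)

# Compile/fit the model
model = Model(inputs=Input_1, outputs=[out1, out2])
model.compile(optimizer=..., loss=...)
# Add actual data here in the fit statement
model.fit(train_data, [train_targets, train_targets2], epochs=..., batch_size=..., validation_split=0.2)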

Here is a regression example of the functional API, which can be easily changed to classification.

However, the intuitive way to solve a problem like yours is to simply do multiclass classification on the single target column. I don't see a benefit in rearranging the target as "one hot".
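
If the target has already been one-hot encoded, it can be collapsed back into a single column before training. A minimal sketch, assuming NumPy and the column order from the question's table (the array and variable names are hypothetical):

```python
import numpy as np

# Hypothetical one-hot target with columns classes_a .. classes_d
class_names = np.array(["a", "b", "c", "d"])
y_onehot = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]])

# Each row contains exactly one 1, so argmax gives the column of the class
y_index = y_onehot.argmax(axis=1)  # integer codes 0..3
y_labels = class_names[y_index]    # original class labels
```

Either `y_index` or `y_labels` can then serve as the single-column multiclass target.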

Peter