How to ensure same encoding pattern?

Question

I created an XGBRegressor model on data in which certain 'object' dtype columns were encoded. Now, when I run the model on a new set of data that is freshly encoded, it gives wrong predictions. How can I ensure that the new dataset is encoded in the same way as the training data? Or is there any other solution to this problem?

Merge them and then encode, simple? – Aditya Feb 25 '19 at 04:22
So there's no other way, I reckon? – Dishant Kothia Feb 25 '19 at 05:10
https://datascience.stackexchange.com/q/54052/55122 – Ben Reiniger Nov 28 '19 at 03:45

score 0 · Answer 1 · answered Feb 25 '19 at 06:37

You can save the encodings and use them to encode the new data. The only thing to make sure of is that the new data does not contain values that were not in the saved encoding. If you are using Python, you can save them as pickle files.
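A minimal sketch of the save-and-reuse idea, assuming a plain dictionary is used as the encoding. The helper names `fit_encoding` and `encode`, the sample values, and the file name are all hypothetical and only for illustration; this is not necessarily how the original model was encoded.

```python
import os
import pickle
import tempfile

def fit_encoding(values):
    """Map each distinct training category to an integer code (hypothetical helper)."""
    return {cat: code for code, cat in enumerate(sorted(set(values)))}

def encode(values, mapping):
    """Encode values with a saved mapping; raises KeyError on unseen categories."""
    return [mapping[v] for v in values]

# Fit the mapping once, on the training data only.
train_colors = ["red", "green", "blue", "green"]
mapping = fit_encoding(train_colors)

# Persist the mapping alongside the model (the path is an assumption).
path = os.path.join(tempfile.gettempdir(), "color_encoding.pkl")
with open(path, "wb") as f:
    pickle.dump(mapping, f)

# Later, at prediction time: load the mapping instead of re-fitting it.
with open(path, "rb") as f:
    saved_mapping = pickle.load(f)

new_colors = ["blue", "red"]
encoded_new = encode(new_colors, saved_mapping)
```

Because the mapping is loaded rather than re-fitted, "blue" and "red" get the same codes they had during training, regardless of which categories happen to appear in the new data.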

Shantha Ekanayake


But that's not a valid assumption to make. Rather, have a category value that denotes unknown categories, let's say. – Aditya Feb 25 '19 at 12:38
How can you create it if it is unknown? What we do for this issue is add a category called "OTHERS" to the training data; if an incoming value is not in the saved encoding, we treat it as "OTHERS" and pass it to prediction. This works and is proven: we use it in a supervised learning solution that requires 99% accuracy. – Shantha Ekanayake Feb 26 '19 at 06:06
Yep exactly what I meant! – Aditya Feb 26 '19 at 06:12
So in that case my assumption is correct, right? – Shantha Ekanayake Feb 26 '19 at 06:57
@ShanthaEkanayake can you show me how to save the encoder? And would the saved encoder encode in the same way as before? – Dishant Kothia Feb 28 '19 at 14:58
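The "OTHERS" fallback described in the comments above could look roughly like this. It is a sketch under assumptions: the helper names and sample values are hypothetical, and the mapping is a plain dictionary rather than any particular library's encoder.

```python
OTHERS = "OTHERS"  # reserved label for categories not seen during training

def fit_encoding_with_others(values):
    """Build a category -> code mapping that always includes the OTHERS label."""
    categories = sorted(set(values) | {OTHERS})
    return {cat: code for code, cat in enumerate(categories)}

def encode_with_fallback(values, mapping):
    """Encode values, sending anything missing from the mapping to OTHERS."""
    return [mapping.get(v, mapping[OTHERS]) for v in values]

train_cities = ["Paris", "Berlin", "Paris", "Madrid"]
mapping = fit_encoding_with_others(train_cities)

# "Tokyo" was never seen in training, so it receives the OTHERS code.
encoded = encode_with_fallback(["Berlin", "Tokyo"], mapping)
```

For the model to learn anything useful about the fallback code, the training data presumably needs some rows labelled "OTHERS" as well, for example by relabelling rare categories before fitting.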

score 0 · Answer 2 · answered Mar 02 '19 at 17:01

See the link below to learn more about this kind of encoding.

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#sklearn.preprocessing.OrdinalEncoder

After reading it, you should also understand how to save the encoding.
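As a rough illustration of how the linked `OrdinalEncoder` might be fitted once, saved, and reused, here is a sketch with made-up sample data. It assumes scikit-learn 0.24 or later, where the `handle_unknown="use_encoded_value"` option is available; the choice of `-1` as the code for unseen categories is an assumption, not something from the answer.

```python
import pickle

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Fit the encoder on the training column only (sample data is made up).
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(X_train)

# Serialize the fitted encoder; writing these bytes to a file works the same way.
blob = pickle.dumps(encoder)

# At prediction time, restore the encoder and call transform (never fit again).
restored = pickle.loads(blob)
X_new = np.array([["blue"], ["purple"]], dtype=object)
codes = restored.transform(X_new)  # "purple" is unseen, so it maps to -1
```

The restored encoder keeps the `categories_` learned during fitting, so known values get the same codes as in training.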
