I trained an XGBRegressor model on data whose 'object' dtype columns had been encoded. When I run the model on a new, freshly encoded dataset, it gives wrong predictions. How can I ensure that the new dataset is encoded the same way as the training data? Or is there another solution to this problem?
Merge them and then encode, simple? – Aditya Feb 25 '19 at 04:22
So there's no other way, I reckon? – Dishant Kothia Feb 25 '19 at 05:10
https://datascience.stackexchange.com/q/54052/55122 – Ben Reiniger Nov 28 '19 at 03:45
2 Answers
You can save the fitted encoding and reuse it to encode the new data. The one thing to make sure of is that the new data contains no categories that were absent from the saved encoding. If you are using Python, you can save the encoding as a pickle file.
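As a minimal sketch of that idea, assuming the encoding is a plain category-to-integer mapping: the helper names `fit_encoding` and `apply_encoding`, the file name `color_encoding.pkl`, and the sample values are all made up for illustration. A fitted scikit-learn encoder object can be pickled and reloaded in the same way.

```python
import os
import pickle
import tempfile


def fit_encoding(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {cat: code for code, cat in enumerate(sorted(set(values)))}


def apply_encoding(values, mapping):
    """Encode values with a saved mapping; raises KeyError on unseen categories."""
    return [mapping[v] for v in values]


# Training time: learn the mapping from the training column and save it.
train_colors = ["red", "green", "blue", "green"]
mapping = fit_encoding(train_colors)

# Hypothetical file location, used here only for the example.
path = os.path.join(tempfile.gettempdir(), "color_encoding.pkl")
with open(path, "wb") as f:
    pickle.dump(mapping, f)

# Prediction time: load the saved mapping instead of fitting a new one,
# so every category gets the same code it had during training.
with open(path, "rb") as f:
    saved = pickle.load(f)

new_colors = ["blue", "red"]
encoded_new = apply_encoding(new_colors, saved)
```

Encoding the new data from scratch would assign "blue" and "red" the codes 0 and 1, whereas the saved mapping keeps them at the codes the model was trained on.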
But that's not a valid assumption to make; rather, have a category value that denotes unknown categories. – Aditya Feb 25 '19 at 12:38
How can you create it if it is unknown? What we do for this issue is add a category called "OTHERS" to the training data; if an incoming value is not in the saved encoding, we treat it as "OTHERS" and pass it to prediction. This works and is proven: we use it in a supervised learning solution that requires 99% accuracy. – Shantha Ekanayake Feb 26 '19 at 06:06
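One way the "OTHERS" approach could look, as a rough sketch rather than the commenter's actual code: the function names and the `min_count` threshold are assumptions. Folding rare training categories into "OTHERS" is one possible way to make sure the model sees that category during training.

```python
from collections import Counter

OTHERS = "OTHERS"  # catch-all label for rare or unseen categories


def fit_encoding_with_others(train_values, min_count=2):
    """Build a category-to-code mapping that always contains OTHERS.

    Categories seen fewer than min_count times (an assumed threshold) are
    left out of the mapping, so they fall back to OTHERS when encoded.
    """
    counts = Counter(train_values)
    kept = sorted(c for c, n in counts.items() if n >= min_count and c != OTHERS)
    mapping = {cat: code for code, cat in enumerate(kept)}
    mapping[OTHERS] = len(kept)
    return mapping


def encode_with_others(values, mapping):
    """Encode values, sending anything not in the mapping to the OTHERS code."""
    return [mapping.get(v, mapping[OTHERS]) for v in values]


train = ["red", "red", "green", "green", "blue"]
others_mapping = fit_encoding_with_others(train)

# "blue" is rare in training, so it is encoded as OTHERS there too.
train_codes = encode_with_others(train, others_mapping)

# "purple" never appeared in training and also maps to OTHERS.
new_codes = encode_with_others(["green", "purple"], others_mapping)
```

The mapping returned here can be saved and reloaded like any other encoding, so the fallback behaves the same at prediction time.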
@ShanthaEkanayake can you show me how to save the encoder? And would the saved encoder encode in the same way as before? – Dishant Kothia Feb 28 '19 at 14:58
Please see the link below to learn more about this encoding.
After reading it, you will also understand how to save the encoding.