
I am trying to build an item-item similarity recommendation engine with Mahout. The data set has the following format (the attributes are text, not numbers):

name : category : cost : ingredients

x : xx1 : 15 : xxx1, xxx2, xxx3

y : yy1 : 14 : yyy1, yyy2, yyy3

z : xx1 : 12 : xxx1, xxy1

In order to use this data set to train Mahout, what is the right way to convert it into the numeric format (a Boolean CSV data set) that Mahout accepts?
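Below is a minimal sketch in Python. It assumes the Boolean input is Mahout's FileDataModel format, which (as far as I know) reads comma-separated `userID,itemID[,preference]` lines with numeric IDs and treats lines that have no preference value as boolean data. It also assumes a specific trick: each attribute value plays the role of a "user", so two items become similar when they share attributes. The `category=`/`ingredient=` token names and the `COST_BUCKET` width are hypothetical choices, not anything Mahout prescribes.

```python
import csv
import io

# The sample rows from the question: name : category : cost : ingredients
RAW = """\
x : xx1 : 15 : xxx1, xxx2, xxx3
y : yy1 : 14 : yyy1, yyy2, yyy3
z : xx1 : 12 : xxx1, xxy1
"""

COST_BUCKET = 5  # hypothetical bucket width so that similar prices can match


def parse(raw):
    """Yield (name, category, cost, [ingredients]) from the colon-separated text."""
    for line in raw.strip().splitlines():
        name, category, cost, ingredients = (part.strip() for part in line.split(":"))
        yield name, category, int(cost), [i.strip() for i in ingredients.split(",")]


def to_boolean_prefs(rows):
    """Give every item and every attribute value a numeric ID; emit (featureID, itemID).

    Assumption: each attribute value acts as a Mahout "user", so items that
    share attribute values co-occur and end up similar.
    """
    item_ids, feature_ids, pairs = {}, {}, []
    for name, category, cost, ingredients in rows:
        item_id = item_ids.setdefault(name, len(item_ids))
        features = [f"category={category}", f"cost_bucket={cost // COST_BUCKET}"]
        features += [f"ingredient={i}" for i in ingredients]
        for feature in features:
            feature_id = feature_ids.setdefault(feature, len(feature_ids))
            pairs.append((feature_id, item_id))
    return item_ids, feature_ids, pairs


item_ids, feature_ids, pairs = to_boolean_prefs(parse(RAW))

# Write "userID,itemID" lines with no preference column (boolean data).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(pairs)
print(out.getvalue())
```

Keep the `item_ids` and `feature_ids` dictionaries so that Mahout's numeric item IDs can be mapped back to names. Bucketing the cost instead of using the exact value is a design choice: exact prices rarely coincide, so on their own they would contribute almost nothing to similarity.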

Brian Spiering
Sreejithc321
    You'll find lots of leads if you look up "encoding categorical variables"; e.g. http://stats.stackexchange.com/questions/21770/ – Emre Nov 23 '14 at 18:27
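Following the comment's pointer, the usual "encoding categorical variables" approach is one-hot (dummy) encoding, with one 0/1 column per distinct attribute value. The Python sketch below leaves cost out for brevity. It writes that wide Boolean CSV for the sample items and computes the Tanimoto (Jaccard) coefficient between item pairs. This is the same measure Mahout's `TanimotoCoefficientSimilarity` applies to boolean data, so it can be used to sanity-check the encoding before training. The feature-name strings are hypothetical.

```python
import csv
import io
from itertools import combinations

# Attribute values of the sample items (cost omitted for brevity).
ITEMS = {
    "x": {"category=xx1", "ingredient=xxx1", "ingredient=xxx2", "ingredient=xxx3"},
    "y": {"category=yy1", "ingredient=yyy1", "ingredient=yyy2", "ingredient=yyy3"},
    "z": {"category=xx1", "ingredient=xxx1", "ingredient=xxy1"},
}

# One column per distinct attribute value (one-hot / dummy encoding).
columns = sorted(set().union(*ITEMS.values()))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["name", *columns])
for name, features in ITEMS.items():
    writer.writerow([name, *(1 if c in features else 0 for c in columns)])


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two boolean feature sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


similarities = {(p, q): tanimoto(ITEMS[p], ITEMS[q]) for p, q in combinations(ITEMS, 2)}
print(out.getvalue())
print(similarities)
```

With this encoding, x and z score 0.4 (two shared values out of five distinct ones), while y shares nothing with either.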
