Suppose I train a Naive Bayes classifier on a training data set, and one attribute value ends up with an estimated conditional probability of zero. How do I handle this when I later want to classify new data? The problem is that a single zero factor makes the whole product zero, no matter how many other attribute values might point to a different classification.
Example:
$P(x|spam=yes) = P(TimeZone = US | spam=yes) \cdot P(GeoLocation = EU | spam = yes) \cdot ~ ... ~ = 0.004 $
$P(x|spam=no) = P(TimeZone = US | spam=no) \cdot P(GeoLocation = EU | spam = no) \cdot ~ ... ~ = 0 $
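To make this concrete, here is a minimal Python sketch of the same calculation. The counts are made up for illustration (they are not my real data) and are chosen only so that the spam product comes out near $0.004$; the helper `likelihood` is a hypothetical name for the plain relative-frequency estimate.

```python
from math import prod

def likelihood(count_value_and_class, count_class):
    """Relative-frequency estimate of P(attribute = value | class)."""
    return count_value_and_class / count_class

# Hypothetical counts from a tiny training set with 10 spam and 10 non-spam messages.
# Factor order: TimeZone = US, GeoLocation = EU, one further attribute.
spam_factors = [likelihood(4, 10), likelihood(1, 10), likelihood(1, 10)]
ham_factors = [likelihood(0, 10), likelihood(6, 10), likelihood(9, 10)]  # TimeZone = US never seen with spam = no

p_x_given_spam = prod(spam_factors)  # roughly 0.004
p_x_given_ham = prod(ham_factors)    # exactly 0.0, whatever the other factors are

print(p_x_given_spam, p_x_given_ham)
```

The single `0 / 10` factor forces the second product to zero even though the other two factors for spam = no are larger than the corresponding ones for spam = yes.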
The whole product becomes $0$ because, in our small training data set, every example with TimeZone = US has spam = yes, so $P(TimeZone = US | spam = no)$ is estimated as $0$. How can I handle this? Should I use a larger training set, or is there another way to overcome this problem?