Great question, @gsamaras! The way you've set up this experiment makes a lot of sense to me from a design point of view, but I think there are a couple of aspects you can still examine.
First, it's possible that uninformative features are distracting your classifier, leading to poorer results. In text analytics, we often talk about stop word filtering, which is just the process of removing such text (e.g., the, and, or, etc.). There are standard stop word lists you can easily find online (e.g., this one), but they can sometimes be heavy-handed. The best approach is to build a table relating feature frequency to class, as this will surface domain-specific features that you won't likely find in such look-up tables. The evidence for the efficacy of stop word removal in the literature is mixed, but I think those findings are mostly classifier-specific (for example, support vector machines tend to be less affected by uninformative features than a naive Bayes classifier is; I suspect k-means falls into the latter category).
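For a concrete starting point, here's a rough sketch (Python, scikit-learn) of both ideas: filtering with the built-in English stop word list, and tabulating term frequency per class to spot domain-specific noise words. The `docs` and `labels` variables are placeholder data for illustration, not anything from your setup.

```python
from collections import Counter, defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder documents and labels, purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks rallied and the market rose",
]
labels = ["pets", "pets", "finance"]

# Option 1: scikit-learn's built-in English stop word list (heavy-handed, as noted above).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
print(X.shape, vectorizer.get_feature_names_out())

# Option 2: a per-class term frequency table; terms that are frequent in
# every class are candidates for a domain-specific stop word list.
per_class = defaultdict(Counter)
for doc, label in zip(docs, labels):
    per_class[label].update(doc.split())

for label, counts in per_class.items():
    print(label, counts.most_common(5))
```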
Second, you might consider a different feature modeling approach rather than tf-idf. Nothing against tf-idf (it works fine for many problems), but I like to start with binary feature modeling unless I have experimental evidence showing that a more complex approach leads to better results. That said, it's possible that k-means could respond strangely to the switch from a floating-point feature space to a binary one. It's certainly an easily testable hypothesis!
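Here's a quick sketch of what that comparison might look like with scikit-learn; again, `docs` is placeholder data and the cluster count is arbitrary, so treat this as a template rather than a recipe.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder documents, purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks rallied and the market rose",
]

tfidf_X = TfidfVectorizer().fit_transform(docs)               # floating-point weights
binary_X = CountVectorizer(binary=True).fit_transform(docs)   # 0/1 presence features

# Fit k-means on each representation and compare the resulting assignments.
for name, X in [("tf-idf", tfidf_X), ("binary", binary_X)]:
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(name, km.labels_)
```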
Finally, you might look at the expected class distribution in your data set. Are all classes equally likely? If not, you may get better results from either a sampling approach or a different distance metric. k-means is known to respond poorly to skewed class distributions, so this is something to consider as well! There is probably research available in your specific domain describing how others have handled this situation.
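If you want to eyeball the skew, something like the following rough sketch would do it; the labels and the naive undersampling step are purely illustrative assumptions, not your data.

```python
import random
from collections import Counter

# Placeholder labels with a deliberate 90/10 skew, purely for illustration.
labels = ["a"] * 90 + ["b"] * 10
docs = [f"doc {i}" for i in range(len(labels))]

counts = Counter(labels)
print(counts)  # Counter({'a': 90, 'b': 10}) -> heavily skewed

# Naive fix: undersample every class down to the size of the smallest one.
smallest = min(counts.values())
keep = []
for cls in counts:
    idx = [i for i, y in enumerate(labels) if y == cls]
    keep.extend(random.sample(idx, smallest))

balanced_docs = [docs[i] for i in keep]
print(Counter(labels[i] for i in keep))  # now balanced
```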