I'm unsure whether I should remove outliers from my data. I'm trying to find tags that are commonly used together. Suppose I have the following dataset, where the first column is the Tag_ID and the second column is the number of people who used that tag:
1 3472034
2 1277918
3 1249839
4 1010770
5 915099
6 898292
7 636792
8 604352
9 555673
10 298495
11 291511
12 211074
13 200868
...
(This was copied from my actual dataset).
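To put a number on how skewed these counts are, here is a minimal Python sketch. It assumes only the 13 rows shown above (the elided rows are left out), and the names `tag_counts`, `stats`, and `top_tag` are hypothetical, chosen just for illustration. It compares each tag's count with the median count and with the total of the listed counts; it is one rough way to describe the skew, not a rule for deciding what to drop.

```python
from statistics import median

# Counts copied from the 13 rows listed above; the rest of the dataset is omitted.
tag_counts = {
    1: 3472034, 2: 1277918, 3: 1249839, 4: 1010770, 5: 915099,
    6: 898292, 7: 636792, 8: 604352, 9: 555673, 10: 298495,
    11: 291511, 12: 211074, 13: 200868,
}

total = sum(tag_counts.values())
med = median(tag_counts.values())

# For each tag: its share of the listed counts and its ratio to the median count.
stats = {
    tag: {"share": n / total, "ratio_to_median": n / med}
    for tag, n in tag_counts.items()
}

# The most frequent tag among the listed rows.
top_tag = max(tag_counts, key=tag_counts.get)
print(top_tag, stats[top_tag])
```

Because people can use more than one tag, `share` here is a share of the summed counts, not a share of distinct people.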
My question is: should I remove a tag when it is much more frequent than the others? Is that regarded as good practice?
Many thanks!