I'd like to calculate the mutual information between two datasets, but I'd prefer not to cluster them first.
I'm thinking of using scikit-learn's mutual_info_score metric, but its documentation suggests the inputs should be cluster labels, not whole datasets. My intuition is that clustering is expected because calculating the complete mutual information score between large datasets is computationally expensive.
The datasets I'm trying to compare are large: 400,000 rows by 180 columns each. Do I have to cluster these datasets before computing mutual information?
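
For context, here is my understanding of what a mutual information score over two label arrays computes, sketched in plain Python. This is only an illustration under my own assumptions, not scikit-learn's implementation: the function name discrete_mutual_information is hypothetical, the inputs are assumed to be two equal-length sequences of discrete labels, and the result is in nats.

```python
from collections import Counter
from math import log


def discrete_mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two equal-length label sequences.

    Hypothetical helper for illustration; it works on discrete labels only,
    which is why raw continuous columns would need binning or clustering first.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")

    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # counts of each (a, b) pair
    count_a = Counter(labels_a)               # marginal counts for labels_a
    count_b = Counter(labels_b)               # marginal counts for labels_b

    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


identical = discrete_mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = discrete_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

If this picture is right, the score only needs one discrete label per row on each side, so my question is really whether those labels have to come from clustering or whether something else would be acceptable for data of this size.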