
I am having trouble training my classifier.

I have 10 music genres with 100 songs each. After computing the MFCCs, each song gives a NumPy array of shape (1293, 20).

Stacking all of them with np.vstack gives an array of shape (1293000, 20), plus a second array for the labels.

When I run model.fit(features, labels), it takes a very long time.

I have also tried t-SNE:

from sklearn.manifold import TSNE
X_embedded = TSNE(n_components=2).fit_transform(features)
X_embedded.shape

I have tried reducing the number of songs from 1000 to 100, but it still takes a long time.

Any idea how I can classify songs when the arrays hold this much data?

Here is some of my code:

import librosa
import numpy as np
import sklearn.preprocessing

scaler = sklearn.preprocessing.StandardScaler()
y, sr = librosa.load('EXAMPLE1')
# Transpose so rows are frames and columns are the 20 coefficients
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T
mfcc_scaled = scaler.fit_transform(mfcc)
mfcc_scaled.shape # (1293, 20)

y, sr = librosa.load('/Users/josetorronteras/AnacondaProjects/Neural-Networks/genres/pop/pop.00044.au')
mfcc2 = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T
mfcc_scaled2 = scaler.fit_transform(mfcc2)
mfcc_scaled2.shape # (1293, 20)

tmp_arr = []
tmp_arr.append(mfcc_scaled)
tmp_arr.append(mfcc_scaled2)
mafcc_list = np.vstack(tmp_arr)

mafcc_list.shape # (2586, 20)
a0 = np.zeros(len(mfcc_scaled))
a1 = np.ones(len(mfcc_scaled2))

labels = np.concatenate((a0, a1))
labels.shape # (2586,)
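
For reference, the two-file snippet above could be generalized to every genre and song roughly as follows. This is only a sketch: `fake_mfcc` is a hypothetical stand-in that returns random numbers in place of the scaled librosa output, and `songs_per_genre` is set to 2 (rather than 100) to keep it small.

```python
import numpy as np


def fake_mfcc(rng, n_frames=1293, n_mfcc=20):
    """Hypothetical stand-in for one song's scaled MFCC matrix (frames x coefficients)."""
    return rng.normal(size=(n_frames, n_mfcc))


rng = np.random.default_rng(0)
n_genres = 10
songs_per_genre = 2  # assumption for the sketch; the question uses 100

feature_blocks = []
label_blocks = []
for genre_id in range(n_genres):
    for _ in range(songs_per_genre):
        mfcc = fake_mfcc(rng)
        feature_blocks.append(mfcc)
        # One label per frame, all equal to the genre index
        label_blocks.append(np.full(len(mfcc), genre_id))

features = np.vstack(feature_blocks)    # (n_genres * songs_per_genre * 1293, 20)
labels = np.concatenate(label_blocks)   # one entry per row of features

print(features.shape, labels.shape)
```

With 100 songs per genre the same loop yields the (1293000, 20) array described in the question.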

Thanks

  • That is little data. Use a better computer, faster software, or adjust your expectations. How long does each part take? – Emre Feb 01 '18 at 19:41
  • Hours. I do not know exactly, because it never finishes. Is there no way to train it in parts or something? @emre –  Feb 01 '18 at 19:57
  • Yes, using `partial_fit` instead of `fit`. I forgot to include [the link](http://scikit-learn.org/stable/modules/scaling_strategies.html). – Emre Feb 01 '18 at 21:02
  • @Emre can you explain how I can use partial_fit? I have my array_features with the data and my array_labels with the labels that identify the data, and I need to pass classes: `partial_fit(X, y, classes=None, sample_weight=None)`. In other code I have seen it used inside a loop. –  Feb 02 '18 at 16:35
  • Instead of passing all your data to partial_fit, pass a subset (x_subset, y_subset) and do multiple iterations. – Emre Feb 02 '18 at 17:49
  • @Joseew, what is your system configuration? TSNE is known to be pretty slow on large datasets. For tSNE in particular you may use the multicore implementation shared here - https://datascience.stackexchange.com/a/28772/47414 – Nilav Baran Ghosh Apr 14 '18 at 08:10
  • What type of classification model are you trying to fit? – Brian Spiering Jul 20 '21 at 03:28
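
The `partial_fit` suggestion in the comments above (pass a subset, repeat over several iterations) might look roughly like the sketch below. Several things here are assumptions, not part of the question: the model is `SGDClassifier` (any scikit-learn estimator that offers `partial_fit` would do), the data is synthetic, and `batch_size` and `n_epochs` are arbitrary illustrative values.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the stacked, scaled MFCC frames and their genre labels
n_classes = 10
labels = rng.integers(0, n_classes, size=20000)
features = rng.normal(size=(len(labels), 20)) + labels[:, None] * 0.5

# partial_fit needs the full list of classes on the first call
classes = np.unique(labels)
model = SGDClassifier(random_state=0)

batch_size = 2000  # illustrative value
n_epochs = 3       # illustrative value
for epoch in range(n_epochs):
    # Visit the rows in a new random order on every pass
    order = rng.permutation(len(features))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        model.partial_fit(features[idx], labels[idx], classes=classes)

predictions = model.predict(features[:5])
```

Because each call only sees one batch, the batches could also be loaded from disk one at a time instead of holding the whole array in memory.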
