I read an article on Towards Data Science that shows how to cluster time series using the DTW distance and TimeSeriesKMeans from tslearn.clustering. I also read the official documentation and found this note:

Notes

If metric is set to “euclidean”, the algorithm expects a dataset of equal-sized time series.

This suggests to me that with other metrics (DTW, for example) the method works with time series of different lengths.
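To illustrate why DTW does not need equal lengths, here is a minimal pure-Python sketch of the textbook dynamic-programming recurrence. It is not tslearn's implementation; the function name dtw_distance and the squared-difference local cost are assumptions made for this illustration.

```python
import math

def dtw_distance(a, b):
    """Hypothetical helper: DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost
            # extend the best of: step in a, step in b, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

short = [1, 2, 3]
long = [1, 2, 2, 3, 3, 3]  # same shape, stretched in time
print(dtw_distance(short, long))  # 0.0, the warping absorbs the stretch
```

Nothing in the recurrence requires len(a) == len(b), which is consistent with the note above applying only to the Euclidean metric.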

I'm currently working on time-series data and I want to check if I can get some interesting information about my data using this method.

This is how I constructed my curves. I have a dataframe called "relevant_figures" that contains the information needed to build them (the start and end time of each curve). I then proceed as follows:

X = []

for _,row in relevant_figures.iterrows():
    input_time = row['InputTime']
    output_time = row['OutputTime']

    ts = weights_df.loc[input_time : output_time]['weight'].copy()
    X.append(ts)

When I try the method

TimeSeriesKMeans(n_clusters=3, metric="dtw").fit(X)

it raises a ValueError (only the end of the message is shown):

Name: peso, Length: 120, dtype: float64]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

However, I can't reshape the data into an array because the series have different lengths, so reshaping does not work. What should I do? Thanks in advance.

Román

1 Answer

Try the to_time_series_dataset function from the tslearn.utils module. It takes a list of time series (for example a list of lists), possibly of different lengths, and returns them as a single numpy array, e.g.:

from tslearn.utils import to_time_series_dataset
X = to_time_series_dataset([[1, 2, 3, 4], [1, 2, 3], [2, 5, 6, 7, 8, 9]])

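It pads the shorter time series with NaNs so that they all fit into one array of shape (n_ts, max_sz, d), which is (3, 6, 1) for the example above. You can then pass that array to TimeSeriesKMeans with metric="dtw".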

Lynn