
I have a dataset of transactional data with customer IDs, and I want to segment it into groups using cluster analysis. I'm interested in following the evolution of each cluster over time, but customers behave very differently from week to week (roughly 50% of the time, a customer moves to a different cluster the following week). What would be a statistically sound approach? Is it a good idea to train a clustering algorithm every week and then look back at the weekly evolution of each segment?

Egodym
  • Which clustering technique are you using to generate the clusters? – pseudoabdul Jan 12 '22 at 02:42
  • I used k-means, but my question is not limited to it. – Egodym Jan 12 '22 at 03:04
  • k-means will work because the number of clusters is fixed. One option is to run k-means on each week's data. This generates time series that you can then analyze; for example, you could plot the size of your largest cluster over time. – pseudoabdul Jan 12 '22 at 06:40
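A minimal sketch of the weekly re-clustering suggestion, assuming each week's customers have already been turned into a numeric feature matrix (the features themselves are not specified in the question). It uses a hand-rolled Lloyd's k-means in NumPy so it runs without scikit-learn; `kmeans` and `weekly_largest_cluster_size` are hypothetical helper names. Because k-means assigns arbitrary cluster IDs on every run, the sketch tracks a label-free statistic (the size of the largest cluster) rather than "cluster 0" across weeks.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on an (n, d) array; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def weekly_largest_cluster_size(weekly_features, k):
    """Re-cluster every week and record the size of the largest cluster.

    weekly_features: list of (n_customers, n_features) arrays, one per week.
    """
    sizes = []
    for X in weekly_features:
        labels, _ = kmeans(X, k)
        # k-means cluster IDs are arbitrary per run, so use a label-free statistic.
        sizes.append(int(np.bincount(labels, minlength=k).max()))
    return sizes
```

Calling `weekly_largest_cluster_size([week1, week2, ...], k=4)` gives one number per week that can be plotted directly. Following a specific segment across weeks would additionally require matching clusters between runs (for example by nearest centroid), which this sketch does not attempt.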

4 Answers


You can try:

  1. Dynamic mode decomposition.
  2. Dynamic time warping. I found a nice resource on it on the Towards Data Science blog.

These two have proven to be better approaches than PCA for time series clustering.
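A minimal dynamic time warping sketch, assuming each customer is represented by a 1-D weekly series (for example weekly spend; that choice is an assumption, not part of the answer). It computes the classic DTW distance and a pairwise distance matrix, which can then be passed to any clustering method that accepts precomputed distances, such as hierarchical clustering. Standard k-means, which averages points in Euclidean space, does not take such a matrix directly. `dtw_distance` and `dtw_matrix` are hypothetical names.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def dtw_matrix(series):
    """Symmetric pairwise DTW distance matrix for a list of 1-D series."""
    k = len(series)
    M = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            M[i, j] = M[j, i] = dtw_distance(series[i], series[j])
    return M


# A repeated value is absorbed by the warping, so these two series match exactly.
print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # 0.0
```

Unlike a plain Euclidean distance, DTW lets two customers with the same pattern shifted or stretched in time end up close to each other, at a quadratic cost per pair.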

Happy coding

Arpit-Gole

Maybe what you are looking for is the Rand index?

The Rand index "is a measure of the similarity between two data clusterings". In other words, if the RI stays close to 1 when you cluster repeatedly over a time window, your segments are stable.
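To make the idea concrete, here is a small pure-Python Rand index, assuming both label vectors refer to the same customers in the same order (for week-to-week comparisons, restrict to customers present in both weeks). `rand_index` is a hypothetical name; scikit-learn ships `sklearn.metrics.rand_score` and the chance-corrected `sklearn.metrics.adjusted_rand_score`. The adjusted version is usually easier to interpret, because the raw RI can be high even for unrelated clusterings when there are many clusters.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two points together,
    or both put them apart, so the cluster IDs themselves do not matter.
    Needs at least two points.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same partition with swapped IDs: perfect agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```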

aRedDish

Cluster once.

Study the clusters and refine them to define classes.

Then classify points into these classes.
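A minimal sketch of "cluster once, then classify", assuming the `centroids` come from a single clustering run (for example k-means on a reference period) that you have already inspected and refined by hand; the refinement step itself is not shown. New points are classified with a nearest-centroid rule, which is just one simple choice; any classifier trained on the refined classes would fit the answer equally well. Because the classes are fixed, class 0 means the same thing every week, so the per-class counts form comparable time series. The function names and data below are hypothetical.

```python
import numpy as np

def assign_to_classes(X, centroids):
    """Label each row of X with the index of its nearest fixed centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def weekly_class_counts(weekly_features, centroids):
    """Customers per fixed class for each week; class IDs mean the same every week."""
    k = len(centroids)
    return [np.bincount(assign_to_classes(X, centroids), minlength=k).tolist()
            for X in weekly_features]


# Hypothetical centroids from one manually reviewed clustering run.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
weeks = [np.array([[0.5, 0.0], [9.0, 10.0], [1.0, 1.0]]),
         np.array([[10.0, 9.0], [11.0, 11.0]])]
print(weekly_class_counts(weeks, centroids))  # [[2, 1], [0, 2]]
```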

Has QUIT--Anony-Mousse
  • Thanks. Any reference to dive deeper? My concern is whether clustering 2 years of monthly data once would yield different results than clustering each month separately and then comparing the results. – Egodym May 14 '20 at 22:28

Run clustering periodically (say, every month). Use the elbow method to decide on the best number of clusters (and be open to that number changing over time). Define and label what each cluster represents: the centroid of each cluster represents the average behavior of the members of that cluster.
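A minimal NumPy sketch of the elbow method, assuming a numeric feature matrix `X` for one period. It runs k-means (hand-rolled Lloyd's with several random restarts, since a single run can land in a poor local optimum) for k = 1..k_max and records the inertia, i.e. the within-cluster sum of squared distances to the nearest centroid. You would plot the curve and pick the k where the decrease flattens out; the elbow is read by eye here. `kmeans_inertia` and `elbow_curve` are hypothetical names; scikit-learn's `KMeans` exposes the same quantity as `inertia_`.

```python
import numpy as np

def kmeans_inertia(X, k, n_init=10, n_iter=100, seed=0):
    """Lowest within-cluster sum of squares over n_init Lloyd's k-means runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Fresh random start: k distinct rows of X.
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Inertia: squared distance of every point to its nearest final centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d.min(axis=1).sum()))
    return best


def elbow_curve(X, k_max):
    """Inertia for k = 1..k_max; plot it and look for the 'elbow'."""
    return [kmeans_inertia(X, k) for k in range(1, k_max + 1)]
```

With three well-separated groups of customers, the curve drops sharply up to k = 3 and then flattens. The final centroids (not returned here) are what you would inspect to label each cluster.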

Jayaram Iyer