
Is there any existing code or package in Python, R, Java, Matlab, or Scala that implements the sequence clustering algorithms from either of the following two papers?

1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997): https://papers.nips.cc/paper/1217-clustering-sequences-with-hidden-markov-models.pdf

The paper gives a probabilistic model-based approach to clustering sequences (or time series), using hidden Markov models (HMM).
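
Both papers rely on the same core quantity: the log-likelihood of a sequence under an HMM. For reference, here is a minimal pure-Python sketch of the scaled forward algorithm that computes it. It assumes discrete observations encoded as integers 0..K-1 and already-estimated parameters (fitting them, e.g. with Baum-Welch, is not shown). The function name `hmm_log_likelihood` is hypothetical. For a fitted hmmlearn model, the `score` method returns this same log-likelihood.

```python
import math

def hmm_log_likelihood(obs, start, trans, emit):
    """Return log p(obs | HMM) for a discrete-emission HMM (scaled forward algorithm).

    obs   : list of observation indices, e.g. [0, 1, 1]
    start : start[i] = p(first hidden state = i)
    trans : trans[i][j] = p(next state = j | current state = i)
    emit  : emit[i][k] = p(observation = k | state = i)
    """
    n_states = len(start)
    log_lik = 0.0
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)  # equals p(obs_t | obs_1..obs_{t-1})
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik
```

Smyth's approach, as I understand it, evaluates this quantity for every sequence under an HMM fitted to each individual sequence, and then clusters the resulting N×N log-likelihood matrix hierarchically.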

2) 'Visual Cluster Exploration of Web Clickstream Data' by Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma (2012): http://www.cs.tufts.edu/comp/250VIS/papers/VAST2012-ClickStream.pdf

The paper is closely related to 1). It maps each high-dimensional sequence (sequences may have different lengths, in other words different dimensions) onto a 2D self-organizing map. In a conventional Kohonen Self-Organizing Map the distance metric is Euclidean distance, but here it is instead the log-likelihood of each sequence under a candidate hidden Markov model (HMM). K-means is then used to cluster the nodes of the 2D self-organizing map.
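
To make the "likelihood instead of Euclidean distance" idea concrete, here is a minimal sketch of the SOM matching step under stated assumptions. It assumes each map node holds a discrete-emission HMM given as `(start, trans, emit)`. The best-matching unit is taken to be the node with the highest per-symbol log-likelihood (dividing by length is an assumption, meant to make sequences of different lengths comparable), and a Gaussian neighborhood on the grid is used. The function names and `sigma` are hypothetical, and this is not necessarily the exact update rule used in the paper.

```python
import math

def forward_log_lik(obs, model):
    """log p(obs | model) for a discrete-emission HMM given as (start, trans, emit)."""
    start, trans, emit = model
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        s = sum(alpha)
        if s == 0.0:
            return float("-inf")
        total += math.log(s)
        alpha = [a / s for a in alpha]  # rescale to avoid underflow
    return total

def best_matching_unit(obs, grid):
    """(row, col) of the node whose HMM gives obs the highest per-symbol log-likelihood.

    This plays the role that the smallest Euclidean distance plays in a Kohonen SOM.
    """
    best, best_score = None, float("-inf")
    for r, row in enumerate(grid):
        for c, model in enumerate(row):
            score = forward_log_lik(obs, model) / len(obs)  # length-normalized
            if best is None or score > best_score:
                best, best_score = (r, c), score
    return best

def neighborhood_weights(bmu, n_rows, n_cols, sigma=1.0):
    """Gaussian neighborhood on the 2D grid, centred on the best-matching unit."""
    br, bc = bmu
    return [[math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
             for c in range(n_cols)]
            for r in range(n_rows)]
```

In a batch-style SOM, one would then re-estimate each node's HMM from all sequences, with each sequence weighted by its neighborhood weight for that node (for example via a weighted Baum-Welch step, not shown), and shrink `sigma` over iterations. Clustering the trained nodes with K-means would follow.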

I haven't found an existing package or code that implements the above clustering algorithms. There is the hmmlearn Python package (https://github.com/hmmlearn/hmmlearn) for fitting HMMs to sequences, and there are existing Python packages for SOMs (self-organizing maps), such as https://github.com/stephantul/somber. But is there existing code that implements clustering algorithms for sequential data in which the distance metric is replaced by an HMM's likelihood (or negative log-likelihood, etc.)? It can be Python, R, Java, Matlab, Scala, or any other language.
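
Neither hmmlearn nor a SOM package does this out of the box, but the core loop is short. It alternates between two steps: assign each sequence to the cluster model under which it is most likely (i.e. smallest negative log-likelihood), then refit each cluster's model on its members. The sketch below is a k-means-style, hard-assignment variant. To stay self-contained, it **substitutes a first-order Markov chain over the observed symbols for each cluster's HMM**. That substitution, the add-one smoothing, the round-robin initialization, and all function names are assumptions for illustration. With hmmlearn, one would instead fit an HMM per cluster with `fit(X, lengths)` and score each sequence with `score`.

```python
import math

def fit_markov_chain(seqs, n_symbols, alpha=1.0):
    """Initial-state and transition probabilities, with add-alpha smoothing."""
    init = [alpha] * n_symbols
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for s in seqs:
        init[s[0]] += 1
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    z = sum(init)
    return [x / z for x in init], [[x / sum(row) for x in row] for row in counts]

def chain_log_lik(seq, model):
    """log p(seq | model) for a first-order Markov chain."""
    init, trans = model
    return math.log(init[seq[0]]) + sum(
        math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def k_models(seqs, k, n_symbols, n_iter=20):
    """Hard-assignment ('k-means with likelihoods') clustering of symbol sequences.

    Each cluster is a model; a sequence's 'distance' to a cluster is its
    negative log-likelihood under that cluster's model.
    """
    labels = [i % k for i in range(len(seqs))]  # simple deterministic start
    for _ in range(n_iter):
        # Refit one model per cluster from its current members.
        models = [fit_markov_chain([s for s, lab in zip(seqs, labels) if lab == c],
                                   n_symbols) for c in range(k)]
        # Reassign each sequence to the model under which it is most likely.
        new_labels = [max(range(k), key=lambda c: chain_log_lik(s, models[c]))
                      for s in seqs]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

For example, `k_models([[0,0,0,0,0,0], [1,1,1,1,1,1], [0,0,0,1,1,1], [0,1,0,1,0,1], [1,0,1,0,1,0], [0,1,0,1,0,1]], k=2, n_symbols=2)` returns `[0, 0, 0, 1, 1, 1]`, which separates the "sticky" sequences from the alternating ones. Hard assignment can get stuck in poor local optima, which is one reason Smyth initializes from a hierarchical clustering rather than an arbitrary split.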

Thanks!

mflowww
  • From https://www.linkedin.com/pulse/fast-powerful-method-click-chain-visualization-analysis-kulikov/ : the second paper's implementation is not open source (it presumably belongs to eBay, given the authors' affiliations). – Eskapp Oct 11 '18 at 16:44
  • Thanks @Eskapp! The LinkedIn post is helpful. It looks like the post implemented some clustering and visualization using the R package seqHMM. I just read the seqHMM documentation: section 2.3 (Clustering by mixture hidden Markov models) describes their mixture HMM implementation, which seems to be based on the 1990 paper van de Pol F, Langeheine R (1990). “Mixed Markov Latent Class Models.” Sociological Methodology, 20, 213–247. doi:10.2307/271087. Is that right? – mflowww Oct 11 '18 at 18:28
  • @Eskapp I couldn't even find a free version of the paper online, so I'm not sure whether it proposes mixture HMMs for sequence clustering. Beyond that paper, it's hard for me to find papers about mixture HMMs. Do you know of any papers I should look at on this topic, or are you familiar with the R package "seqHMM"? Thanks a lot for your help! – mflowww Oct 11 '18 at 18:28
  • The tutorials by Rabiner and Juang are probably the best-known papers about HMMs with and without mixture models; they are the researchers who popularized the use of HMMs in speech processing. Here is one of these tutorials: http://ai.stanford.edu/~pabbeel/depth_qual/Rabiner_Juang_hmms.pdf – Eskapp Oct 11 '18 at 19:08
  • Thanks @Eskapp. In this tutorial I couldn't find the part about mixture HMMs; it focuses on the three problems of HMMs (without mixtures) and the algorithms that solve them. Can you recommend any tutorials or papers about HMMs with a mixture structure? – mflowww Oct 11 '18 at 19:48
  • There is a section at the end that generalizes `B` to mixture models, but it is true that most of the equations are derived without mixtures. Check out the tutorial by Jeff Bilmes on the EM algorithm and its application to mixtures and HMMs: http://melodi.ee.washington.edu/people/bilmes/mypapers/em.pdf I do not know of any other tutorial that goes into as much detail on the topic. – Eskapp Oct 11 '18 at 19:57
  • This tutorial on EM looks nice. Yep, eventually we can derive it on our own and code it up. Thanks @Eskapp! – mflowww Oct 11 '18 at 20:38
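
The mixture approach discussed in these comments (seqHMM's mixture HMMs, and EM as covered in Bilmes's tutorial) replaces hard assignments with EM. The E-step computes each sequence's posterior responsibility for each component. The M-step re-estimates the mixing weights and component parameters from responsibility-weighted counts. To keep it short, this sketch **uses first-order Markov chains as the mixture components, not HMMs**. With HMM components, the M-step would instead become a responsibility-weighted Baum-Welch update. The function names, the add-one smoothing, and the caller-supplied initial responsibilities are all assumptions for illustration.

```python
import math

def weighted_chain_fit(seqs, weights, n_symbols, alpha=1.0):
    """M-step for one component: responsibility-weighted counts, add-alpha smoothed."""
    init = [alpha] * n_symbols
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for s, w in zip(seqs, weights):
        init[s[0]] += w
        for a, b in zip(s, s[1:]):
            counts[a][b] += w
    z = sum(init)
    return [x / z for x in init], [[x / sum(row) for x in row] for row in counts]

def chain_log_lik(seq, model):
    """log p(seq | component) for a first-order Markov chain component."""
    init, trans = model
    return math.log(init[seq[0]]) + sum(
        math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def mixture_em(seqs, init_resp, n_symbols, n_iter=50):
    """EM for a k-component mixture of first-order Markov chains.

    init_resp[i][c] is a starting guess for p(component c | sequence i);
    each row should be positive and sum to 1.
    Returns (mixing weights, responsibilities, component models).
    """
    resp = [row[:] for row in init_resp]
    k = len(resp[0])
    for _ in range(n_iter):
        # M-step: mixing weights and component parameters from responsibilities.
        pis = [sum(r[c] for r in resp) / len(seqs) for c in range(k)]
        models = [weighted_chain_fit(seqs, [r[c] for r in resp], n_symbols)
                  for c in range(k)]
        # E-step: posterior responsibilities, normalized with log-sum-exp.
        resp = []
        for s in seqs:
            logs = [math.log(max(pis[c], 1e-300)) + chain_log_lik(s, models[c])
                    for c in range(k)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            z = sum(w)
            resp.append([x / z for x in w])
    return pis, resp, models
```

Unlike the hard-assignment loop, each sequence keeps a soft membership in every component. The mixing weights `pis` also let a rare behaviour form its own cluster without being swamped by a frequent one.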
