
I have a set of time series data (similar to voice sequence data) whose pattern is shown in the first figure (theoretical data). The measured data is shown in the second figure. I want to locate the subsequences marked by the red squares. Is there an algorithm for this problem? It looks like a classification/regression problem in machine learning, but I have no idea how to start.

[Figure 1: theoretical data]

[Figure 2: measured data, with the target subsequences marked by red squares]

Shayan Shafiq
Land
  • Did you look into independent component analysis? – Peter Oct 16 '20 at 19:20
  • Not yet, is there any recommendation of literature? Thanks – Land Oct 16 '20 at 19:28
  • ESL, Ch. 14.7 https://web.stanford.edu/~hastie/Papers/ESLII.pdf – Peter Oct 16 '20 at 19:39
  • But on a second thought it might be a good problem for LSTM neural net: https://jjallaire.github.io/deep-learning-with-r-notebooks/notebooks/6.3-advanced-usage-of-recurrent-neural-networks.nb.html – Peter Oct 16 '20 at 19:40
  • Thanks, Peter! I will check them both. I have a basic idea of how LSTMs are used for prediction problems. For this problem, would you treat it as classification or regression? – Land Oct 16 '20 at 19:44
  • Is this purely ex post, or is there a predictive element to it? If ex post, what stops you from looking at the derivative/slope of the curve? – Peter Oct 16 '20 at 19:46
  • Postprocessing. I have looked into the slope, but the measured data is not as clean as the theoretical figure above. There are some small fluctuations, so a pure derivative doesn't always work well. – Land Oct 16 '20 at 19:51
  • what about something like $x_t - x_{t-1}$ or so as feature in some simple classification model? – Peter Oct 16 '20 at 19:56
  • The adjacent difference is too idealized because of the fluctuations, and $x_{t+n}-x_t$ is not satisfactory either, since choosing $n$ is an additional problem with no specific/general criterion. – Land Oct 16 '20 at 20:02

1 Answer


The real state of the art here is the Matrix Profile suite, developed by Eamonn Keogh and his team at the University of California, Riverside (UCR). Here are some links to get you started:

You'll find links to implementations in the resources above.

Apart from the team at UCR, there is another (and possibly more efficient) implementation of the relevant algorithms in the STUMPY Python package:
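To give a feel for the core idea, here is a minimal NumPy-only sketch of the z-normalized distance profile that these tools are built around: slide the known (theoretical) pattern over the measured series and look for windows with the smallest distance. The synthetic data, the helper names `znorm_distance_profile` and `top_matches`, and the exclusion-zone width are all my own assumptions for illustration, not taken from the question or from any library. As far as I know, `stumpy.mass(Q, T)` returns the same kind of profile far more efficiently, so treat this naive version as an explanation rather than something to use on long series.

```python
import numpy as np


def znorm_distance_profile(query, series):
    """Z-normalized Euclidean distance between `query` and every
    length-m window of `series` (naive O(n*m) version)."""
    query = np.asarray(query, dtype=float)
    series = np.asarray(series, dtype=float)
    m = len(query)
    q = (query - query.mean()) / query.std()
    # All overlapping windows of length m, shape (n - m + 1, m).
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # avoid division by zero on flat windows
    w = (windows - mu) / sigma
    return np.sqrt(((w - q) ** 2).sum(axis=1))


def top_matches(profile, m, k):
    """Start indices of the k best matches, masking a zone of about
    m // 2 samples around each hit so overlapping windows are skipped."""
    profile = profile.copy()
    hits = []
    for _ in range(k):
        i = int(np.argmin(profile))
        hits.append(i)
        lo, hi = max(0, i - m // 2), i + m // 2 + 1
        profile[lo:hi] = np.inf
    return sorted(hits)


# Hypothetical data: a trapezoid "theoretical" pattern hidden twice in noise,
# the second time at double amplitude (z-normalization ignores scale/offset).
rng = np.random.default_rng(0)
pattern = np.concatenate([np.linspace(0, 1, 20), np.ones(20), np.linspace(1, 0, 20)])
series = 0.05 * rng.standard_normal(500)
series[100:160] += pattern
series[330:390] += 2 * pattern

profile = znorm_distance_profile(pattern, series)
hits = top_matches(profile, len(pattern), k=2)
print(hits)  # start indices of the two best-matching windows
```

Because each window is z-normalized before comparison, small offsets and amplitude changes in the measured data matter much less than with a raw slope or difference feature. The pattern length plays the role of the window size, so there is no separate lag parameter to tune.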

desertnaut