
I am working on detecting anomalies within a large time-series data set. It is updated on a regular basis and consists of more than 30 parameters. I am using R as my reference language.

This is my first project of this kind, and I am unfamiliar with most of the techniques. I have 6 weeks to implement a good analytical toolbox to enhance the quality of the control checks on the production line.

I have found a couple of potential methods to analyze it, including statistical machine learning, deep learning with autoencoder neural networks, and clustering approaches. The chosen method should detect the anomalies/outliers by itself. It doesn't need to be a real-time analysis. Which approach would you recommend for the scope of the project, given the structure of the data?

    Try Dilini Talagala's packages: https://github.com/pridiltal/oddstream and https://github.com/pridiltal/stray – Rob Hyndman Jun 20 '18 at 19:24

1 Answer


Following J. Tukey, you should plot, draw graphs, and visualize the data until you have a solid set of example anomalies.
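
A minimal base-R sketch of this first visual pass, assuming the parameters sit in a hypothetical data frame `df` with one numeric column per parameter and rows in time order:

```r
## Quick visual pass over every parameter; `df` is a hypothetical
## data frame with one numeric column per parameter, rows in time order.
for (p in names(df)) {
  plot(df[[p]], type = "l", main = p, xlab = "time index", ylab = p)
}

## Side-by-side boxplots of the standardised parameters expose gross
## outliers at a glance.
boxplot(scale(df), las = 2, main = "Standardised parameters")
```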

Then build Tukey's fences on each of the 30 parameters. Let $q_1$ and $q_3$ be the first and third quartiles, $d = q_3 - q_1$ the inter-quartile distance, and define as an outlier any observation $x$ falling outside the interval $q_1 - k\cdot d < x < q_3 + k\cdot d$, where $k$ is a constant. Traditionally, $k = 1.5$ flags an outlier and $k = 3$ flags a "far out" point. However, the right value of $k$ should be tested against your examples.
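
A short base-R sketch of these fences, again assuming the hypothetical data frame `df`:

```r
## Flag observations outside Tukey's fences for one numeric vector.
## k = 1.5 is the traditional "outlier" fence, k = 3 the "far out" fence;
## tune k against your labelled examples.
tukey_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  d <- q[2] - q[1]                       # inter-quartile distance
  x < q[1] - k * d | x > q[2] + k * d    # logical flag per observation
}

## Apply to every parameter; the result is a logical matrix of flags.
flags <- sapply(df, tukey_outliers, k = 1.5)
colSums(flags)  # number of flagged points per parameter
```

Because the flags are computed per parameter, a point that trips the fences on several parameters at once is a particularly strong outlier candidate.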

Then run a cluster analysis (for instance with a $k$-nearest-neighbour approach) and define as an outlier any point that ends up isolated in its own cluster. Again, use your examples to test various values of $k$.
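
One way to realise this idea in base R, as a sketch rather than the answer's exact method, is to score each point by the distance to its $k$-th nearest neighbour and flag the most isolated points (the full distance matrix keeps this suitable for moderate sample sizes only):

```r
## Isolation score via k-nearest-neighbour distance (base-R sketch).
## `df` is the same hypothetical data frame; standardise first so that
## no single parameter dominates the Euclidean distance.
X <- scale(df)
D <- as.matrix(dist(X))   # full pairwise distance matrix, O(n^2) memory
k <- 5                    # number of neighbours; tune against your examples

## For each point, distance to its k-th nearest neighbour
## (sort(row)[1] is the self-distance 0, so take position k + 1).
knn_dist <- apply(D, 1, function(row) sort(row)[k + 1])

## Flag, say, the 1% most isolated points as candidate outliers.
outliers <- knn_dist > quantile(knn_dist, 0.99)
which(outliers)
```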

AlainD