My app receives messages with a random number of bits at random times. Two weeks ago, however, I started to notice some almost regular patterns in my app's metrics. I suspect that bots are sending artificially generated data to my app. Specifically, I'm looking for consecutive runs of messages in the time series where the messages have almost the same number of bits.

I have read about some methods, but they use data where time is not a random variable. I would appreciate any help you can provide, including books, web pages, tutorials (in Python if possible), etc.

Jocer
  • I was looking for a solution and found an example, "Inferring Behavior from Text-Message Data", in the book [Bayesian Methods for Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers). Maybe what I need is to find the **switchpoint** in the time series, as in this [question on Stack Overflow](http://stackoverflow.com/questions/35922022/pymc3-select-data-within-model-for-switchpoint-analysis). What do you think? Is there another method? – Jocer May 24 '16 at 21:42
  • Welcome to Datascience.SE! It's not so much a [change detection](https://en.wikipedia.org/wiki/Change_detection) problem as an [anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection) problem. [Here](https://vimeo.com/89644371) is a presentation. – Emre May 26 '16 at 07:05

1 Answer

As a first step, to separate out the messages that appear to come from a bot, you could try binning by message size. For example, if messages sent by bots are likely to be between 128 and 140 bytes, assign these to their own bin.
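A minimal sketch of the binning step, using only the standard library. The `messages` list, the `size_bin` helper, and the `BOT_MIN`/`BOT_MAX` bounds are hypothetical; the 128 to 140 byte range is just the example figure from above, and in practice you would pick it from a histogram of your own message sizes.

```python
from collections import Counter

# Hypothetical sample data: (timestamp_seconds, size_bytes) pairs.
messages = [
    (3, 512), (60, 130), (75, 48), (120, 133), (180, 129),
    (191, 977), (240, 136), (300, 131), (322, 260), (360, 128),
]

# Assumed size range for bot traffic (the example range, not a measured one).
BOT_MIN, BOT_MAX = 128, 140


def size_bin(size_bytes):
    """Label a message by whether its size falls in the suspected bot range."""
    return "suspect" if BOT_MIN <= size_bytes <= BOT_MAX else "other"


# How many messages land in each bin.
bins = Counter(size_bin(size) for _, size in messages)

# Timestamps of the messages in the suspect bin, kept for later analysis.
suspect_times = [t for t, size in messages if size_bin(size) == "suspect"]

print(bins)
print(suspect_times)
```

With real data you would probably want several narrow bins rather than a single hand-picked range, and then look at whichever bin is unexpectedly crowded.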

Next, create a time series from the messages in this bin. Try to decompose the time series using an additive or multiplicative method such as Holt-Winters. A strong seasonal component would help you identify regular, repetitive messages that are being generated automatically.
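Before fitting a full Holt-Winters model, a quick check for a repeating period can already be informative. The sketch below is not Holt-Winters; it is a simpler stand-in that counts messages per fixed time bucket and looks for the lag with the highest sample autocorrelation. The timestamps, the bucket width, and all function names are assumptions made up for illustration.

```python
from statistics import mean


def bucket_counts(timestamps, bucket_seconds, n_buckets):
    """Count messages per fixed-width time bucket."""
    counts = [0] * n_buckets
    for t in timestamps:
        idx = int(t // bucket_seconds)
        if 0 <= idx < n_buckets:
            counts[idx] += 1
    return counts


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no structure to correlate
    cov = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return cov / var


def strongest_period(series, max_lag):
    """Return the lag (in buckets) with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical timestamps (seconds) for one size bin: one message every 60 s.
bin_times = [60 * k for k in range(1, 21)]

counts = bucket_counts(bin_times, bucket_seconds=10, n_buckets=130)
period = strongest_period(counts, max_lag=20)

print("strongest period:", period * 10, "seconds")
print("autocorrelation at that lag:", round(autocorrelation(counts, period), 3))
```

An autocorrelation close to 1 at some lag suggests a strongly periodic sender, which is the same signal a large seasonal component would give in a decomposition. Traffic from human users arriving at random times should show no such peak.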

Sandeep S. Sandhu