What is the fastest way to detect lag and calculate cross correlation of two binary time series?

Question

Example:

arr1 = array([0,0,0,1,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,1,1,1,1,1,0,0])

arr2 = array([1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0])

arr2 is almost perfectly correlated with arr1 lagged by 3 time slots.

I'm assuming every time series is of the same length with only binary values.

To get the correlation I've tried swapping the 0-valued entries with -1 and calculating the dot product between the two arrays and then dividing by the length of the time series. I don't know how to find the lag other than to recalculate the correlation using different time shifts, which seems computationally burdensome.
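The approach described above (map 0 to -1, then take dot products at different shifts) can be computed for every lag at once with an FFT, in O(N log N) instead of O(N²). The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `to_pm1` and `cross_correlation_fft` are hypothetical, and dividing by the overlap length instead of the full series length is one possible normalization choice.

```python
import numpy as np

def to_pm1(x):
    # Map binary {0, 1} values to {-1, +1}, as described in the question.
    return 2 * np.asarray(x, dtype=int) - 1

def cross_correlation_fft(a, b):
    # Hypothetical helper: returns (lags, cc) with cc[i] = sum_n a[n + lags[i]] * b[n].
    # Zero-padding to 2n - 1 samples avoids circular wrap-around.
    n = len(a)
    size = 2 * n - 1
    spec = np.fft.rfft(a, size) * np.conj(np.fft.rfft(b, size))
    cc = np.fft.irfft(spec, size)
    # Negative lags sit at the end of the FFT output; rotate them to the front.
    cc = np.roll(cc, n - 1)
    lags = np.arange(-(n - 1), n)
    # Inputs are integers, so round away floating-point noise.
    return lags, np.rint(cc).astype(int)

arr1 = np.array([0,0,0,1,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,1,1,1,1,1,0,0])
arr2 = np.array([1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0])

lags, cc = cross_correlation_fft(to_pm1(arr1), to_pm1(arr2))
peak = int(np.argmax(cc))
best_lag = int(lags[peak])

# Normalize by the number of overlapping samples at the best lag (an assumption;
# dividing by len(arr1) instead reproduces the normalization in the question).
score = cc[peak] / (len(arr1) - abs(best_lag))
print(best_lag, score)
```

For the two arrays above, the peak is at a lag of 3. For short series, `np.correlate(a, b, mode="full")` gives the same values without the FFT.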

score 1 · Answer 1 · answered Nov 04 '22 at 03:11

There are models designed specifically for binary time series data. You can read about them in research papers; many of these come from econometrics, but some are more general.

The gist is that you can use gbARMA, a generalized binary AutoRegressive Moving Average model, which should be able to describe a binary time series well enough.

Here is a link to a pretty mathy explanation of the model and how it translates into a larger class of categorical time series.
