I want to find out whether there is a particular hour at which an author should publish documents online to get the maximum number of views/reads.
The data looks something like this:
| Document ID | Published Date | Read Date |
|---|---|---|
| 1 | 1/1/2022 10:00:00 | 1/1/2022 11:00:00 |
| 1 | 1/1/2022 10:00:00 | 1/1/2022 12:00:00 |
| 1 | 1/1/2022 10:00:00 | 1/1/2022 11:05:00 |
| 1 | 1/1/2022 10:00:00 | 1/1/2022 15:05:00 |
| 2 | 1/2/2022 11:00:00 | 1/2/2022 19:05:00 |
| 2 | 1/2/2022 11:00:00 | 1/3/2022 20:05:00 |
| 4 | 1/3/2022 23:04:00 | 1/4/2022 11:05:00 |
| 4 | 1/3/2022 23:04:00 | 1/4/2022 23:11:00 |
Note: rows with the same document ID have the same published date.
Read date is the date and time at which a user read the document (that is, when the users are active), and published date is the date and time at which the author published it.
Since both columns are dates, how should I use correlation and time series analysis to reach a conclusion? Should I make the data stationary first? There is a pattern in the read date: more readers are active in the mornings and afternoons than at night.
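
To make the question concrete, here is a minimal sketch of the kind of hourly summary I have in mind, using only the sample rows above. It is not meant as the right method, just a starting point. It assumes the timestamps are in month/day/year format, and the variable names (`mean_reads_by_publish_hour`, `reads_by_read_hour`) are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: timestamps are month/day/year hour:minute:second.
FMT = "%m/%d/%Y %H:%M:%S"

# Sample rows copied from the table: (document id, published date, read date).
rows = [
    (1, "1/1/2022 10:00:00", "1/1/2022 11:00:00"),
    (1, "1/1/2022 10:00:00", "1/1/2022 12:00:00"),
    (1, "1/1/2022 10:00:00", "1/1/2022 11:05:00"),
    (1, "1/1/2022 10:00:00", "1/1/2022 15:05:00"),
    (2, "1/2/2022 11:00:00", "1/2/2022 19:05:00"),
    (2, "1/2/2022 11:00:00", "1/3/2022 20:05:00"),
    (4, "1/3/2022 23:04:00", "1/4/2022 11:05:00"),
    (4, "1/3/2022 23:04:00", "1/4/2022 23:11:00"),
]

publish_hour = {}               # document id -> hour of day it was published
reads_per_doc = Counter()       # document id -> number of reads
reads_by_read_hour = Counter()  # hour of day -> number of reads in that hour

for doc_id, published, read in rows:
    publish_hour[doc_id] = datetime.strptime(published, FMT).hour
    reads_per_doc[doc_id] += 1
    reads_by_read_hour[datetime.strptime(read, FMT).hour] += 1

# Group the per-document read counts by the hour of publication.
counts_by_publish_hour = defaultdict(list)
for doc_id, hour in publish_hour.items():
    counts_by_publish_hour[hour].append(reads_per_doc[doc_id])

# Average number of reads per document for each publish hour.
mean_reads_by_publish_hour = {
    hour: sum(counts) / len(counts)
    for hour, counts in sorted(counts_by_publish_hour.items())
}

print("Mean reads per document by publish hour:", mean_reads_by_publish_hour)
print("Reads by hour of day:", sorted(reads_by_read_hour.items()))
```

One limitation I am aware of: a document that was never read has no rows in this data, so it would not show up in the per-hour averages at all.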