Survival analysis to estimate kanban tasks completion times

Question

I am trying to estimate task completion times on a kanban board (a project management tool). During exploratory data analysis (EDA), I looked only at tasks that are either done or cancelled, and I defined the completion time as the time from task creation to the done/cancelled state.

I noticed an issue with that definition: it disregards tasks that are not done yet. If we think of "task = done" as "event = 1", this is like throwing away the censored observations ("event = 0") in survival analysis, which biases the result.

How should I handle this?
I would also like some input on how I should approach "done" vs "cancelled".

IMO you should leave out cancelled tasks, as they mean a completely different thing than done — Nikos M., Mar 03 '21 at 16:56
Tasks not done yet can be taken into account (as not-completed) with their current elapsed time — Nikos M., Mar 03 '21 at 16:57
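
Keeping pending tasks with their current elapsed time, as the comment above suggests, is what survival analysis calls right-censoring. The following is only a minimal sketch of a Kaplan-Meier estimate in plain Python. The function name `kaplan_meier` and the sample data are made up for illustration, and durations are assumed to be whole days.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct completion time.

    durations: elapsed days per task (creation to done, or creation to
               "now" for a task that is still pending)
    events:    1 if the task is done, 0 if it is still pending (censored)
    """
    done_at = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving_at = Counter(durations)  # done or censored, both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = done_at.get(t, 0)
        if d:
            # Tasks censored at time t still count as at risk at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve

# Hypothetical data: six tasks, two of them still open.
durations = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"P(task still open after {t} days) = {s:.3f}")
```

Here S(t) is the estimated probability that a task is still open after t days. Pending tasks contribute to the at-risk counts up to their current age instead of being dropped.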

score 1 · Answer 1 · answered Mar 06 '21 at 00:24

It's a matter of defining exactly which problem you want to solve, and there might be many variants:

If the goal is really to estimate "completion time", then imho you should use only completed tasks, since the other tasks haven't been "completed". Note that in this case you're counting time actually spent on the task.
If the goal is to estimate "time to resolve the task", whether by completing it or cancelling it, then you're counting the duration between the time the task was created and the time it was either completed or cancelled. Note that in this case the duration may include time spent on other tasks.

In both cases above, I don't see any proper way to include tasks which are still pending. My idea for these cases would be to calculate a different statistic, for instance the "rate of completed tasks after X days".
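
One possible reading of that statistic, sketched under assumptions: each task is a pair `(age_days, done_after_days)`, where `age_days` is the time since creation and `done_after_days` is `None` for a pending task. Only tasks created at least X days ago are counted, so that a young pending task does not count as a failure. That restriction is my assumption and is not stated above.

```python
def completion_rate(tasks, x_days):
    """Share of tasks done within x_days, among tasks at least x_days old.

    tasks: list of (age_days, done_after_days); done_after_days is None
           when the task is still pending.
    Returns None when no task is old enough to be judged.
    """
    eligible = [(age, done) for age, done in tasks if age >= x_days]
    if not eligible:
        return None
    finished = sum(
        1 for _, done in eligible if done is not None and done <= x_days
    )
    return finished / len(eligible)

# Hypothetical data: (days since creation, days until done or None).
tasks = [(30, 4), (30, 12), (20, None), (9, 3), (5, None), (3, 2)]
rate_7 = completion_rate(tasks, 7)
print(f"Completed within 7 days: {rate_7:.0%}")
```

With this sample, four tasks are at least 7 days old and two of them were done within 7 days.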
