
I am working on estimating task completion time in kanban (a project management tool). While doing EDA, I looked only at tasks that are either done or cancelled, and defined completion time as the time from task creation to the task being marked done or cancelled.

I noticed an issue with that definition: it disregards tasks that have not been done yet. If we think of "task = done" as "event = 1", this is like throwing away the observations with "event = 0" (the censored observations) in survival analysis, which gives a biased result.

  • How should I handle this?
  • I would also like some input on how I should treat "done" vs "cancelled".
Sharath

1 Answer


It's a matter of defining exactly which problem you want to solve, and there might be many variants:

  • If the goal is really to estimate "completion time", then imho you should use only completed tasks, since the other tasks haven't been "completed". Note that in this case you're counting time actually spent on the task.
  • If the goal is to estimate "time to resolve the task", whether by completing it or cancelling it, then you're counting the duration between the time the task was created and the time it was either completed or cancelled. Note that in this case the duration may include time spent on other tasks.
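
As a rough illustration of how the two definitions above select different tasks, here is a minimal Python sketch. The record layout (`created`, `closed`, `status`) and the sample dates are made up for the example, and both variants measure elapsed time from creation to closing, which is an assumption about what timestamps the board exposes.

```python
from datetime import datetime
from statistics import median

# Hypothetical task records; the field names and dates are assumptions.
tasks = [
    {"created": datetime(2023, 1, 1), "closed": datetime(2023, 1, 4), "status": "done"},
    {"created": datetime(2023, 1, 2), "closed": datetime(2023, 1, 12), "status": "cancelled"},
    {"created": datetime(2023, 1, 3), "closed": datetime(2023, 1, 8), "status": "done"},
    {"created": datetime(2023, 1, 5), "closed": None, "status": "pending"},
]


def durations_in_days(tasks, statuses):
    """Days from creation to closing, for tasks whose status is in `statuses`."""
    return [
        (t["closed"] - t["created"]).total_seconds() / 86400
        for t in tasks
        if t["status"] in statuses and t["closed"] is not None
    ]


# Variant 1: completed tasks only.
completed_only = durations_in_days(tasks, {"done"})
# Variant 2: resolved tasks, i.e. completed or cancelled.
resolved = durations_in_days(tasks, {"done", "cancelled"})

print(median(completed_only))
print(median(resolved))
```

Pending tasks are left out of both lists, so neither variant says anything about work that is still open.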

In both cases above, I don't see any proper way to include tasks which are still pending. My idea for these cases would be to calculate a different statistic, for instance something like the "rate of completed tasks after X days", i.e. the share of tasks that were completed within X days of being created.
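
One possible reading of such a statistic, sketched in Python under stated assumptions: the record layout (`created`, `closed`, `status`), the `snapshot` date, and the function name `completion_rate_after` are all hypothetical. The sketch also assumes that only tasks created at least X days before the snapshot should be counted (newer tasks have not had X days to finish), and that cancelled and pending tasks count as not completed.

```python
from datetime import datetime, timedelta


def completion_rate_after(tasks, x_days, snapshot):
    """Fraction of eligible tasks marked done within `x_days` of creation.

    A task is eligible only if it was created at least `x_days` before
    `snapshot`. Returns None when no task is eligible.
    """
    horizon = timedelta(days=x_days)
    eligible = [t for t in tasks if snapshot - t["created"] >= horizon]
    if not eligible:
        return None
    done_in_time = sum(
        1
        for t in eligible
        if t["status"] == "done"
        and t["closed"] is not None
        and t["closed"] - t["created"] <= horizon
    )
    return done_in_time / len(eligible)


# Hypothetical data; dates and statuses are made up for the example.
snapshot = datetime(2023, 2, 1)
tasks = [
    {"created": datetime(2023, 1, 1), "closed": datetime(2023, 1, 4), "status": "done"},
    {"created": datetime(2023, 1, 2), "closed": datetime(2023, 1, 12), "status": "cancelled"},
    {"created": datetime(2023, 1, 3), "closed": datetime(2023, 1, 20), "status": "done"},
    {"created": datetime(2023, 1, 10), "closed": None, "status": "pending"},
    {"created": datetime(2023, 1, 30), "closed": None, "status": "pending"},
]

rate_7 = completion_rate_after(tasks, 7, snapshot)
rate_30 = completion_rate_after(tasks, 30, snapshot)
print(rate_7, rate_30)
```

Unlike the duration statistics, pending tasks stay in the denominator here as long as they are old enough, so they are not silently dropped.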

Erwan