I am working on a problem to estimate task completion time in a Kanban project management tool. During EDA, I looked only at tasks that are either done or cancelled, and defined the completion time as the time from task creation until the task was marked done/cancelled.
I noticed a problem with that definition: it ignores tasks that are not done yet. If we think of "task = done" as "event = 1", this is like throwing away the right-censored observations ("event = 0") in survival analysis. That biases the result, because tasks that are still open tend to be the long-running ones, so the estimated completion times come out too short.
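To make the censoring point concrete, here is a minimal sketch of the standard fix, the Kaplan-Meier estimator, written from scratch on a made-up snapshot of 8 tasks. It assumes every open task is recorded with its age at the snapshot date and `event = 0`; the function names and data are hypothetical. In practice a library such as lifelines (`KaplanMeierFitter`) would do the same job.

```python
from collections import Counter
from fractions import Fraction
from statistics import median


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) = P(task not yet done at time t).

    durations: days from creation to "done" (event=1), or to the snapshot
               date for tasks that are still open (event=0, right-censored).
    Returns [(t, S(t))] at each time where at least one task was done.
    """
    done_at = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving_at = Counter(durations)  # tasks that are done or censored at t
    n_at_risk = len(durations)
    surv = Fraction(1)  # exact arithmetic keeps the toy example free of rounding
    curve = []
    for t in sorted(leaving_at):
        d = done_at[t]
        if d:
            surv *= 1 - Fraction(d, n_at_risk)
            curve.append((t, surv))
        # tasks censored at t still count as at risk at t (usual convention)
        n_at_risk -= leaving_at[t]
    return curve


def km_median(curve):
    """First time at which the estimated S(t) drops to 0.5 or below (None if never)."""
    return next((t for t, s in curve if s <= Fraction(1, 2)), None)


# Hypothetical snapshot: 8 tasks, three of them (ages 8, 12, 15 days) still open.
durations = [2, 3, 3, 5, 8, 10, 12, 15]
events = [1, 1, 1, 1, 0, 1, 0, 0]

naive = median(t for t, e in zip(durations, events) if e == 1)
print("median, open tasks dropped:", naive)
print("Kaplan-Meier median:", km_median(kaplan_meier(durations, events)))
```

On this toy data the naive median (open tasks dropped) is 3 days, while the Kaplan-Meier median is 5 days: the open tasks still contribute the information that they took *at least* 8, 12 and 15 days.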
- How should I handle this?
- I would also like some input on how I should approach "done" vs "cancelled".
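One common way to frame "done" vs "cancelled" is as competing risks: each task ends in exactly one of two outcomes, and open tasks are censored. The sketch below computes the Aalen-Johansen cumulative incidence function (CIF) for each outcome from scratch. The status encoding and data are hypothetical, and this is one possible framing rather than the only option. lifelines also offers an `AalenJohansenFitter` for this. Treating cancellations as ordinary censoring instead implicitly assumes cancelled tasks would eventually have been done, so 1 − KM then overstates the probability of "done" (on the toy data below it gives 2/3 by day 6 instead of 7/12).

```python
from collections import Counter
from fractions import Fraction

OPEN, DONE, CANCELLED = 0, 1, 2  # hypothetical status encoding


def cumulative_incidence(durations, status):
    """Aalen-Johansen cumulative incidence for two competing outcomes.

    Returns [(t, CIF_done(t), CIF_cancelled(t))] at each time where at least
    one task was done or cancelled. OPEN tasks are treated as right-censored.
    """
    exits = Counter(durations)  # tasks leaving the risk set at t, for any reason
    done = Counter(t for t, s in zip(durations, status) if s == DONE)
    cancelled = Counter(t for t, s in zip(durations, status) if s == CANCELLED)
    n_at_risk = len(durations)
    surv = Fraction(1)  # P(task still open just before t), counting both outcomes
    cif_done = Fraction(0)
    cif_cancelled = Fraction(0)
    curve = []
    for t in sorted(exits):
        d1, d2 = done[t], cancelled[t]
        if d1 or d2:
            # cause-specific hazard at t, weighted by P(still open just before t)
            cif_done += surv * Fraction(d1, n_at_risk)
            cif_cancelled += surv * Fraction(d2, n_at_risk)
            surv *= 1 - Fraction(d1 + d2, n_at_risk)
            curve.append((t, cif_done, cif_cancelled))
        n_at_risk -= exits[t]
    return curve


durations = [1, 2, 2, 4, 5, 6]
status = [DONE, CANCELLED, DONE, OPEN, DONE, CANCELLED]
for t, p_done, p_cancelled in cumulative_incidence(durations, status):
    print(t, p_done, p_cancelled)
```

At every time point the two CIFs plus the overall "still open" probability sum to 1, which is what makes them interpretable as "probability the task has been done (or cancelled) by day t".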