I've been looking into Cox regression for survival analysis in churn prediction. Cox regression models the hazard rate, i.e. the instantaneous rate at which a subscriber unsubscribes at time $t$ given that they are still subscribed just before $t$, as a function of the covariates:
$$ h(t \mid \boldsymbol{X}_i) = h_0(t)\exp\big( \boldsymbol{\beta}^T \boldsymbol{X}_i \big) $$
where
$h_0(t)$: the baseline hazard, i.e. the hazard at time $t$ for a customer whose covariates are all 0. It is a rate, not a probability.
$\boldsymbol{\beta} \in \mathbb{R}^D$: the coefficient vector. The exponential of each coefficient, $\exp(\beta_j)$, is a hazard ratio. These should be constant w.r.t. time (proportional hazards assumption).
$\boldsymbol{X}\in \mathbb{R}^{N\times D}$: the covariate matrix for the $N$ sample customers; $\boldsymbol{X}_i$ is the covariate vector of customer $i$.
Problem: Proportional hazards assumption: Cox regression assumes that the hazard ratios remain constant through time $t$. For example, for a covariate $X_1$ = "gender", say $\exp(\beta_1)=1.8$. In plain English, this means that at any time $t$ the churn hazard of male subscribers is $80\%$ higher than that of female subscribers, all else being equal. Crucially, this $80\%$ has to hold for every $t$.
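
To spell out why the ratio cannot depend on $t$, here is a short derivation from the hazard formula above. I'm assuming gender is coded as $X_1 = 1$ for male and $X_1 = 0$ for female (my own coding choice for illustration), with the remaining covariates $x_2, \dots, x_D$ identical for both subscribers:

$$ \frac{h(t \mid X_1 = 1)}{h(t \mid X_1 = 0)} = \frac{h_0(t)\exp\big(\beta_1 + \sum_{j \ge 2} \beta_j x_j\big)}{h_0(t)\exp\big(\sum_{j \ge 2} \beta_j x_j\big)} = \exp(\beta_1) $$

The baseline hazard $h_0(t)$ cancels, so the ratio is the same at every $t$. A hazard ratio of $1.8$ corresponds to a coefficient of $\beta_1 = \ln 1.8 \approx 0.588$.
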
This is usually an unreasonable constraint for many variables. However, there are other methods that can incorporate variables that don’t follow the proportional hazards assumption:
- stratified Cox regression
- pseudo-observations
- Cox regression with time-dependent covariates
I was just reading up on stratified Cox regression. The apparent downsides are:
- The variables that are stratified on need to be categorical, so continuous variables have to be binned first.
- The stratification variables should not have too many levels. Each stratum (each combination of levels) gets its own baseline hazard, so many levels lead to a LARGE number of baseline hazards to estimate, each from fewer observations.
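
To make these two downsides concrete, here is a minimal sketch of the binning step and of how quickly the number of strata grows. Everything in it is made up for illustration: the covariate names (monthly spend, plan type), the cut points, and the toy records are my own assumptions, not real data, and it only counts strata rather than fitting a model.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points for binning a continuous covariate (monthly spend).
SPEND_EDGES = [20, 50, 100]
SPEND_LABELS = ["[0,20)", "[20,50)", "[50,100)", "[100,inf)"]


def spend_band(spend):
    """Map a continuous spend value to one of a few categorical bands."""
    return SPEND_LABELS[bisect_right(SPEND_EDGES, spend)]


# Toy records: (monthly spend, plan type). Made-up values for illustration.
customers = [
    (12.0, "basic"),
    (35.0, "basic"),
    (75.0, "premium"),
    (120.0, "premium"),
    (20.0, "premium"),
    (50.0, "basic"),
]

# A stratum is one combination of levels across all stratification variables.
strata = Counter((spend_band(spend), plan) for spend, plan in customers)

# The number of possible strata is the product of the level counts,
# so it grows multiplicatively with every variable that is stratified on.
n_plans = len({plan for _, plan in customers})
max_strata = len(SPEND_LABELS) * n_plans

print(f"{len(strata)} strata observed out of {max_strata} possible")
for stratum, count in sorted(strata.items()):
    print(stratum, count)
```

With 4 spend bands and 2 plan types there are already 8 possible strata, each needing its own baseline hazard, and in this toy data every observed stratum contains a single customer. That is the sparsity problem in miniature.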
Question: Is the pseudo-observations approach similar? Are its constraints more or less rigid? And how well does it perform, considering I have copious amounts of data?