ReLU is an activation function defined as $h = \max(0, a)$ where $a = Wx + b$.
Normally, we train neural networks with first-order methods such as SGD, Adam, RMSprop, Adadelta, or Adagrad. Backpropagation with first-order methods only needs the first derivative, and the active part $x$ of ReLU differentiates to $1$.
But if we use second-order methods, wouldn't ReLU's second derivative be $0$? The first derivative is $1$, and differentiating again gives $0$. Would that be an error? For example, with Newton's method, you'd be dividing by $0$. (I don't really understand Hessian-free optimization yet; IIRC, it's a matter of using an approximate Hessian instead of the real one.)
What is the effect of this $h'' = 0$? Can we still train a neural network with ReLU using second-order methods, or would it be non-trainable / produce errors (NaN/infinity)?
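To make the worry concrete, here is a minimal 1-D sketch of a plain Newton update $\theta \leftarrow \theta - f'(\theta)/f''(\theta)$ with ReLU's derivatives plugged in (purely illustrative; `theta`, `g`, and `h` are made-up names, not any real optimizer's API):

```python
import jax.numpy as jnp

# Plain 1-D Newton update: theta <- theta - g / h,
# with g = f'(theta) = 1 and h = f''(theta) = 0, as in ReLU's active region.
g = jnp.array(1.0)      # first derivative
h = jnp.array(0.0)      # second derivative

theta = jnp.array(3.0)
step = g / h            # 1 / 0 -> inf under IEEE floating-point arithmetic
print(theta - step)     # -> -inf, i.e. the parameter update blows up
```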
For clarity, this is ReLU as $f(x)$:
$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases}$$

$$f'(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \ge 0 \end{cases}$$

$$f''(x) = 0$$
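A quick autodiff sketch (assuming JAX, and checking only away from the kink at $x = 0$) agrees with these formulas:

```python
import jax
import jax.numpy as jnp

relu = lambda x: jnp.maximum(x, 0.0)

f1 = jax.grad(relu)             # first derivative of ReLU
f2 = jax.grad(jax.grad(relu))   # second derivative of ReLU

for x in (-2.0, 3.0):
    print(x, f1(x), f2(x))
# expected: -2.0 -> 0.0, 0.0   and   3.0 -> 1.0, 0.0
```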