You can use the following scalings
$$x'=\dfrac{x}{255} \qquad (1)$$
$$x'=\dfrac{x-127.5}{127.5} = \dfrac{x}{127.5}-1 \qquad (2)$$
for rescaling to $[0,1]$ or $[-1,1]$, respectively.
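For concreteness, here is a minimal numpy sketch of both transformations applied to 8-bit pixel values (the sample array is just an illustration):

```python
import numpy as np

# Hypothetical 8-bit pixel values in [0, 255].
x = np.array([0, 64, 127.5, 191, 255], dtype=np.float64)

# (1) rescale to [0, 1]
x_unit = x / 255.0

# (2) rescale to [-1, 1]
x_sym = (x - 127.5) / 127.5   # equivalently x / 127.5 - 1

print(x_unit)  # values between 0 and 1
print(x_sym)   # values between -1 and 1
```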
Rescaling the inputs helps keep the weights within a small numerical range. In theory it is not necessary, because any rescaling of the inputs can be compensated by a corresponding redefinition of the weights. In practice it matters, because otherwise the weights might have to span a very large range of values.
To see this, consider a toy example with two inputs $x_1 = 1$ and $x_2 = 1$ and a simple linear regression $y_n = w_1 x_{1n} + w_2 x_{2n}$. Assume the true weights are $w_1 = 1$ and $w_2 = 1$. Now assume we have the same data set but with different scales for the inputs, such that $x_1 = 10^k$ and $x_2 = 10^{-k}$. To obtain the same outputs, we would need the inverse weights $w_1 = 10^{-k}$ and $w_2 = 10^{k}$. Theoretically this is not a big deal, but practically it means that for large values of $k$ we need variables that store numerical values ranging from $10^{-k}$ to $10^{k}$.
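To make the toy example numerical, here is a small least-squares sketch (the Gaussian data, the sample size, and $k = 3$ are my own choices for illustration); the fitted weights end up spanning $10^{-k}$ to $10^{k}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3

# Data where both features have true weight 1 on the original scale.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2

# Same data, but with the inputs rescaled by 10**k and 10**-k.
X_scaled = np.column_stack([x1 * 10.0**k, x2 * 10.0**(-k)])

# Least squares must compensate with the inverse weights.
w, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)
print(w)  # roughly [1e-03, 1e+03]
```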
I don't know why the authors of the cited paper chose these particular transformations. Both should lead to almost the same performance, since the bias can compensate for the additional $-1$ and the rescaling can be compensated by the weights of each input. Differences may still arise because of the stochasticity of many optimization techniques and because the final solution is very likely not the global optimum. I would choose one over the other mainly when I want to compare my model with a specific model from a paper; in that case I would use the same transformation as the paper.
You could also deliberately use different scales so that the type of input (low resolution vs. high resolution) can be distinguished just by looking at the transformed values.