> Is my understanding correct, and is it recommended to have such a reward structure for my use case?
Your understanding is not correct, and setting extremely high rewards for the goal state in this case can backfire.
Probably the most important way it could backfire in your case is that your scaling of bad results becomes irrelevant. The difference between 0 and -50 is insignificant compared to the +1000 result. In turn, that means the agent will not really care by how much it fails when it does fail, except as a matter of fine-tuning once it is already close to an optimal solution.
If the environment is stochastic, then the agent will prioritise a small chance of being at the target temperatures over a large chance of ending up at an extremely bad temperature.
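For a rough illustration using your example rewards (the 10%/90% split is an assumption, not a figure from your question): a risky policy with a 10% chance of reaching the +1000 goal and a 90% chance of ending at -50 has an expected return of $0.1 \times 1000 + 0.9 \times (-50) = 55$, so it beats a cautious policy that reliably scores close to 0, even though it usually ends badly.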
If you are using a discount factor, $\gamma$, then the agent will prioritise being at the target temperatures immediately, maybe overshooting and ending up with an unwanted temperature within a few timesteps.
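To see the effect, take an assumed $\gamma = 0.9$ (again, not a value from your question): reaching the +1000 goal now is worth 1000, but reaching it three steps later is worth only $0.9^3 \times 1000 \approx 729$. Rushing to the goal is therefore worth roughly 271 in discounted return, which easily outweighs a -50 penalty for passing through a bad temperature on the way.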
Working in your favour, your environment is one where the goal is managed stability, like the "cartpole" environment, with a negative feedback loop (the correction applied to the measured quantity always pushes it in the opposite direction). Agents for these environments are often quite robust to changes in hyperparameters, so you may still find your agent learns successfully.
However, I would advise sticking with a simple and relatively small scale for the reward function. Once you are certain that it expresses your goals for the agent, experimenting further with it is unlikely to lead to better solutions. Instead, focus your efforts on how the agent is performing and on what changes you can make to the learning algorithm.
What I would do (without knowing more about your environment):
- Reward +1 per time step when the temperature is in the acceptable range.
- Reward -0.1 * (temperature difference from the acceptable range) per time step when the temperature is outside the acceptable range. It doesn't really matter whether you measure that difference in Fahrenheit or Celsius (see the sketch after this list).
- No discounting (set the discount factor $\gamma = 1$ if you are using a formula that includes discounting).
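As a rough sketch of that reward scheme in code (the 19-24 °C acceptable range is an assumption for illustration, not a value from your question):

```python
# Minimal sketch of the suggested per-time-step reward.
# TEMP_LOW and TEMP_HIGH are assumed bounds of the acceptable range;
# substitute your own, in whichever units your controller uses.
TEMP_LOW = 19.0   # assumed lower bound (degrees C)
TEMP_HIGH = 24.0  # assumed upper bound (degrees C)

def step_reward(temperature: float) -> float:
    """+1 inside the acceptable range, -0.1 per degree outside it."""
    if TEMP_LOW <= temperature <= TEMP_HIGH:
        return 1.0
    # Distance from the nearest edge of the acceptable range.
    if temperature < TEMP_LOW:
        difference = TEMP_LOW - temperature
    else:
        difference = temperature - TEMP_HIGH
    return -0.1 * difference
```

The episode return is then simply the undiscounted sum of these per-step rewards.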
The maximum total reward possible is then +36, and you probably don't expect an episode much worse than around -100. This will plot neatly on a graph and be easy to interpret (every unit below 36 is roughly equivalent to the performance of an agent spending 15 minutes per day just outside the acceptable temperatures). More importantly, these smaller numbers should not cause massive error values whilst the agent is learning, which will help when training a neural network to predict future reward.
As an aside (as you didn't ask), if you are using a value-based method, like DQN, then you will need to include the current time step (or time steps remaining) in the state features. That is because the total remaining reward - as represented by the action value Q - depends on how much time the agent has left to act. It also doesn't matter to the agent what happens after the last time step, so it is OK for it to choose actions near the end of the episode that would push the system outside the acceptable temperatures just after the episode finishes.
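A minimal sketch of that idea, assuming you build the state vector yourself (the feature layout is an assumption; the 36 steps per episode is taken from the +36 maximum above):

```python
import numpy as np

EPISODE_LENGTH = 36  # assumed number of time steps per episode

def build_state(sensor_features: np.ndarray, t: int) -> np.ndarray:
    """Append the normalised time remaining to the raw sensor features,
    so the value network can learn that Q depends on how long is left."""
    time_remaining = (EPISODE_LENGTH - t) / EPISODE_LENGTH
    return np.append(sensor_features, time_remaining)
```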