
I need US monthly climate data for the years 1992-2012. It would be great to get down to a county level, but by state would be just fine. Every site that I go to inevitably kicks me to the NCDC, but I cannot make sense of their data.

For example: the .csv sample data for GHCN Monthly Summaries lists EMXT (extreme maximum temperature) for each month of 2010 in Petersburg, ND. July had an EMXT of 317. I've been through the documentation, but I can't figure out what that number is supposed to mean. I know it wasn't 317°F or °C in ND at any point. Did they add all the temps up? Was it around 10°C every day of July 2010? But why would you do that? The .PDF data looks like actual temperatures, but I need a lot of data: .CSV is ideal; .PDF is really not usable for the amount of data I am going to manipulate.

What am I missing? Or is there another way to get this data?

Alanna

1 Answer


The documentation linked from the datasets page states:

Air Temperature (all units in Fahrenheit on PDF monthly form and tenths of degrees Celsius on CSV or text)
EMNT - Extreme minimum temperature *
EMXT - Extreme maximum temperature *

The Petersburg data looks plausible under this interpretation (EMXT −3.9°C to 33.9°C over the year).
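A minimal sketch of the conversion (assuming only the documented convention above: CSV temperatures are integers in tenths of degrees Celsius; the helper names are my own):

```python
def tenths_c_to_celsius(value):
    """GHCN monthly CSV temperatures are stored as integers in tenths of a degree Celsius."""
    return value / 10.0

def celsius_to_fahrenheit(c):
    """Standard Celsius-to-Fahrenheit conversion."""
    return c * 9.0 / 5.0 + 32.0

# The July 2010 EMXT value from the question:
emxt_raw = 317
c = tenths_c_to_celsius(emxt_raw)   # 31.7 °C
f = celsius_to_fahrenheit(c)        # ≈ 89.1 °F
print(f"EMXT = {c:.1f} °C ({f:.1f} °F)")
```

So 317 is simply 31.7°C, about 89°F, which is an entirely believable July maximum for North Dakota.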

Pont
  • Well, that's embarrassing. I'm going to blame it on the brain-melting drugs I'm on for a dislocated vertebra. Do you have any idea why it would be in tenths of degrees C? Why not whole degrees? – Alanna Apr 05 '15 at 20:20
  • @Alanna We've all been there :). I wasted a good hour yesterday fruitlessly wrestling a data format because I somehow failed to see a vital bit of the extensive documentation. As to why they use that unusual unit, I have no idea. I think a tenth of a degree Celsius is around the accuracy you can expect from a typical weather station, but that doesn't explain why they don't just report it in degrees with one decimal place. – Pont Apr 05 '15 at 22:18
  • I've worked with one or two data sets that use tenths of a degree. Maybe it was felt that this would help prevent errors handling floats when the CSV was imported? Just a guess. – jimjamslam Apr 05 '15 at 23:04
  • @Alanna: regarding the usage of tenths of a degree, it may be a continuation of an ancient data format from the early era of computers. Removing decimal points changes numbers to integers. In old computers, memory capacity & calculating speed were an issue. In binary representation, integers are half the size of floating point numbers & a quarter the size of double precision numbers. Hence the use of integers speeds up calculations & requires less memory (disk drives, tape & RAM) than floating point numbers. In the old days, tape was used heavily as an external storage medium. – Fred Apr 06 '15 at 02:35
  • @Fred My advisor made his PhD student learn Fortran because so much climate data is still in that language. There are a number of reasons for this. I can see why, then, not having decimals, as you say, would make sense. Thank you so much! – Alanna Apr 06 '15 at 06:33
  • @rensa I have had issues with floats in Tableau...I multiplied everything by 0.1; I'm going to try the original spreadsheet & see if Tableau can digest it more easily. – Alanna Apr 06 '15 at 06:37
  • I'm not familiar with Tableau, being primarily an R user myself, but I'll be interested to see how you go! @Fred's reasoning rings truer to me in hindsight; unless you had a High Performance Computing application, I would think it'd be much of a muchness on current hardware. – jimjamslam Apr 06 '15 at 06:44
  • @rensa you are correct about current computer hardware. There is logic in maintaining the old data format for consistency. If the data format was changed then all the old data would need to be changed & checked to ensure there were no errors. If the format was changed as of a certain date but the old data remained in the old format, people would need to be mindful of the change & when it occurred. Analysis apps would need to first check if the data was from before or after the date of change, condition the data accordingly before doing the calculations. It's easier to keep the old format. – Fred Apr 06 '15 at 09:17
  • One benefit not mentioned about scaling a floating point value and storing at as integer is that it can be exactly represented (to the selected precision). IEEE floating point will always round decimals to the nearest representable value and this can be a source of error. This is the case even on modern hardware. Also note that Fortran is a language, not a data format and it handles floating point values natively. – casey Apr 06 '15 at 19:46
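The exactness point in the last comment can be demonstrated in a few lines (a sketch, not tied to any GHCN tooling; `Decimal(x)` reveals the binary double actually stored for a float literal):

```python
from decimal import Decimal

# 31.7 (i.e., 317 tenths of a degree) has no exact binary floating-point
# representation, so the stored double is only the nearest representable value:
print(Decimal(31.7))                      # prints the long decimal expansion, not 31.7
print(Decimal(31.7) == Decimal("31.7"))   # False

# The scaled integer, by contrast, is represented exactly:
print(317 == int(round(31.7 * 10)))       # True

# The same rounding shows up in ordinary arithmetic:
print(0.1 + 0.2 == 0.3)                   # False
```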