Task 1: Getting the Data
NOAA provides open weather data from stations around the world. Among other information, it publishes a list of all weather stations together with their station codes. The core of the offering is the weather data archive, which is organized first by year and then by the station code from the station list.
The station list itself already makes for an interesting data set to explore, in case you are looking for practice opportunities later on.
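To make the "by year, then by station" layout concrete, here is a minimal sketch that builds the download URL of one station-year file and fetches it. It rests on assumptions that are not in the text above: that the ISD Lite files live under `https://www.ncei.noaa.gov/pub/data/noaa/isd-lite/<year>/` and are named `<USAF>-<WBAN>-<year>.gz`, where USAF and WBAN are the two station identifiers given in the station list. Verify both against the NOAA site before relying on them; the function names are hypothetical.

```python
from urllib.request import urlretrieve

# ASSUMPTION: base URL of the ISD Lite archive; check it against the NOAA site.
ISD_LITE_BASE = "https://www.ncei.noaa.gov/pub/data/noaa/isd-lite"


def isd_lite_url(usaf: str, wban: str, year: int, base: str = ISD_LITE_BASE) -> str:
    """Build the (assumed) URL of one station-year file: <base>/<year>/<usaf>-<wban>-<year>.gz."""
    # The archive is organized by year first, then by station identifiers.
    return f"{base}/{year}/{usaf}-{wban}-{year}.gz"


def download(url: str, dest: str) -> str:
    """Download a single archive file to `dest` and return the local path."""
    path, _headers = urlretrieve(url, dest)
    return path
```

For the suggested sample, you would pass `"725060"`, the matching WBAN number looked up in the station list, and `2020`, then hand the resulting URL to `download`.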
- Pick a sample data set from the archive and download it. If you are unsure which one to choose, New York, Central Park (station code 725060) in 2020 is a good starting point.
- The data is provided as a compressed gz archive, which most common archive tools can extract. Extract the data set and inspect it visually in a text editor to make sure it contains more than just a few rows of data.
- Get acquainted with the ISD Lite data format; its description holds valuable information on how to interpret what you have in front of you.
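Reading an ISD Lite file with pandas could look like the minimal sketch below. It assumes the layout given in the ISD Lite format description: twelve space-separated integer fields per line (year, month, day, hour, air temperature, dew point, sea level pressure, wind direction, wind speed, sky coverage code, 1-hour and 6-hour precipitation), `-9999` marking a missing value, and temperatures, pressure and wind speed stored in tenths of their unit. The column labels and the name `read_isd_lite` are made up for this example.

```python
import pandas as pd

# Column labels chosen for this sketch; their order follows the ISD Lite format description.
ISD_LITE_COLUMNS = [
    "year", "month", "day", "hour",
    "air_temp", "dew_point", "sea_level_pressure",
    "wind_direction", "wind_speed", "sky_coverage",
    "precip_1h", "precip_6h",
]

# Fields stored as integers scaled by 10 (e.g. 215 means 21.5 degrees Celsius).
# Precipitation is left unscaled here; check the format description for its coding.
SCALED_BY_10 = ["air_temp", "dew_point", "sea_level_pressure", "wind_speed"]


def read_isd_lite(path: str) -> pd.DataFrame:
    """Read an ISD Lite file (plain text or .gz) into a DataFrame indexed by timestamp."""
    df = pd.read_csv(
        path,
        sep=r"\s+",           # fields are separated by runs of spaces
        header=None,          # the files have no header row
        names=ISD_LITE_COLUMNS,
        na_values=[-9999],    # -9999 marks a missing value
        compression="infer",  # a .gz file is decompressed transparently
    )
    # Assemble one timestamp per row from the date/time columns, then drop them.
    time_cols = ["year", "month", "day", "hour"]
    df.index = pd.DatetimeIndex(pd.to_datetime(df[time_cols]))
    df = df.drop(columns=time_cols)
    # Convert the scaled integer fields to their physical units.
    df[SCALED_BY_10] = df[SCALED_BY_10] / 10
    return df
```

Because `compression="infer"` lets pandas decompress the `.gz` file itself, the function can be pointed at the downloaded archive directly.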
Step 2 (extracting the archive) is not strictly necessary, since pandas can read this kind of archive out of the box. When starting out, however, it is a good idea to get a first impression of what the data set looks like and whether it has many missing or repeating values, which might indicate that it is less useful.
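One way to put a number on "many missing or repeating values" is a small per-column summary. This sketch works on any pandas DataFrame, and `quality_report` is a hypothetical name. Counting a value as "repeating" when it equals the previous row's value is an interpretation chosen here: a sensor stuck on one reading would show up that way, though some variables legitimately stay constant from one hour to the next.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize, per column, the share of missing values and of repeated values."""
    n = len(df)
    # Fraction of rows where the column is missing (NaN).
    missing = df.isna().mean()
    # Interpretation used here: a value "repeats" if it equals the previous row's
    # non-missing value. NaN never equals NaN, so gaps are not counted as repeats.
    repeats = (df.eq(df.shift()) & df.notna()).sum() / max(n - 1, 1)
    return pd.DataFrame({"missing_share": missing, "repeat_share": repeats})
```

Columns with a high share in either measure are worth a closer look, or a reason to try a different station or year.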