Task 2: Loading the Data
To load the data, you can use the
pandas.read_csv() function.
- In these data sets the separator between data fields is not a comma, but one or more whitespace characters.
You can use the regular expression
"\s+" to express this in Python.
- Note the parameter
read_csv()-function which can come in extremely handy.
- Note that the data set as provided has no header.
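As a minimal sketch of the two hints above (whitespace separator, no header), the following reads a few made-up rows from an in-memory string. The sample values are invented for illustration and are not taken from the real data set.

```python
import io
import pandas as pd

# Made-up sample rows (NOT the real data set), fields separated by
# a varying number of spaces.
sample = io.StringIO(
    "2010  1  1  0   -6\n"
    "2010  1  1  1  -17\n"
    "2010  1  1  2  -22\n"
)

# sep=r"\s+" splits on one or more whitespace characters;
# header=None tells pandas that the first line is data, not column names.
df = pd.read_csv(sample, sep=r"\s+", header=None)

print(df.shape)
print(df)
```

Without a header, pandas numbers the columns 0, 1, 2, and so on, which is a quick way to see how many fields were detected per row.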
As noted previously, the downloaded data is compressed in a
You could decompress it before working with it
(especially useful if you want to inspect the data beforehand with a plain text editor or other tool/programs),
the read_csv() function itself, however, can handle such an archive just fine.
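To illustrate reading a compressed file directly, here is a small sketch that writes a made-up gzip file to a temporary directory and loads it with read_csv(). Gzip and the file name sample.txt.gz are assumptions chosen for the example; the downloaded archive may use a different format.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a tiny made-up file and gzip it. Gzip is an assumption here;
# the actual archive format of the downloaded data may differ.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt.gz")  # hypothetical file name
with gzip.open(path, "wt") as fh:
    fh.write("2010  1  1  0   -6\n2010  1  1  1  -17\n")

# compression="infer" is the default: pandas picks the codec from the
# file extension, so no manual decompression step is needed.
df_gz = pd.read_csv(path, sep=r"\s+", header=None, compression="infer")

print(df_gz)
```

Decompressing by hand is still useful when you want to look at the raw text first.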
- Consider first what the loaded data should look like
- Load the data set using the
read_csv()-function from pandas. Combine the year, month, day and hour columns into a single timestamp column.
- Set the timestamp to be the index of your dataframe
- Display the loaded data, compare the result with your expectations
- Do a plausibility check:
- Check the number of rows and columns
- Check if the data inside the rows is displayed correctly (i.e. no columns got joined or torn apart), especially the date column
- Assign a proper header based on the information from the data documentation
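One possible way to work through these steps is sketched below on made-up rows. The column names (temperature, pressure) and all values are hypothetical; the real names should come from the data documentation. This is only one approach and not necessarily the one used in the solution hints.

```python
import io
import pandas as pd

# Made-up sample rows (NOT the real data set).
sample = io.StringIO(
    "2010  1  1  0   -6  1020\n"
    "2010  1  1  1  -17  1021\n"
    "2010  1  1  2  -22  1021\n"
)

# Hypothetical column names; take the real ones from the data documentation.
names = ["year", "month", "day", "hour", "temperature", "pressure"]
df = pd.read_csv(sample, sep=r"\s+", header=None, names=names)

# pd.to_datetime can assemble a timestamp from columns named
# year, month, day and hour.
date_cols = ["year", "month", "day", "hour"]
df["timestamp"] = pd.to_datetime(df[date_cols])

# Drop the four source columns and use the timestamp as the index.
df = df.drop(columns=date_cols).set_index("timestamp")

# Plausibility check: row/column counts and a look at the first rows.
print(df.shape)
print(df.head())
```

Comparing the printed shape with the number of lines and fields in the raw file is a simple check that no columns were joined or split.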
Hints for Solving the Task
If you are seriously stuck, you can take a look at the solution hints.