Task 3: Cleaning the data
The data as loaded is not yet ready for our work. For technical reasons, the data representation has a few peculiarities:
- According to the data documentation, the value -9999 indicates missing data.
- Some data columns have been scaled by a factor.
- A wind direction of 0 means that it is undetermined; North is designated by
- Replace the value -9999 with something more appropriate, for example the constant
- Replace the measurements where no wind direction is given in a similar fashion.
- Now the value 0 is free to represent North as usual. This will come in handy in a later task.
- Check for columns that have no useful data at all and remove them if convenient
- Re-scale the columns so they all use a factor of 1 (and can be read and interpreted more easily by humans)
- Check if there are entries missing for some dates/hours.
Consider first how many hours the given year should have. (Account for the additional day of leap years if applicable.)
How many rows are missing in your data set? (If your data set has a significant number of rows missing, consider choosing another one.)
For this you may find the
- Add suitable placeholders for those missing rows, so the averaging works as expected.
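The steps in this list can be sketched roughly as follows. This is only one possible approach, not the reference solution. It assumes the data sits in a pandas DataFrame with an hourly DatetimeIndex and purely numeric columns. The function names, the column name wind_dir, the scale_factors mapping and the encoding of North as 360 are all hypothetical, so check them against your own data documentation. It also assumes that "scaled by a factor" means the stored value is the real value multiplied by that factor.

```python
import calendar

import numpy as np
import pandas as pd

MISSING = -9999  # sentinel for missing data, as stated in the task text


def clean(df, scale_factors, wind_dir_col="wind_dir"):
    """Return a cleaned copy of df. All column names are hypothetical."""
    # Replace the sentinel with NaN so it is ignored by mean(), etc.
    out = df.replace(MISSING, np.nan).astype(float)
    # A wind direction of 0 means "undetermined", so treat it as missing too.
    out[wind_dir_col] = out[wind_dir_col].replace(0, np.nan)
    # ASSUMPTION: North is encoded as 360. The modulo maps it to 0
    # and leaves every other direction (and NaN) unchanged.
    out[wind_dir_col] = out[wind_dir_col] % 360
    # ASSUMPTION: stored value = real value * factor, so divide to undo it.
    for col, factor in scale_factors.items():
        out[col] = out[col] / factor
    # Drop columns that contain no data at all.
    return out.dropna(axis="columns", how="all")


def fill_missing_hours(df, year):
    """Reindex df to every hour of the year; return (new_df, rows_missing)."""
    expected_hours = (366 if calendar.isleap(year) else 365) * 24
    full_index = pd.date_range(
        start=f"{year}-01-01 00:00",
        periods=expected_hours,
        freq=pd.Timedelta(hours=1),
    )
    n_missing = len(full_index.difference(df.index))
    # Rows that were absent are added as NaN placeholders.
    return df.reindex(full_index), n_missing
```

A call might look like `cleaned = clean(raw, {"temp": 10})` followed by `full, n_missing = fill_missing_hours(cleaned, 2021)`. Using NaN as the placeholder means that pandas aggregations such as `mean()` skip the gaps by default, which is one way to make the averaging behave as expected.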
Hints for Solving the Task
If you are seriously stuck, you can take a look at the solution hints.