Required Packages

The following packages will be required or may come in handy:

library(readr)
library(dplyr)
library(lubridate)

Exercises

Avocado Prices Data

Exercises 1-4 are based on the avocado.RDS data from Kaggle (https://www.kaggle.com/neuromusic/avocado-prices). The variables are self-explanatory; however, you are expected to check the data types and apply suitable transformations where necessary.

Date        AveragePrice  Total Volume  region
2015-12-27  1.33          64236.62      Albany
2015-12-20  1.35          54876.98      Albany
2015-12-13  0.93          118220.22     Albany
2015-12-06  1.08          78992.15      Albany
2015-11-29  1.28          51039.60      Albany
2015-11-22  1.26          55979.78      Albany

  1. Check the structure of the Date variable. Convert it to date format using the appropriate function.
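One possible way to approach exercise 1 is sketched below. Because avocado.RDS is not available here, the sketch builds a small stand-in data frame from a few of the rows shown above, and it assumes the Date column arrives as character (in the real file it may already be a factor or a Date, which is exactly what str() would reveal).

```r
library(lubridate)

# Toy stand-in for avocado.RDS (assumption: Date is stored as character)
avocado <- data.frame(
  Date = c("2015-12-27", "2015-12-20", "2015-12-13"),
  AveragePrice = c(1.33, 1.35, 0.93),
  stringsAsFactors = FALSE
)

# Inspect the structure before converting
str(avocado$Date)

# The strings are in year-month-day order, so ymd() can parse them;
# as.Date(avocado$Date) would work equally well for this layout
avocado$Date <- ymd(avocado$Date)

# Confirm the new class
class(avocado$Date)
```

With the real data, the same two steps apply after loading it with readRDS().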

The aim of this part (Questions 2-4) is to practise using format(). Notice that format() changes the structure of the variable to character, so each time you convert the date to a different form you need to read it back in using as.Date().

  2. As you have noticed, the date is in YYYY-MM-DD format. Convert the string format to DD/MM/YYYY. Hint: the format() function will come in handy.

  3. Convert the string format to DD.MM.YYYY. Hint: the format() function will come in handy.

  4. Convert the string format to DD/MM/YYYY. Hint: the format() function will come in handy.
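The round trip between format() and as.Date() that these exercises rely on might look like the following minimal sketch. It uses a standalone vector of two dates taken from the table above rather than the full data set, and the variable names (slash, dot and so on) are only illustrative.

```r
# Two dates from the avocado table, stored as Date objects
dates <- as.Date(c("2015-12-27", "2015-12-20"))

# format() returns character, not Date
slash <- format(dates, "%d/%m/%Y")
class(slash)   # "character"
slash[1]       # "27/12/2015"

# Read the strings back in, telling as.Date() which layout to expect
dates_back <- as.Date(slash, format = "%d/%m/%Y")
class(dates_back)   # "Date"

# Same idea with dots as separators
dot <- format(dates_back, "%d.%m.%Y")
dot[1]         # "27.12.2015"
dates_again <- as.Date(dot, format = "%d.%m.%Y")
```

The format argument matters on the way back in: without it, as.Date() tries only its default layouts and will not read a DD/MM/YYYY string as intended.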

Pollution Data

Exercises 5-8 are based on the Pollution.csv data from Kaggle (https://www.kaggle.com/nicapotato/pollution-in-atchison-village-richmond-ca/data). The variables are self-explanatory; however, you are expected to check the data types and apply suitable transformations where necessary.

Here is a quick look at the Pollution data:

Date           Benzene  CS2  Ozone  SO2  Toluene  Xylene  Wind Direction  Wind Speed  Wind Origin
10/10/15 3:15  2.5      2.5  2.5    2.5  2.50     220.61  162             6           SSE
10/10/15 2:00  2.5      2.5  2.5    2.5  2.50     184.78  158             7           SSE
10/10/15 4:30  2.5      2.5  2.5    2.5  573.36   144.61  166             5           SSE
10/10/15 1:50  2.5      2.5  2.5    2.5  537.12   125.71  154             8           SSE
10/10/15 4:20  2.5      2.5  2.5    2.5  424.89   105.50  166             5           SSE
10/10/15 1:55  2.5      2.5  2.5    2.5  300.48   79.47   153             9           SSE

  5. Check the structure of the Date variable. Convert it to a date-time format using the appropriate function. After the conversion it should look like this:
## [1] "2015-10-10 03:15:00 UTC" "2015-10-10 02:00:00 UTC"
## [3] "2015-10-10 04:30:00 UTC" "2015-10-10 01:50:00 UTC"
## [5] "2015-10-10 04:20:00 UTC" "2015-10-10 01:55:00 UTC"
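A conversion along these lines could be sketched as follows. The data frame is a toy stand-in holding only the Date strings shown above, and the month/day/year order is an assumption: the rows shown are all 10/10/15, so they cannot distinguish it from day/month/year, and the full file should be checked before settling on mdy_hm() or dmy_hm().

```r
library(lubridate)

# Toy stand-in for Pollution.csv: only the Date strings from the rows above
pollution <- data.frame(
  Date = c("10/10/15 3:15", "10/10/15 2:00", "10/10/15 4:30",
           "10/10/15 1:50", "10/10/15 4:20", "10/10/15 1:55"),
  stringsAsFactors = FALSE
)

# Character before conversion
str(pollution$Date)

# Assumption: month/day/year order; mdy_hm() parses to POSIXct in UTC by default
pollution$Date <- mdy_hm(pollution$Date)

class(pollution$Date)
pollution$Date
```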
  6. Create year, month, day, hour and minute columns using mutate(). Create a new column by combining year, month and day.

  7. Create a new column with today's date and name it newtime. Add 2 years to the newtime variable. Check the difference between the Date and newtime variables in weeks and round it to 2 decimal places. Check the duration of the year variable using the appropriate duration function.

  8. Create an hourly sequence of times starting at 6:00 am on 1 May 2015 and ending at 7:00 am on 1 October 2015. Check the length of this sequence. Create a subset of the pollution data with the same length as the sequence and bind the two together.
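The three exercises above could be approached roughly as in the sketch below. It is not a reference solution: it runs on a three-row stand-in for the pollution data (already parsed as date-times, assuming month/day/year order), and the names date_only, diff_weeks, year_duration, time_seq and combined are hypothetical. Because the toy data is much shorter than the hourly sequence, the last step truncates both to a common length n; with the full data, and assuming it has at least as many rows as the sequence, n would simply be the length of the sequence.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the pollution data, with Date already parsed (UTC)
pollution <- data.frame(
  Date = mdy_hm(c("10/10/15 3:15", "10/10/15 2:00", "10/10/15 4:30"))
)

# Exercise 6: extract components, then combine year, month and day
pollution <- pollution %>%
  mutate(
    year      = year(Date),
    month     = month(Date),
    day       = day(Date),
    hour      = hour(Date),
    minute    = minute(Date),
    date_only = make_date(year, month, day)
  )

# Exercise 7: today's date plus 2 years, then the gap to Date in weeks.
# %m+% is used so that 29 February rolls back instead of giving NA.
pollution <- pollution %>%
  mutate(
    newtime    = today(),
    newtime    = newtime %m+% years(2),
    diff_weeks = round(as.numeric(difftime(newtime, Date, units = "weeks")), 2)
  )

# Treat the year values as a number of years and express them as a duration
year_duration <- dyears(pollution$year)

# Exercise 8: hourly sequence between the two end points (UTC)
time_seq <- seq(
  from = ymd_hm("2015-05-01 06:00"),
  to   = ymd_hm("2015-10-01 07:00"),
  by   = "hour"
)
length(time_seq)

# Subset the data to a matching length and bind the sequence alongside it
n <- min(length(time_seq), nrow(pollution))
pollution_sub <- slice_head(pollution, n = n)
combined <- bind_cols(pollution_sub, data.frame(time_seq = time_seq[seq_len(n)]))
```

make_date() returns a Date column; pasting the three components together with paste() would instead give a character column, which is another reasonable reading of "combining".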

Finished?

If you have finished the above tasks, work through the weekly list of tasks posted on the Canvas announcement page.
