The following packages will be required or may come in handy:
library(readr)
library(dplyr)
library(lubridate)
Exercises 1-4 are based on the avocado.RDS data from Kaggle https://www.kaggle.com/neuromusic/avocado-prices. The variables are self-explanatory; however, you are expected to check the data types and apply suitable transformations where necessary.
Date | AveragePrice | Total Volume | region |
---|---|---|---|
2015-12-27 | 1.33 | 64236.62 | Albany |
2015-12-20 | 1.35 | 54876.98 | Albany |
2015-12-13 | 0.93 | 118220.22 | Albany |
2015-12-06 | 1.08 | 78992.15 | Albany |
2015-11-29 | 1.28 | 51039.60 | Albany |
2015-11-22 | 1.26 | 55979.78 | Albany |
Check the Date variable and convert it to date format using the appropriate function.

The aim of this part (Questions 2-4) is to practise using format(): each time you convert the date to a different form, you need to read it back in using as.Date(). Notice that the structure of the variable changes to character when you use format().
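The note above about format() can be seen on a small example. This is only a sketch on a toy vector copied from the preview table, not on the full avocado data; the names dates_chr, dates and formatted are made up for illustration, and the compact "%Y%m%d" pattern is chosen arbitrarily.

```r
# Toy vector standing in for the Date column (values from the preview table).
dates_chr <- c("2015-12-27", "2015-12-20", "2015-12-13")

# as.Date() parses YYYY-MM-DD strings without an explicit format argument.
dates <- as.Date(dates_chr)
class(dates)       # Date

# format() always returns character, so the Date class is lost.
formatted <- format(dates, "%Y%m%d")
class(formatted)   # character
```

Because the result of format() is character, date arithmetic on it no longer works until it is parsed again with as.Date().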
As you have noticed, the date is in YYYY-MM-DD format. Convert the string format to DD/MM/YYYY. Hint: the format() function will come in handy.
Convert the string format to DD.MM.YYYY. Hint: the format() function will come in handy.
Convert the string format to DD/MM/YYYY. Hint: the format() function will come in handy.
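A minimal sketch of the round trip on a single date (taken from the preview table) may help with these conversions. The variable names are illustrative; the key point is that as.Date() needs a format argument matching the layout of the string once it is no longer YYYY-MM-DD.

```r
d <- as.Date("2015-12-27")

# Write the date out in two different layouts (both results are character).
slash <- format(d, "%d/%m/%Y")
dot   <- format(d, "%d.%m.%Y")

# Read the strings back in; the format must describe the string exactly.
back_slash <- as.Date(slash, format = "%d/%m/%Y")
back_dot   <- as.Date(dot, format = "%d.%m.%Y")

back_slash == d    # TRUE
back_dot == d      # TRUE
```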
Exercises 5-8 are based on the Pollution.csv data from Kaggle https://www.kaggle.com/nicapotato/pollution-in-atchison-village-richmond-ca/data. The variables are self-explanatory; however, you are expected to check the data types and apply suitable transformations where necessary.
Here is a quick look at the Pollution data:
Date | Benzene | CS2 | Ozone | SO2 | Toluene | Xylene | Wind Direction | Wind Speed | Wind Origin |
---|---|---|---|---|---|---|---|---|---|
10/10/15 3:15 | 2.5 | 2.5 | 2.5 | 2.5 | 2.50 | 220.61 | 162 | 6 | SSE |
10/10/15 2:00 | 2.5 | 2.5 | 2.5 | 2.5 | 2.50 | 184.78 | 158 | 7 | SSE |
10/10/15 4:30 | 2.5 | 2.5 | 2.5 | 2.5 | 573.36 | 144.61 | 166 | 5 | SSE |
10/10/15 1:50 | 2.5 | 2.5 | 2.5 | 2.5 | 537.12 | 125.71 | 154 | 8 | SSE |
10/10/15 4:20 | 2.5 | 2.5 | 2.5 | 2.5 | 424.89 | 105.50 | 166 | 5 | SSE |
10/10/15 1:55 | 2.5 | 2.5 | 2.5 | 2.5 | 300.48 | 79.47 | 153 | 9 | SSE |
Check the Date variable and convert it to date-time format using the appropriate function. After the conversion it should look like this:
## [1] "2015-10-10 03:15:00 UTC" "2015-10-10 02:00:00 UTC"
## [3] "2015-10-10 04:30:00 UTC" "2015-10-10 01:50:00 UTC"
## [5] "2015-10-10 04:20:00 UTC" "2015-10-10 01:55:00 UTC"
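One possible way to get output of this shape is sketched below on three strings copied from the preview table. It assumes the raw values are in month/day/year order; the preview rows (10/10/15) cannot distinguish this from day/month/year, so check the full data before choosing between mdy_hm() and dmy_hm(). The names raw and parsed are illustrative.

```r
library(lubridate)

# Three raw strings copied from the preview table.
raw <- c("10/10/15 3:15", "10/10/15 2:00", "10/10/15 4:30")

# ASSUMPTION: month/day/year order. Use dmy_hm() if the data is day-first.
# The lubridate parsers return POSIXct in UTC unless tz is given.
parsed <- mdy_hm(raw)

class(parsed)
parsed
```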
Create year, month, day, hour and minute columns using mutate(). Create a new column by combining year, month and day.
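The component extraction can be sketched on a two-row toy data frame as follows. The data frame toy and the column name date_only are hypothetical, and make_date() is only one of several ways to combine the parts (paste() followed by ymd() would be another).

```r
library(dplyr)
library(lubridate)

# Toy data frame with an already parsed date-time column.
toy <- data.frame(Date = ymd_hm(c("2015-10-10 03:15", "2015-10-10 02:00")))

toy <- toy %>%
  mutate(
    year   = year(Date),
    month  = month(Date),
    day    = day(Date),
    hour   = hour(Date),
    minute = minute(Date),
    # Combine the three date parts back into a single Date column.
    date_only = make_date(year, month, day)
  )

toy
```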
Create a new column with today's date and name it newtime. Add 2 years to the newtime variable.
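As a rough sketch, adding a period with lubridate looks like this. The example works on standalone values rather than a column of the pollution data, and fixed_plus2 is a made-up name used to show a reproducible result alongside today().

```r
library(lubridate)

# today() returns the current date as a Date object.
newtime <- today()

# years(2) is a period: it moves the calendar year forward by two.
newtime_plus2 <- newtime + years(2)

# The same operation on a fixed date, so the result does not depend on the run date.
fixed_plus2 <- as.Date("2015-10-10") + years(2)
fixed_plus2        # "2017-10-10"
```

Adding years() to 29 February gives NA when the target year is not a leap year; the %m+% operator from lubridate rolls back to the last valid day instead.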
Check the difference between the Date and newtime variables in weeks, rounded to 2 decimal places. Check the duration of the year variable using the appropriate duration function.
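A sketch of both checks on standalone values is given below. The two time points are chosen for illustration only, and reading "the duration of the year variable" as a call to dyears() is an assumption; the exercise leaves the exact intent open.

```r
library(lubridate)

# Illustrative values: one observation time and a later date.
date_col <- ymd_hm("2015-10-10 03:15")
newtime  <- ymd("2017-10-10")

# difftime() accepts both dates and date-times; units = "weeks" sets the unit.
diff_weeks <- round(difftime(newtime, date_col, units = "weeks"), 2)
diff_weeks

# ASSUMPTION: a duration for a number of years is built with dyears().
# Durations are exact numbers of seconds, unlike periods such as years().
one_year <- dyears(1)
one_year
```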
Create an hourly sequence of times starting at 6:00 am on 1 May 2015 and ending at 7:00 am on 1 October 2015. Check the length of this sequence. Create a subset of the pollution data with the same length as the sequence and bind the two together.
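The steps of this task can be sketched as follows. The data frame toy is a hypothetical stand-in for the pollution data (a single constant Ozone column with more rows than the sequence), and the times are assumed to be in UTC.

```r
library(lubridate)

# Start and end points, parsed as UTC date-times.
start <- ymd_hm("2015-05-01 06:00")
end   <- ymd_hm("2015-10-01 07:00")

# seq() on POSIXct accepts by = "hour".
hourly <- seq(start, end, by = "hour")
length(hourly)

# Hypothetical stand-in for the pollution data.
toy <- data.frame(Ozone = rep(2.5, 5000))

# Keep as many rows as there are time points, then bind column-wise.
subset_rows <- head(toy, length(hourly))
combined <- cbind(time = hourly, subset_rows)
head(combined)
```

Binding column-wise only works when the subset has exactly as many rows as the sequence has elements, which is why the length is checked first.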
If you have finished the above tasks, work through the weekly list of tasks posted on the Canvas announcement page.