Module 3 Demonstration
Understand: Data Types and Data Structures
1 / 50

Module 3: Outline

Data types in general
- Qualitative (categorical)
  - Nominal variable
  - Ordinal variable
- Quantitative (numerical)
  - Continuous variable
  - Discrete variable
Data types in R
- character
- numeric
- integer
- factor
- logical

R’s data structures
- vector
- list
- matrix
- data frame
Converting Data Types/Structures
- is. functions
- as. functions

2 / 50

Data types in a general sense

Let’s think about your daily coffee consumption:

If you record only whether or not you drink coffee every day, the variable is nominal.
If you record that you drink 4 cups every day, the variable is discrete.
If you record that you drink 80 grams of coffee every day, the variable is continuous.

3 / 50

Data Types in R

R is a programming language and has its own definitions of data types and structures.
In this module, we group the different types of data in R into four classes:
- Logical
- Numeric (integer or double)
- Character
- Factor
Useful functions in R:
- Use class() to check the class of an object.
- Use typeof() to check whether a numeric object is integer or double.
- Use levels() to see the levels of a factor object.

5 / 50

Data Types in R: Logical class

Logical class consists of TRUE or FALSE (binary) values.
A logical value is often created via comparison between variables.

x <- 10
y <- (x > 0)
y

## [1] TRUE

class(y)

## [1] "logical"

7 / 50

Data Types in R: Numeric Class

Numeric (integer or double): Quantitative values are numeric in R.
Numeric class can be integer or double.
Integers can be seen as discrete values (e.g., 2), whereas doubles hold floating-point numbers (e.g., 2.16).
To create a double numeric variable:

var1 <- c(4, 7.5, 14.5)

To create an integer variable, place an L directly after each number:

var2 <- c(4L, 7L, 14L)

8 / 50

Data Types in R: Numeric Class

var1 <- c(4, 7.5, 14.5)
var2 <- c(4L, 7L, 14L)

To check the class of a numeric variable:

class(var1)

## [1] "numeric"

class(var2)

## [1] "integer"

To check whether an object is integer or double, use typeof().

typeof(var1)

## [1] "double"

typeof(var2)

## [1] "integer"

9 / 50

Data Types in R: Character Class

Character: A character class is used to represent string values in R.
To generate a character object, use quotation marks " " and assign a string/text to an object:

var3 <- c("debit", "credit", "Paypal")
class(var3)

## [1] "character"

10 / 50

Data Types in R: Factor Class

Factor class is used to represent qualitative data in R.
Factors can be ordered or unordered.
They store the nominal values as a vector of integers in the range $1 \dots k$ (where $k$ is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers.
Factor objects can be created with the factor() function:

var4 <- factor( c("Male", "Female", "Male", "Male") )
var4

## [1] Male   Female Male   Male  
## Levels: Female Male

class(var4)

## [1] "factor"
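
As a small sketch of the integer storage described above, the codes and level labels behind var4 can be inspected with base R. The names codes and level_labels are illustrative only and do not appear elsewhere in the module.

```r
var4 <- factor(c("Male", "Female", "Male", "Male"))

# integer codes stored internally (positions in the levels vector)
codes <- as.integer(var4)
codes

# character labels that the codes are mapped to
level_labels <- levels(var4)
level_labels

# indexing the labels by the codes recovers the original values
level_labels[codes]
```

Because the levels are sorted alphabetically by default, "Female" gets code 1 and "Male" gets code 2.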

11 / 50

Data Types in R: Factor Class Cont.

To see the levels of a factor object, use the levels() function:

levels(var4)

## [1] "Female" "Male"

By default, levels of the factors will be ordered alphabetically.

Using the levels argument of factor(), we can control the ordering of the levels while creating a factor:

var5 <- factor( c("Male", "Female", "Male", "Male"), 
                    levels = c("Male", "Female") )
var5

## [1] Male   Female Male   Male  
## Levels: Male Female

levels(var5)

## [1] "Male"   "Female"

12 / 50

Data Types in R: Ordered Factor Class

We can also create ordinal factors in a specific order using the ordered = TRUE argument:

var6 <- factor( c("DI", "HD", "PA", "NN", "CR", "DI", "HD", "PA"), 
                levels = c("NN", "PA", "CR", "DI", "HD"), 
                ordered = TRUE )
var6

## [1] DI HD PA NN CR DI HD PA
## Levels: NN < PA < CR < DI < HD

The ordering will be reflected as NN < PA < CR < DI < HD in the output.
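
One practical consequence of an ordered factor is that comparison operators respect the level order. The following is a minimal sketch using the same grades as var6; the names at_least_di and highest are illustrative only.

```r
var6 <- factor(c("DI", "HD", "PA", "NN", "CR", "DI", "HD", "PA"),
               levels = c("NN", "PA", "CR", "DI", "HD"),
               ordered = TRUE)

# which grades are DI or better? (comparison follows NN < PA < CR < DI < HD)
at_least_di <- var6 >= "DI"
at_least_di

# the highest grade observed, according to the level order
highest <- max(var6)
highest
```

The same comparisons on an unordered factor are not meaningful, which is one reason to set ordered = TRUE for ordinal variables.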

13 / 50

Remark on Factors

Factors can also be created during data import. Base R import functions such as read.csv() and read.table() have a stringsAsFactors option that determines how character data are read into R. (readr's read_csv() has no such option and always reads strings as character.)
Since R 4.0.0 the default is stringsAsFactors = FALSE; setting it to TRUE converts every column detected as character/string to a factor variable.

14 / 50

Remark on Factors

To illustrate the unnecessary factor conversions, let's read the VIC_pet.csv using read.csv():

pets <- read.csv("../data/VIC_pet.csv", stringsAsFactors = TRUE)
#pets1 <- read.csv("../data/VIC_pet.csv")
str(pets)

## 'data.frame':    40 obs. of  8 variables:
##  $ id               : Factor w/ 40 levels "10396v","104515v",..: 15 9 39 21 18 4 30 6 14 25 ...
##  $ State            : Factor w/ 1 level "Victoria": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Region           : Factor w/ 7 levels "Ballarat","Colac Otway",..: 2 7 1 3 3 1 3 5 7 3 ...
##  $ Animal_Type      : Factor w/ 7 levels "Cat","Cat       ",..: 7 2 1 4 4 4 4 5 6 1 ...
##  $ Animal_Name      : Factor w/ 28 levels "","Bailey","Blacky",..: 9 1 2 24 28 20 14 1 1 10 ...
##  $ Breed_Description: Factor w/ 34 levels "","American Staffordshire Terrier",..: 23 11 6 14 24 16 28 28 2 11 ...
##  $ Colour           : Factor w/ 3 levels "","NULL","WHI ": 2 1 1 1 1 1 1 1 1 1 ...
##  $ Animal_Desexed   : Factor w/ 4 levels "","N","y","Y": 1 1 1 1 1 1 1 4 1 1 ...

15 / 50

Remark on Factors

Now, let's focus on Animal_Type variable and check its levels using:

levels(pets$Animal_Type)

## [1] "Cat"                                     
## [2] "Cat       "                              
## [3] "dog"                                     
## [4] "Dog"                                     
## [5] "DOG "                                    
## [6] "Dog       "                              
## [7] "Dog                                     "

Note that there are really only two distinct animal types here: dog and cat.
However, because every distinct string (differing in letter case or trailing spaces) became its own factor level, we observe seven different levels in Animal_Type.
Therefore, it is good practice to read such strings as characters and then apply string manipulations (which will be covered in Module 8) to standardize all strings to "dog" and "cat".
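
A minimal sketch of that clean-up, using only base R. The vector animal_type below is a hypothetical stand-in that reproduces the seven levels shown above (it is not read from VIC_pet.csv), and trimws() plus tolower() is just one possible approach.

```r
# hypothetical stand-in for pets$Animal_Type read as character
animal_type <- c("Cat", "Cat       ", "dog", "Dog", "DOG ",
                 "Dog       ", "Dog                                     ")

# remove surrounding whitespace, then convert to lower case
cleaned <- tolower(trimws(animal_type))

# only now convert to a factor
animal_factor <- factor(cleaned)
levels(animal_factor)
```

After cleaning, the factor has the two levels one would expect.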

16 / 50

Remark on Factors

Now let's look at the levels of id variable which contains the unique identification number of the pets:

levels(pets$id)

##  [1] "10396v"  "104515v" "110188v" "114898v" "129666v" "13234v"  "137135v"
##  [8] "141587v" "142785v" "143032v" "151452v" "151569v" "151921v" "154462v"
## [15] "17819v"  "1828v"   "18714v"  "35939v"  "3654v"   "39333v"  "46906v" 
## [22] "49872v"  "51127v"  "54848v"  "55483v"  "5754v"   "61112v"  "64560v" 
## [29] "66701v"  "70244v"  "70794v"  "7089v"   "77361v"  "81001v"  "84561v" 
## [36] "88946v"  "92359v"  "93485v"  "97268v"  "97957v"

There is no need to convert the id variable to a factor: with 40 observations and 40 different levels, every value is its own level.
Factorizing id therefore gains nothing. For such cases, it is better to leave this column as character (stringsAsFactors = FALSE) during the data import.

17 / 50

Data Structures in R

A data set is a collection of measurements or records which can be in any class (i.e., logical, character, numeric, factor, etc.).
Typically, data sets contain many variables with different lengths and types of values.
In R, we can store data sets using vectors, lists, matrices and data frames; these are called "Data Structures".

18 / 50

Data Structures in R: Vectors

A vector is the basic structure in R. It is a one-dimensional sequence of data elements of the same basic type (i.e., integer, double, logical, or character).
Vectors are created by combining multiple elements into a one-dimensional array using the combine function c().
The one-dimensional examples illustrated previously are considered vectors:

var1 <- c(4, 7.5, 14.5) # a double numeric vector
var2 <- c(4L, 7L, 14L) # an integer vector
var3 <- c(TRUE, FALSE, TRUE, TRUE) # a logical vector

20 / 50

Data Structures in R: Vectors Cont.

All elements of a vector must be of the same type. If you attempt to combine different types of elements, they will be coerced to the most flexible type present.

Vector of characters + numerics:

ex1 <- c("a", "b", "c", 1, 2, 3)
class(ex1) # --> a character vector

## [1] "character"

Vector of numerics + logical:

ex2 <- c(1, 2, 3, TRUE, FALSE)
class(ex2) # --> a numeric vector

## [1] "numeric"

Vector of logical + characters:

ex3 <- c(TRUE, FALSE, "a", "b", "c")
class(ex3) # --> a character vector

## [1] "character"

Ordering for coercion is roughly:

logical < integer < numeric < character
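
The coercion order can be checked step by step with typeof(). This is a small sketch; the object names are illustrative only.

```r
# logical + integer -> integer (TRUE becomes 1L)
step1 <- c(TRUE, 2L)
typeof(step1)

# integer + double -> double
step2 <- c(2L, 2.5)
typeof(step2)

# double + character -> character
step3 <- c(2.5, "a")
typeof(step3)

# the same rule is why logical values can be summed: TRUE counts as 1
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true
```

Each combination moves one step to the right in the ordering above.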

21 / 50

Data Structures in R: Vectors Cont.

To add elements to a vector, use the c() function.
Let's add two elements (4 and 6) to the ex2 vector:

ex4 <- c(ex2, 4, 6)
ex4

## [1] 1 2 3 1 0 4 6

22 / 50

Data Structures in R: Vectors Cont.

To subset a vector, we can use square brackets [ ] with positive or negative integers, logical values or names.

ex4

## [1] 1 2 3 1 0 4 6

Take the third element of ex4:

ex4[3]

## [1] 3

Take the first three elements of ex4:

ex4[1:3]

## [1] 1 2 3

Take the 1st, 3rd, and 5th element:

ex4[c(1,3,5)]

## [1] 1 3 0

Take all elements except the first:

ex4[-1]

## [1] 2 3 1 0 4 6

Take all elements less than 3:

ex4[ ex4 < 3 ]

## [1] 1 2 1 0
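
Subsetting by names and by an explicit logical vector works the same way. The named vector cups below is a hypothetical example, not part of the module data.

```r
# hypothetical named vector: cups of coffee per weekday
cups <- c(mon = 4, tue = 2, wed = 3)

# subset by a single name
cups["tue"]

# subset by several names
cups[c("mon", "wed")]

# subset by a logical vector of the same length (keep 1st and 3rd)
cups[c(TRUE, FALSE, TRUE)]
```

Names are kept in the result, which makes the output easier to read than positional subsetting.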

23 / 50

Data Structures in R: Lists

A list is an R structure that allows you to combine elements of different types and lengths.
In order to create a list we can use the list() function.

list1 <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.5, 4.2))

To see the detailed structure within an object use the structure function str():

str(list1)

## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.5 4.2

Note how the four list items above are of different classes (integer, character, logical, and numeric) and different lengths.

24 / 50

Data Structures in R: Lists Cont.

To add on to lists we can use the append() function. Let's add a fifth element to the list1 and store it as list2:

list2 <- append(list1, list(c("credit", "debit", "Paypal")))
str(list2)

## List of 5
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.5 4.2
##  $ : chr [1:3] "credit" "debit" "Paypal"

25 / 50

Remark: Metadata (Attributes)

R objects can carry metadata, called attributes, which help to describe the object. Some examples of R object attributes are:
- names, dimnames
- dimensions (e.g. matrices, arrays)
- class (e.g. integer, numeric)
- length
- other user-defined attributes/metadata
Attributes of an object (if any) can be accessed using the attributes() function. Let's check if list2 has any attributes.

attributes(list2)

## NULL
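
For contrast, here is a sketch of objects that do have attributes: a factor (which stores its levels and class as attributes) and a plain vector given a user-defined attribute with attr(). The vector grams and the attribute name "units" are hypothetical.

```r
# a factor carries "levels" and "class" attributes
var4 <- factor(c("Male", "Female", "Male", "Male"))
attributes(var4)

# attach a user-defined attribute to a numeric vector
grams <- c(80, 65, 90)
attr(grams, "units") <- "grams of coffee"
attributes(grams)
```

User-defined attributes are often dropped by operations that create a new object, so they are best used for lightweight annotation.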

26 / 50

Data Structures in R: Lists Cont.

We can add names to lists using names() function.

# add names to a pre-existing list
names(list2) <- c ("item1", "item2", "item3", "item4", "item5")
str(list2)

## List of 5
##  $ item1: int [1:3] 1 2 3
##  $ item2: chr "a"
##  $ item3: logi [1:3] TRUE FALSE TRUE
##  $ item4: num [1:2] 2.5 4.2
##  $ item5: chr [1:3] "credit" "debit" "Paypal"

Now, you can see that each element has a name and the names are displayed after a dollar $ sign.

27 / 50

Data Structures in R: Lists Cont.

In order to subset lists, we can use dollar $ sign, square brackets [ ] or double square brackets [[ ]]:

list2[1]  # take the first list item in list2

## $item1
## [1] 1 2 3

list2[[1]]  # take the contents of the first list item (not wrapped in a list)

## [1] 1 2 3

list2$item1 # take the first list item in list2 using $

## [1] 1 2 3

list2$item1[3] # take the third element out of first list item

## [1] 3

28 / 50

Data Structures in R: Lists Cont.

The difference between square brackets [ ] and double square brackets [[ ]] in subsetting lists is that [ ] returns a smaller list containing the selected items, whereas [[ ]] extracts the item itself.
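
As a small illustration of the [ ] versus [[ ]] distinction, the classes of the two results can be compared. The list below is a shortened, hypothetical version of list2, and the names sub_list and element are illustrative.

```r
list2 <- list(item1 = 1:3, item2 = "a")

# single brackets: the result is still a list (of length 1)
sub_list <- list2[1]
class(sub_list)

# double brackets: the result is the integer vector itself
element <- list2[[1]]
class(element)

# functions that expect a vector work on the extracted element
total <- sum(element)
total
```

Calling sum() on sub_list instead would raise an error, because a list is not a valid argument for sum().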

29 / 50

Data Structures in R: Matrices

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
In R, the elements of a matrix must be of same class (i.e. all elements must be numeric, or character, etc.) and all columns of a matrix must be of same length.

We can create a matrix with the matrix() function and its nrow and ncol arguments.

m1 <- matrix(1:6, nrow = 2, ncol = 3)
m1

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
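
Note that matrix() fills the values column by column by default, as the output above shows. A minimal sketch of filling row by row instead, using the byrow argument of matrix(); the name m1_byrow is illustrative.

```r
# fill the same values row by row instead of column by column
m1_byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m1_byrow

# the dimensions are unchanged; only the arrangement differs
dim(m1_byrow)
```

With byrow = TRUE the first row holds 1, 2, 3 and the second row holds 4, 5, 6.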

30 / 50

Data Structures in R: Matrices Cont.

Matrices can also be created using the column-bind cbind() and row-bind rbind() functions.
Note that the vectors being bound should be of equal length and mode.

v1 <- c( 1, 4, 5)
v2 <- c( 6, 8, 10)

# create a matrix using column-bind
m2 <- cbind(v1, v2) 
m2

##      v1 v2
## [1,]  1  6
## [2,]  4  8
## [3,]  5 10

# create a matrix using row-bind
m3 <- rbind(v1, v2) 
m3

##    [,1] [,2] [,3]
## v1    1    4    5
## v2    6    8   10

31 / 50

Data Structures in R: Matrices Cont.

We can also use cbind() and rbind() functions to add onto matrices.

v3 <- c(9, 8, 7)
m4 <- rbind(m3, v3)
m4

##    [,1] [,2] [,3]
## v1    1    4    5
## v2    6    8   10
## v3    9    8    7

32 / 50

Data Structures in R: Matrices Cont.

We can add names to the rows and columns of a matrix using rownames() and colnames().

rownames(m4) <- c("subject1", "subject2", "subject3")
colnames(m4) <- c("var1", "var2", "var3")
attributes(m4)

## $dim
## [1] 3 3
## 
## $dimnames
## $dimnames[[1]]
## [1] "subject1" "subject2" "subject3"
## 
## $dimnames[[2]]
## [1] "var1" "var2" "var3"

33 / 50

Data Structures in R: Matrices Cont.

In order to subset matrices we use the [ ] operator.
As matrices have two dimensions we need to specify subsetting arguments for both row and column dimensions like: matrix[rows, columns]:

m4

##          var1 var2 var3
## subject1    1    4    5
## subject2    6    8   10
## subject3    9    8    7

m4[1,2] # take the value in the first row and second column

## [1] 4

34 / 50

Data Structures in R: Matrices Cont.

m4[1:2, ] # subset for rows 1 and 2 but keep all columns

##          var1 var2 var3
## subject1    1    4    5
## subject2    6    8   10

m4[ , c(1, 3)] # subset for columns 1 and 3 but keep all rows

##          var1 var3
## subject1    1    5
## subject2    6   10
## subject3    9    7

m4[1:2, c(1, 3)] # subset for both rows and columns

##          var1 var3
## subject1    1    5
## subject2    6   10

35 / 50

Data Structures in R: Data Frames

A data frame is the most common way of storing data in R and, generally, is the data structure most often used for data analyses.
A data frame (DF) is a list of equal-length vectors, and each column can store a different class of object (i.e., numeric, character, factor).
DFs are usually created by importing/reading in a data set using the functions covered in Module 2.
DFs can also be created explicitly with the data.frame() function, or they can be coerced from other types of objects like lists.

36 / 50

Data Structures in R: Data Frames Cont.

df1 <- data.frame( col1 = 1:3,
                   col2 = c ("credit", "debit", "Paypal"),
                   col3 = c (TRUE, FALSE, TRUE),
                   col4 = c (25.5, 44.2, 54.9),
                   stringsAsFactors = TRUE)
str(df1)

## 'data.frame':    3 obs. of  4 variables:
##  $ col1: int  1 2 3
##  $ col2: Factor w/ 3 levels "credit","debit",..: 1 2 3
##  $ col3: logi  TRUE FALSE TRUE
##  $ col4: num  25.5 44.2 54.9

37 / 50

In the example above, col2 is converted to a column of factors. This is because of using stringsAsFactors = TRUE that converts character columns to factors.

Data Structures in R: Data Frames Cont.

With the default setting (stringsAsFactors = FALSE since R 4.0.0):

df1 <- data.frame (col1 = 1:3,
                  col2 = c ("credit", "debit", "Paypal"),
                  col3 = c (TRUE, FALSE, TRUE),
                  col4 = c (25.5, 44.2, 54.9))
str(df1)

## 'data.frame':    3 obs. of  4 variables:
##  $ col1: int  1 2 3
##  $ col2: chr  "credit" "debit" "Paypal"
##  $ col3: logi  TRUE FALSE TRUE
##  $ col4: num  25.5 44.2 54.9

38 / 50

Data Structures in R: Data Frames Cont.

We can add columns (variables) and rows (items) on to a data frame using cbind() and rbind() functions:

# create a new vector
v4 <- c("VIC", "NSW", "TAS")
# add a column (variable) to df1
df2 <- cbind(df1, v4)
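
Adding a row with rbind() works in a similar way. The sketch below rebuilds df1 so that it is self-contained; the values in new_row are hypothetical, and its column names must match those of df1.

```r
df1 <- data.frame(col1 = 1:3,
                  col2 = c("credit", "debit", "Paypal"),
                  col3 = c(TRUE, FALSE, TRUE),
                  col4 = c(25.5, 44.2, 54.9))

# a one-row data frame with the same column names and compatible types
new_row <- data.frame(col1 = 4L,
                      col2 = "credit",
                      col3 = FALSE,
                      col4 = 12.0)

# add the row (item) to df1
df3 <- rbind(df1, new_row)
str(df3)
```

Building the new row as a data frame (rather than with c()) keeps each column's type intact, since c() would coerce all values to character.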

39 / 50

Data Structures in R: Data Frames Cont.

To add row and column names (attributes) to a data frame, we use rownames() and colnames().

rownames(df2) <- c("subj1", "subj2", "subj3") # add row names
colnames(df2) <- c("number", "card_type", "fraud", "transaction", "state") # add column names 
str(df2)

## 'data.frame':    3 obs. of  5 variables:
##  $ number     : int  1 2 3
##  $ card_type  : chr  "credit" "debit" "Paypal"
##  $ fraud      : logi  TRUE FALSE TRUE
##  $ transaction: num  25.5 44.2 54.9
##  $ state      : chr  "VIC" "NSW" "TAS"

attributes(df2)

## $names
## [1] "number"      "card_type"   "fraud"       "transaction" "state"      
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] "subj1" "subj2" "subj3"

40 / 50

Data Structures in R: Data Frames Cont.

Data frames possess the characteristics of both lists and matrices.
If you subset with a single vector, they behave like lists and return the selected columns with all rows. If you subset with two vectors, they behave like matrices and can be subset by row and column.

df2

##       number card_type fraud transaction state
## subj1      1    credit  TRUE        25.5   VIC
## subj2      2     debit FALSE        44.2   NSW
## subj3      3    Paypal  TRUE        54.9   TAS

df2[2:3, ] # subset by row numbers, take second and third rows only

##       number card_type fraud transaction state
## subj2      2     debit FALSE        44.2   NSW
## subj3      3    Paypal  TRUE        54.9   TAS

41 / 50

Data Structures in R: Data Frames Cont.

df2[c("subj2", "subj3"),  ] # same as above but uses row names

##       number card_type fraud transaction state
## subj2      2     debit FALSE        44.2   NSW
## subj3      3    Paypal  TRUE        54.9   TAS

df2[, c(1,4)] # subset by column numbers, take first and fourth columns only

##       number transaction
## subj1      1        25.5
## subj2      2        44.2
## subj3      3        54.9

42 / 50

Data Structures in R: Data Frames Cont.

df2[, c("number", "transaction")] # same as above but uses column names

##       number transaction
## subj1      1        25.5
## subj2      2        44.2
## subj3      3        54.9

df2[2:3, c(1, 4)] # subset by row and column numbers

##       number transaction
## subj2      2        44.2
## subj3      3        54.9

43 / 50

Data Structures in R: Data Frames Cont.

df2[c("subj2", "subj3"), c("number", "transaction")] # same as above but uses row and column names

##       number transaction
## subj2      2        44.2
## subj3      3        54.9

df2$fraud # subset using $: take the column (variable) fraud

## [1]  TRUE FALSE  TRUE

df2$fraud[2] # take the second element in the fraud column

## [1] FALSE
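
The list-like behaviour with a single subsetting vector can be sketched as follows. The data frame is rebuilt here so the example is self-contained, and the names one_col_df and col_vector are illustrative.

```r
df2 <- data.frame(number = 1:3,
                  card_type = c("credit", "debit", "Paypal"),
                  fraud = c(TRUE, FALSE, TRUE),
                  transaction = c(25.5, 44.2, 54.9),
                  state = c("VIC", "NSW", "TAS"),
                  row.names = c("subj1", "subj2", "subj3"))

# single brackets with one vector: list-style, returns a data frame
one_col_df <- df2["fraud"]
class(one_col_df)

# double brackets: returns the column itself as a vector
col_vector <- df2[["fraud"]]
class(col_vector)

# several columns at once, still list-style
two_cols <- df2[c("number", "state")]
dim(two_cols)
```

df2[["fraud"]] gives the same result as df2$fraud.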

44 / 50

Converting Data Types/Structures

In many traditional programming languages, you need to declare the type of data that a given variable can contain (e.g., integer, character, string or decimal).
R is smart enough to "guess/create" the data type based on the values provided for a variable. However, R is not smart enough (thankfully, otherwise why would we need analysts?) to guess the correct data type within the context of the analysis.

45 / 50

Converting Data Types/Structures

To illustrate this point, let's import the banksim.csv data set.

library(readr)
bank <- read_csv("../data/banksim.csv")
str(bank)

## spc_tbl_ [15 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id       : num [1:15] 1 2 3 4 5 6 7 8 9 10 ...
##  $ age      : chr [1:15] "44" "88" "36" "25<=" ...
##  $ marital  : chr [1:15] "married" "married" "divorced" "single" ...
##  $ education: chr [1:15] "secondary" "secondary" "secondary" "secondary" ...
##  $ job      : chr [1:15] "blue-collar" "admin." "blue-collar" "technician" ...
##  $ balance  : num [1:15] 16178 330 853 616 310 ...
##  $ day      : num [1:15] 21 2 20 28 12 16 15 5 26 14 ...
##  $ month    : chr [1:15] "nov" "dec" "jun" "jul" ...
##  $ duration : num [1:15] 297 357 15 117 54 -268 129 156 168 216 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_double(),
##   ..   age = col_character(),
##   ..   marital = col_character(),
##   ..   education = col_character(),
##   ..   job = col_character(),
##   ..   balance = col_double(),
##   ..   day = col_double(),
##   ..   month = col_character(),
##   ..   duration = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

46 / 50

Converting Data Types/Structures

The str() output reveals how R guesses the data type of each variable.
Accordingly, id, balance, day and duration are read as numeric values, and the rest are read as characters. However, according to the variable definitions, the correct data type for age should be numeric (or integer).
As seen from the output, row 4 of the age column contains "25<=", and this single entry forced the whole column to be read as character even though it is numeric in nature. A stray entry in another numeric column (for example "528D" in balance) would have the same effect on that column.
A good practice is always to:
- check the definitions of variables, and understand their types within the context;
- then apply proper type conversions if they are not in the correct data type.
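
A minimal sketch of such a conversion for the age column. The vector age_raw is a hypothetical stand-in for the first four values of bank$age, and treating "25<=" as the number 25 is an assumption made here for illustration; the right interpretation depends on the variable definition.

```r
# hypothetical stand-in for the first values of bank$age
age_raw <- c("44", "88", "36", "25<=")

# direct conversion turns the non-numeric entry into NA (with a warning)
age_direct <- suppressWarnings(as.numeric(age_raw))
age_direct

# assumption: keep only the digits, then convert
age_clean <- as.numeric(gsub("[^0-9]", "", age_raw))
age_clean

is.numeric(age_clean)
```

Checking for NA values after as.numeric() is a quick way to find the entries that prevented the column from being read as numeric.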

47 / 50

Converting Data Types/Structures

as. functions will convert the object to a given type (whenever possible) and is. functions will test for the given data type and return a logical value (TRUE or FALSE).


`as.` Functions	Changes type to	`is.` Functions	Checks if type is
`as.numeric()`	numeric	`is.numeric()`	numeric
`as.integer()`	integer	`is.integer()`	integer
`as.double()`	double	`is.double()`	double
`as.character()`	character	`is.character()`	character
`as.factor()`	factor	`is.factor()`	factor
`as.logical()`	logical	`is.logical()`	logical
`as.vector()`	vector	`is.vector()`	vector
`as.list()`	list	`is.list()`	list
`as.matrix()`	matrix	`is.matrix()`	matrix
`as.data.frame()`	data frame	`is.data.frame()`	data frame
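
A few of the pairs in the table can be tried out as follows. This is a small sketch with made-up values; the object names are illustrative.

```r
# character values that look like numbers
x <- c("1", "2", "3")
is.character(x)
is.numeric(x)

# convert to integer and test again
y <- as.integer(x)
is.integer(y)

# numeric to logical: 0 becomes FALSE, non-zero becomes TRUE
z <- as.logical(c(1, 0, 2))
z

# data frame to matrix (all columns here are integer)
m <- as.matrix(data.frame(a = 1:2, b = 3:4))
is.matrix(m)
```

Every is. function returns a single TRUE or FALSE for the whole object, not one value per element.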

What do you need to know by Week 3

Understand R’s basic data types (i.e., character, numeric, integer, factor, and logical).
Understand R’s basic data structures (i.e., vector, list, matrix, and data frame) and main differences between them.
Learn to check attributes (i.e., name, dimension, class, levels etc.) of R objects.
Learn how to convert between data types/structures and understand coercion rules.

Practice!

49 / 50

Worksheet questions:

Complete the following worksheet:

Module 3 Worksheet

Once completed, feel free to continue working on your Assessments.
