Lab 03

Directions

For this lab, you will create a single .R file called lab03.R. The following exercises will ask you to write code. Place all requested code in this .R file, and use comments to indicate which code corresponds to which exercise.

Note: Do not write your name or netID anywhere in the file or in the filename. To the best of our ability, we will try to grade labs anonymously.

Submit your lab to the corresponding assignment on Canvas. You have unlimited attempts before the deadline. Your final submission before the deadline will be graded.

Grading

This lab will be graded based on a mix of correctness and completion. For each exercise that you demonstrate a good-faith effort to complete, you will receive at least one point.

Exercise 1 (Working with Vectors)

Run and add the following four lines to your .R file.

set.seed(42)
x = rnorm(100)
y = sample(letters, size = 1000, replace = TRUE)
z = sample(c(TRUE, FALSE, NA), size = 100, replace = TRUE)

Write three additional lines of code:

A line of code that sums the elements of x that are stored in even indexes.
A line of code that calculates the proportion of vowels (a, e, i, o, u) in y.
A line of code that counts the number of elements which are NA in z.
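As a reminder of the relevant tools, here is a minimal sketch using small made-up vectors (v, w, and flags are hypothetical, not the lab's x, y, and z). It shows indexing by a generated sequence of positions, using %in% with mean() to get a proportion, and using is.na() with sum() to count. It deliberately answers different questions than the ones asked above.

```r
# Toy vectors for illustration only (not the lab's x, y, z)
v = c(10, 20, 30, 40, 50)
w = c("a", "b", "c", "a", "d")
flags = c(TRUE, NA, FALSE, NA, TRUE)

# Subset by position: elements at positions 1, 3, 5, then sum them
odd_sum = sum(v[seq(1, length(v), by = 2)])  # 10 + 30 + 50 = 90

# %in% returns a logical vector; the mean of a logical vector is a proportion
prop_ab = mean(w %in% c("a", "b"))  # 3 of 5 elements match, so 0.6

# is.na() returns a logical vector; summing it counts TRUEs.
# Here we count the values that are NOT missing.
n_known = sum(!is.na(flags))  # 3
```

Note that a comparison such as flags == NA returns NA for every element, which is why is.na() is the right tool for detecting missing values.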

Exercise 2 (Working with Lists)

The following code (which you should include in your .R file) creates a truly absurd list.

absurd_list = list(
  q = "Look elsewhere.", 
  y = list(
    z = list(x = 1),
    x = list(x = 2),
    zzz = list(
      zzz = "Not here.",
      zz = list(qq = "gg"),
      a = list(
        b = list(
          q = list(
            answer = list(
              answer = "Hello World!")
      )))
    )
))

Using only brackets and integers, write a single line of code that extracts the atomic vector of length one named answer containing Hello World! nested deep inside this list.
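The difference between single and double brackets matters here. The sketch below uses a much smaller made-up list (toy is hypothetical) to show that [ returns a sub-list while [[ returns the element itself, and that [[ calls can be chained to walk down through nested levels.

```r
# A small nested list for illustration only
toy = list(
  a = "skip",
  b = list(
    c = 1:3,
    d = list(e = "found")
  )
)

# Single brackets keep the list wrapper: a list of length one named "b"
inner_list = toy[2]

# Double brackets return the element itself: the list holding c and d
inner_elem = toy[[2]]

# Chain double brackets to descend one level at a time
by_steps = toy[[2]][[2]][[1]]  # "found"

# [[ also accepts an integer vector and applies it recursively
by_vector = toy[[c(2, 2, 1)]]  # "found"
```

Running str() on a list prints its nesting, which makes it easier to count positions at each level.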

Exercise 3 (Working with Data Frames)

The airquality dataset comes pre-loaded in R.

head(airquality)

##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

Do two things to this data frame:

Remove any rows that contain NA in the Ozone column.
Remove any rows that contain NA in the Solar.R column.

Store your result in a data frame named answer.

Because these are the only two columns that contain NA values, you can check your work by comparing your result to na.omit(airquality). Specifically, you can run:

all(answer == na.omit(airquality))

Done correctly, this should return TRUE. (We cannot use the identical() function here because running na.omit() adds some attributes to the data frame it returns.) Obviously, your solution should not use na.omit().
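To see why identical() fails while the elementwise comparison succeeds, consider this small sketch with a made-up data frame (toy and its columns are hypothetical). It filters one column by hand with is.na() and compares the result to na.omit().

```r
# Toy data frame for illustration only
toy = data.frame(id = 1:4, score = c(3.5, NA, 4.0, NA))

# Keep rows where score is not missing (logical row indexing)
manual = toy[!is.na(toy$score), ]

# The same rows via na.omit(), used here only as a reference
omitted = na.omit(toy)

# The values agree cell by cell
same_values = all(manual == omitted)        # TRUE

# But na.omit() records which rows it dropped in an extra attribute
same_object = identical(manual, omitted)    # FALSE
extra = setdiff(names(attributes(omitted)), names(attributes(manual)))  # "na.action"
```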

Exercise 4 (2021 Illini in the NFL)

For this exercise and the next, we will utilize data provided by the nflreadr package.

If you do not have this package installed, you can install it by running:

install.packages("nflreadr")

Do not include this installation code in your .R file.

The following two lines of code will read in data on NFL rosters for the 2021 season. Include these lines in your .R file.

library(nflreadr)
rosters_2021 = as.data.frame(load_rosters(seasons = 2021))

The data are stored in rosters_2021 as a data frame. (We coerce to a data frame here because the data are initially loaded as a data.table, which we will address later in the semester.)

When using “real” data, it is always best to find a data dictionary to better understand what the rows and columns of the data represent.

Data Dictionary - Rosters

Do the following:

Remove any rows containing players with NA for their college. Replace rosters_2021 with this new data frame. (Some players have a college of None which we will keep. None means we have information about their college, while NA means we simply don’t know.)
Create a new data frame named nfl_illini_2021 which includes only rows for NFL players who played at Illinois in college. In this data frame, only include columns for their current NFL team, their position, their jersey number, their full name, and their height and weight.
After creating nfl_illini_2021, make one modification to it: Correct Nate Hobbs’ height. You should notice that it is clearly not correct in the data. Replace the current value with the more correct 6-0.
- Note: This is no longer the case, as the data has been corrected. However, the corrected data lists 6-1, so you can still change it to 6-0. It just will not be obvious that it was an entry error.
Add one more line of code to your .R file, nfl_illini_2021, which will output the result of your work.
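The steps above combine three common data frame operations: dropping rows with a missing value, selecting rows and columns together, and overwriting a single cell. Here is a minimal sketch on a made-up data frame (players and all of its column names and values are hypothetical, and will not match the real roster columns, so consult the data dictionary). It also shows why removing NA rows first is helpful.

```r
# Toy data frame for illustration only
players = data.frame(
  name   = c("Ann Lee", "Bo Cruz", "Cy Park", "Di Wong"),
  school = c("State", NA, "Tech", "State"),
  height = c("5-11", "6-3", "6-2", "60"),
  age    = c(24, 27, 25, 23)
)

# Comparing against NA yields NA, and an NA row index produces a row of NAs
with_na_row = players[players$school == "State", ]  # 3 rows, one all NA

# Drop rows with a missing school first
known = players[!is.na(players$school), ]

# Select rows by condition and columns by name in one step
state_players = known[known$school == "State", c("name", "height")]

# Overwrite a single cell, located by a condition on another column
state_players$height[state_players$name == "Di Wong"] = "6-0"

state_players
```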

Exercise 5 (2021 Chicago Bears Schedule)

The following code (which you should include in your .R file) loads data on the 2021 NFL schedule.

nfl_2021 = as.data.frame(load_schedules(seasons = 2021))

Data Dictionary - Schedules

Write code that extracts the Chicago Bears’ schedule from this data. The result should be a data frame with 17 rows. To make the output easy to read, only include columns for the week, away team, and home team.
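A team appears in a schedule either as the visitor or as the host, so the row condition needs a logical "or". The sketch below illustrates this on a made-up schedule (games, its column names, and the team codes are all hypothetical; look up the real column names and team abbreviation in the data dictionary).

```r
# Toy schedule for illustration only
games = data.frame(
  round   = c(1, 1, 2, 2),
  visitor = c("AAA", "CCC", "BBB", "DDD"),
  host    = c("BBB", "DDD", "CCC", "AAA"),
  venue   = c("North", "South", "East", "West")
)

# | is the elementwise "or": keep a row if either column matches
is_bbb = games$visitor == "BBB" | games$host == "BBB"

# Keep matching rows and only the columns of interest
bbb_games = games[is_bbb, c("round", "visitor", "host")]

# Sanity check on the number of rows
nrow(bbb_games)  # 2
```

Checking nrow() against the expected number of games is a quick way to confirm the filter caught every game and nothing extra.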