4 Fundamentals II

Whilst the previous chapter described programming components that are executed in the order they are written down, this chapter considers control structures that alter the flow of the code: sections of code can be skipped, jumped to or repeated indefinitely.

4.1 Control Structures

4.1.1 If statements

The if statement evaluates an expression to determine whether to run the conditional code contained within. An else clause can be added to include code that is run should the test expression evaluate to FALSE.

n <- 10
if (n %% 2 == 0){
  # this code is executed
  print("number is even")
} else {
  # this code is ignored
  print("number is odd")
}
# [1] "number is even"

The else clause can be replaced with an else if clause to add further logic.

age <- 17.5
if (age < 17) {
  # this code executes for ages less than 17
  print("This person cannot drive or vote")
} else if (age < 18) {
  # this code executes for ages less than 18
  # but 17 or greater only
  print("This person can drive but cannot vote")
} else {
  # this code executes for ages 18 and over
  print("This person can drive and vote")
}
# [1] "This person can drive but cannot vote"

4.1.2 Switch statements

Whilst the if else structure offers the possibility to evaluate many different cases of logic at once, the code can become confusing when it spans several logical conditions. Sometimes it is easier and cleaner to express the same logic in a single switch statement.

# using the if-else structure for four levels of logic
test <- "a"
if (test == "a"){
  print(1)
} else if (test == "b"){
  print(2)
} else if (test == "c"){
  print(3)
} else {
  # default
  print(4)
}
# [1] 1

# implementing the same code with a switch statement
test <- "a"
switch(test, "a" = 1, "b" = 2, "c" = 3, 4)
# [1] 1

test <- "z"
switch(test, "a" = 1, "b" = 2, "c" = 3, 4)
# [1] 4
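
As a further sketch, a switch statement can also share one result between several cases: in R, a case that is left empty falls through to the next case that has a value. The function name grade_band and its labels are hypothetical, chosen purely for illustration.

```r
# hypothetical helper: map a letter grade to a band
grade_band <- function(grade){
  switch(grade,
    "a" = ,            # empty case: falls through to the next value
    "b" = "high",
    "c" = "middle",
    "low")             # the unnamed final argument acts as the default
}
grade_band("a")
# [1] "high"
grade_band("c")
# [1] "middle"
grade_band("z")
# [1] "low"
```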

4.1.3 For loop

The for loop repeats the code contained within it a designated number of times. A for loop typically requires:

  • an increment variable, typically set to i
  • a starting value of the increment variable, and
  • a terminating value of the increment variable.

a <- 0
for(i in 1:4){
  a = a + i
}
print(a) # 1 + 2 + 3 + 4 =
# [1] 10

However, the for loop does not need to run every iteration. Steps can be missed out, and the loop can be broken off early.

# find the first number that is divisible by 7 and 3
for(i in 1:100){
  if (i %% 3 != 0){next} # this line runs for every i
  if (i %% 7 != 0){next} # this line runs for every i that is divisible by 3
  # this line runs for every i that is divisible by both 3 and 7
  print(paste(i, "is divisible by 7 and 3"))
  break # the loop terminates at i = 21
}
# [1] "21 is divisible by 7 and 3"

The for loop offers the flexibility for more complicated code repetition. The increment variable can be set to decrement with each step, it can change in any step size (e.g. in steps of 2 or -1), and for loops can iterate directly over the values in an array.
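
These variations can be sketched as follows. The variable names are illustrative only, and seq() is the base R function for building a sequence with a chosen step size.

```r
# decrement with each step: count down from 3 to 1
countdown <- c()
for(i in 3:1){
  countdown <- c(countdown, i)
}
print(countdown)
# [1] 3 2 1

# increment in steps of 2
odd_total <- 0
for(i in seq(1, 9, by = 2)){   # i takes the values 1, 3, 5, 7, 9
  odd_total <- odd_total + i
}
print(odd_total)
# [1] 25

# iterate directly over the values of a vector
for(fruit in c("apple", "pear")){
  print(fruit)
}
# [1] "apple"
# [1] "pear"
```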

4.1.4 While loop

A while loop repeats the code contained within it while the test condition evaluates to true.

# find the first number that is divisible by 7 and 3
n <- 0
continue_while <- TRUE # define the test condition variable for the while loop
while (continue_while){
  n = n + 1
  # test whether the number is divisible by both 3 and 7
  if (n %% 3 == 0 && n %% 7 == 0){
    # if so, set the test condition to false to end the while loop
    continue_while = FALSE
  }
}
print(paste(n, "is divisible by 3 and 7"))
# [1] "21 is divisible by 3 and 7"

The while loop can also break early and skip steps, like the for loop. The two control structures are therefore very similar in nature. However, the while loop is typically used when the step and/or terminating value of the loop is unknown.
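
As an illustration of a loop whose number of iterations is not known in advance, the sketch below counts the steps that the Collatz sequence (halve an even number, otherwise triple it and add one) takes to reach 1. The starting value of 6 and the variable names are arbitrary choices for this example, and the code assumes a positive whole number.

```r
n <- 6
steps <- 0
while (n != 1){
  if (n %% 2 == 0){
    n <- n / 2       # even: halve
  } else {
    n <- 3 * n + 1   # odd: triple and add one
  }
  steps <- steps + 1
}
print(steps) # 6, 3, 10, 5, 16, 8, 4, 2, 1
# [1] 8
```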

A variant of the while loop is the do while loop. The only difference is that the do while loop always executes the code within it at least once, because the test condition is only evaluated at the end of the code block, whereas the while loop evaluates the test condition at its start and therefore may never execute the code within.
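
R has no dedicated do while statement, but the same behaviour can be sketched with repeat and a break placed at the end of the block. The variable names below are illustrative only.

```r
# do while behaviour: the body runs before the condition is ever checked
n <- 10
iterations <- 0
repeat {
  iterations <- iterations + 1
  n <- n + 1
  # the test condition (n < 5) is evaluated at the end of the block
  if (!(n < 5)){break}
}
print(iterations)
# [1] 1

# the equivalent while loop never runs its body, as the condition is FALSE at the start
m <- 10
while_iterations <- 0
while (m < 5){
  while_iterations <- while_iterations + 1
  m <- m + 1
}
print(while_iterations)
# [1] 0
```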

4.2 Functions

A function is a block of code that is encapsulated into a single named object. This object can then be used as a shorthand expression for the code contained within it. A function can take input arguments as part of its definition and may or may not output a variable.

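# define a function that returns the factorial of a given number n
# (note: this definition masks base R's built-in factorial() for the session)
factorial <- function(n){
  n_factorial <- 1
  # seq_len(n) is empty when n is 0, so the loop is skipped and 0! = 1;
  # 2:n would wrongly count downwards (2, 1) when n is 1
  for(i in seq_len(n)){
    n_factorial = n_factorial * i
  }
  return(n_factorial)
}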
# nothing is returned yet, as the function has not yet been called

# calling the function
factorial(4) # 4 * 3 * 2 * 1 =
# [1] 24

An important concept with functions is their scope. Variables created inside functions are typically not available for use outside of the function.

factorial(5) # the n_factorial variable is created to evaluate this command
# [1] 120
n_factorial # yet should we call the n_factorial variable after the function has finished...
# Error: object 'n_factorial' not found

# by the same token, a variable with the same name defined outside of the function is not
# affected when the function runs with its own variable of that name
n_factorial <- 173
factorial(5) # another instance of n_factorial now exists in the function's scope temporarily
# [1] 120
n_factorial # nevertheless, the original value remains unchanged
# [1] 173

However, global variables created outside of a function can be read from inside it. Relying on this is generally not good coding practice: functions are ideally discrete blocks of code that neither depend on nor alter the state of the program they operate within, as doing so may lead to unintended (and hard to detect) behaviour elsewhere.
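
A minimal sketch of the problem, using hypothetical names (rate, add_tax_global and add_tax are not part of any library): the first function quietly depends on a global variable, so the same call can give different answers, whereas the second depends only on its arguments.

```r
rate <- 0.2   # global variable, defined outside any function

# relies on the global 'rate': the result silently changes if 'rate' changes
add_tax_global <- function(price){
  price * (1 + rate)
}

# self-contained alternative: everything the function needs is passed in
add_tax <- function(price, rate){
  price * (1 + rate)
}

add_tax_global(100)
# [1] 120
rate <- 0.5
add_tax_global(100)   # same call, different answer
# [1] 150
add_tax(100, 0.2)     # depends only on its arguments
# [1] 120
```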

4.3 Import statement

Code that has been saved in a separate location can be imported into the current file and run.

# let's create a file called 'other.r' with the following code:
# --- other.r ---
print("Hello from other.r")

# a file in the same directory can run this code with the following statement
# --- another.r ---
source("other.r")
# [1] "Hello from other.r"

This technique is not limited to code saved in a local directory. Code from external sources can be imported and run. Libraries (also called modules or packages) of code exist to perform a wide range of tasks, and it is often more sensible to leverage these than to code all the functionality yourself. Often these libraries are open-source, meaning that the underlying code is visible, and typically they can be used for free and collaborated on. As many libraries are widely used, they tend to be well tested and offer a lot of functionality.

# R is supported by the Comprehensive R Archive Network (CRAN) and its mirrors
# a package first needs to be installed from a CRAN site prior to its initial use
# this step only needs to be performed once
install.packages("RColorBrewer")
# this function will install any additional packages that this one package is dependent upon
# and any other packages that they are dependent on, and so on

# to use the code later on, it has to be imported
library("RColorBrewer") # Creates nice looking colour palettes

brewer.pal(5, "Blues") # RColorBrewer function to return a palette of 5 blues
# [1] "#EFF3FF" "#BDD7E7" "#6BAED6" "#3182BD" "#08519C"

# a popular set of R packages is called the 'tidyverse'. More details can be found here:
# https://www.tidyverse.org/

Whilst it can be easier to simply import code that performs a task rather than code it yourself, caution should be exercised:

  • A package may not perform entirely as expected in all circumstances, either because the code is not tailored to your precise problem or bugs exist in the code. However, packages that are widely used and developed under best practices tend to be very reliable and well documented.
  • Packages may not work on different operating systems or even with different versions of the same compiler or interpreter. Libraries often include code dependencies, i.e. imports from other sources. This increases the risk, as a problem in a single library can extend to other libraries which depend on it.
  • It may unnecessarily increase the file size or operation time of the final code, especially if the package is quite large or it requires a host of other packages it is dependent upon to be installed for it to work. However, the package could also be faster if the code is well optimised or makes use of lower level libraries.
  • Security should also be considered; a 2018 data breach of the British Airways website was reportedly caused by compromised third-party code that the main site simply imported. Ensure packages are downloaded from trusted locations and are verified once downloaded (e.g. through MD5 checksums, which most package managers do automatically).
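
The checksum verification mentioned in the last point can be sketched with tools::md5sum(), which ships with R. This is a minimal illustration rather than a description of what install.packages() does internally: the temporary file stands in for a downloaded archive, and the expected checksum is computed locally only so that the example is self-contained.

```r
# write a small file to stand in for a downloaded package archive
archive <- tempfile(fileext = ".txt")
writeLines("pretend this is a package archive", archive)

# in practice the expected checksum is published by the distributor;
# here it is computed locally, purely for illustration
expected_md5 <- unname(tools::md5sum(archive))

# after downloading, recompute the checksum and compare it with the published value
actual_md5 <- unname(tools::md5sum(archive))
identical(actual_md5, expected_md5)
# [1] TRUE

# any change to the file produces a different checksum
writeLines("tampered contents", archive)
identical(unname(tools::md5sum(archive)), expected_md5)
# [1] FALSE
```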

4.4 Function operators

Some languages include operators beyond the mathematical and logical operators seen in the previous chapter. An example in base R is the previously mentioned %in% operator, which checks whether an element is contained in a vector. One of the most used function operators in R is the magrittr pipe operator, %>%, which is used to chain functions, i.e. the output of one function is immediately passed into another function. This is a form of ‘syntactic sugar’, where the functionality of the code doesn’t actually change, but the syntax does, in order to enable a more fluid and condensed version of the code.

# consider the following code that passes a variable into two functions (sin and then cos)
number <- pi
number_sin <- sin(number)
number_sin_cos <- cos(number_sin)
number_sin_cos
# [1] 1
# three interim variables were required to calculate the result

# base R had no pipe operator before version 4.1.0 (which introduced the native pipe |>)
library(magrittr)
# the magrittr package defines its own pipe operator %>%
# that enables function chaining and is imported by numerous other packages

# the value can be passed sequentially into each function in one line
pi %>% sin() %>% cos()
# [1] 1
# the same result is achieved in one line without the need for interim variables
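
Since version 4.1.0, base R also provides a native pipe operator, |>, so simple chains like the one above can be written without loading magrittr. A minimal sketch, assuming R 4.1.0 or later is installed:

```r
# the native pipe passes the left-hand value as the first argument of the call on the right
pi |> sin() |> cos()
# [1] 1

# pipes read left to right, whereas nested calls read inside out
c(1, 4, 9) |> sqrt() |> sum()   # equivalent to sum(sqrt(c(1, 4, 9)))
# [1] 6
```

The two pipes behave alike for simple chains such as these, although they differ in some details (for example, how a placeholder for the piped value is written), so the magrittr documentation is worth consulting before mixing them.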