3 Fundamentals I

This section and the next lay out the fundamental elements of a programming language. Once these core components are understood, learning a new programming language is largely a matter of looking up the equivalent syntax; in some cases the syntax may even be identical.

3.1 Comments

Comments are lines of code that are ignored when the code is run. Comments can be single line or span multiple lines. It is good practice to include comments that enable a user to understand the intention of the code and flag its limitations.

Different languages have different comment syntax. In R, comments cannot span more than a single line of code and are introduced by the reserved # symbol.

# Anything can be written here without affecting
# the implementation of the rest of the code.

In the sections that follow, comments will be used to show the output of the sample code that precedes them.

3.2 Assignment

Programming languages allow values, such as numbers or text, to be stored in objects called variables. In most languages, assignment is performed by the = symbol. However, this has a different meaning from its algebraic counterpart, as the assignment operation is not symmetric: the = sign sets the variable on the left to the value of the expression on the right.

a = 3       # define a variable 'a' and give it the value of 3
3 = a       # The command is not symmetric
# Error in 3 = a : invalid (do_set) left-hand side to assignment

a # the previous command does not return the value of a, we need an explicit call
# [1] 3

Some languages avoid the algebraic ambiguity of = by using a more suggestive symbol for assignment, such as <- in R. This is particularly useful because a variable can be assigned the value of another variable, or an altered value of itself.

a = 3       # set a to the value 3
a = a + 1   # set the value of a equal to the value of itself plus 1

b <- 3      # set b to the value 3
b <- b + 1  # set the value of b equal to the value of b plus 1

a - b
# [1] 0     # a is equal to b

The expression a = a + 1 would appear false from an algebraic point of view: the value of a cannot equal the value of a plus one. However, as shown at the end of the code, the = and <- assignments produce the same result.

3.3 Reserved words

Programming languages have a set of words and symbols that are reserved for particular purposes; their meaning is fixed and cannot be changed within a script. A common example is if.

if <- "Hello World"
# Error: unexpected assignment in "if <-"

As expected, trying to assign a value to a reserved word gives an error. Most symbols in programming languages are reserved as well.

; <- a
# Error: unexpected ';' in ";"

3.4 Line breaks

Most programming languages use line breaks to distinguish a command from the next one.

"hello"
# [1] "hello"
"world"
# [1] "world"

Trying to write two commands on the same line will give us an error.

"hello" "world"
# Error: unexpected string constant in ""hello" "world""

However, in many programming languages, including R, a semicolon can stand in for a line break.

"hello" ; "world"
# [1] "hello"
# [1] "world"

It is also common to spread long expressions over several lines, e.g. using a new line for each input to a function. Note that in VBA this requires ending a line with a space and an underscore _ to explicitly tell the interpreter to join the line with the one that follows.
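In R no continuation character is needed: an expression that is incomplete at the end of a line simply carries on to the next one. The following is a minimal sketch of that behaviour; the variable names total and label are chosen purely for illustration.

```r
# A trailing operator leaves the expression incomplete,
# so R keeps reading on the next line.
total <- 1 +
  2 +
  3
total
# [1] 6

# One input per line inside a function call
label <- paste(
  "spread",
  "over",
  "several",
  "lines"
)
label
# [1] "spread over several lines"
```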

3.5 White space

Typically, the inclusion of white space does not affect the code, and it is a matter of preference/convention as to where white spaces are included.

if( 3 > 2 ){ print("3 is greater than 2") }
# [1] "3 is greater than 2"

if(3>2){print("3 is greater than 2")}
# [1] "3 is greater than 2"

# Both statements return the same result

However, white space inserted inside a reserved word does affect the code, as the reserved word is no longer recognised.
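As a rough illustration, the sketch below checks a line of code in which if has been written with a space inside it. It relies on parse(), which checks syntax without running the code, and tryCatch(), which captures the error rather than stopping the script. The names broken and result are hypothetical.

```r
# 'if' written with a space inside it
broken <- "i f (3 > 2) print('3 is greater than 2')"

# Try to parse the text; report whether the syntax is valid
result <- tryCatch(
  {
    parse(text = broken)
    "parsed successfully"
  },
  error = function(e) "syntax error"
)
result
# [1] "syntax error"
```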

A notable exception to this rule is the Python programming language, where the indentation of lines of code does have meaning, and is used to define the code within a control structure.

# Python

if 3 > 2:
    print("3 is greater than 2")
# 3 is greater than 2

3.6 Arithmetic operators

Arithmetic can be performed in code using the standard operators: addition, subtraction, multiplication, division, exponentiation, modulo and integer division.

2 + 3 # add
# [1] 5

3 - 2 # subtract
# [1] 1

3 * 2 # multiply
# [1] 6

3/2 # divide
# [1] 1.5

2^3 # exponentiation I
# [1] 8

2**3 # exponentiation II
# [1] 8

3 %% 2 # modulo
# [1] 1

7 %/% 2 # integer division
# [1] 3

Variable assignment and arithmetic operations can be combined into more complex expressions, like the following.

i <- 0.03       # give the variable i the value 3%
n <- 5          # give the variable n the value 5

(1-(1+i)**-n)/i
# [1] 4.579707  # value of an annuity certain over 5 years at 3%

3.7 Variable types

Variables have a type. If a programming language is dynamically typed (sometimes called ‘loosely typed’), like R or Python, then the variable type does not have to be declared, and the type can change over the course of a script. If a programming language is statically typed (sometimes called ‘strongly typed’), then the variable type is fixed when the variable is declared and cannot change.
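The short sketch below shows a variable changing type over the course of an R script, using class() to report the current type. The variable name x is arbitrary.

```r
x <- 3          # no type declaration is needed
class(x)
# [1] "numeric"

x <- "three"    # the same variable now holds a string
class(x)
# [1] "character"

x <- TRUE       # and now a Boolean
class(x)
# [1] "logical"
```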

The main types of variables are:

3.7.1 Boolean

This variable type is either TRUE or FALSE. In R, TRUE and FALSE can also be abbreviated as T and F.
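One caveat worth knowing: TRUE and FALSE are reserved words, whereas T and F are ordinary variables that happen to hold those values, so they can be reassigned. The sketch below demonstrates this and then restores the default with rm(); the name before is hypothetical.

```r
before <- T == TRUE   # T is a shorthand for TRUE
before
# [1] TRUE

T <- FALSE            # T is an ordinary variable and can be reassigned
T == TRUE
# [1] FALSE

rm(T)                 # remove our variable so that T means TRUE again
T
# [1] TRUE
```

For this reason many style guides recommend writing TRUE and FALSE in full.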

3.7.2 Numeric

This variable type can store a number. There are typically further subdivisions of this type, namely:

  • Integer which can store a whole number within a certain range,
  • Float or Double which can store a decimal number within a certain range, and/or
  • Long which can store a greater range of numbers.
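As a small sketch of how these subdivisions appear in R: numbers are stored as doubles unless an integer is requested with the L suffix. Base R has no separate long type, so only integer and double are shown here.

```r
class(3)               # numbers are stored as doubles by default
# [1] "numeric"

typeof(3)
# [1] "double"

class(3L)              # the L suffix requests an integer
# [1] "integer"

.Machine$integer.max   # largest value an integer can hold
# [1] 2147483647

is.integer(3L + 1L)    # integer arithmetic stays integer
# [1] TRUE
```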

3.7.3 String

This variable type can store a sequence of characters. Note that numbers can also be stored as strings, but they need to be converted back to numbers before arithmetic operations can be applied. Strings are easily identifiable in R because they are enclosed in single or double quotes.

a <- "12" # A string data type
b <- '13' # Another string

a + b
# Error in a + b : non-numeric argument to binary operator

When dealing with strings, the data type is normally accompanied by a host of useful functions such as:

  • Determining the length of the string, nchar() in R
  • Searching for the presence of a substring, grepl() in R
  • Subsetting a string, substr() in R
  • Splitting the string with a certain delimiter, strsplit() in R
  • Replacing substrings with new substrings, gsub() in R
  • Concatenating strings, paste() and paste0() in R
  • Changing character casing, tolower() and toupper() in R

nchar("string") # Determining the length of the string
# [1] 6

grepl(pattern = "ri", "string") # Searching for the presence of a substring
# [1] TRUE

substr("string", start = 3, stop = 6) # Subsetting a string
# [1] "ring"

strsplit("string", split = "r") # Splitting the string with a certain delimiter
# [[1]]
# [1] "st"  "ing"

gsub(pattern = "s", replacement = "S", "string") # Replacing substrings with new substrings
# [1] "String"

paste("st","ring"); paste0("st", "ring") # Concatenating strings
# [1] "st ring"
# [1] "string"

tolower("StRiNg"); toupper("string") # Changing character casing
# [1] "string"
# [1] "STRING"

3.7.4 Arrays

This variable type can store a set of values. This more sophisticated data type can take the form of:

  • a one dimensional array, like a vector,
  • a multi-dimensional array, like a table, and/or
  • an associative array, where variables are stored in key value pairs, like names in a phonebook.

R has a host of array implementations, depending on dimensions and whether the data is homogeneous or heterogeneous, i.e. whether or not all the values in the array have the same type. The basic one-dimensional array is called an atomic vector, the two-dimensional version a matrix and the n-dimensional version an array. Vectors in R are created by concatenation with the c() function, or with the : symbol, which generates the sequence of integers from the number on its left to the one on its right.

c(1,2,3,4) # Atomic vector defined by c()
# [1] 1 2 3 4

1:4 # Atomic vector defined by :
# [1] 1 2 3 4

matrix(1:6, nrow = 2) # matrix
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

c("a" = 2, "b" = 3, "c" = 4) # named vector
# a b c
# 2 3 4

A detailed discussion of the different types of arrays in R can be found here.
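The examples above are all homogeneous. As a brief sketch of the other cases mentioned, a named vector can act as a simple associative array, and a list can hold values of different types. The names phonebook and person, and their contents, are made up for illustration.

```r
# Named vector used as a simple associative array (key -> value)
phonebook <- c("alice" = "555-0101", "bob" = "555-0102")
phonebook[["bob"]]        # look up a value by its key
# [1] "555-0102"

# A list can hold values of different types (heterogeneous)
person <- list(name = "Alice", age = 30, member = TRUE)
person$age
# [1] 30
class(person$name)
# [1] "character"

# An atomic vector cannot: mixed inputs to c() are coerced to one type
c(1, "a", TRUE)
# [1] "1"    "a"    "TRUE"
```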

R includes native support for vector/matrix operations, as per the simple numeric arithmetic examples above and more.

a <- 1:4; b <- 4:1

a + b # vectorised addition
# [1] 5 5 5 5

a - b # vectorised subtract
# [1] -3 -1  1  3

a * b # vectorised multiply
# [1] 4 6 6 4

a/b # vectorised divide
# [1] 0.2500000 0.6666667 1.5000000 4.0000000

a %% b # vectorised modulo
# [1] 1 2 1 0

a %/% b # integer division
# [1] 0 0 1 4

a %*% b # matrix product
#      [,1]
# [1,]   20

a %o% b # outer product
#      [,1] [,2] [,3] [,4]
# [1,]    4    3    2    1
# [2,]    8    6    4    2
# [3,]   12    9    6    3
# [4,]   16   12    8    4

3.7.5 Coercion

Even though variables can be of a certain type, they can be coerced into another type.

"2" + "2"
# Error in "2" + "2" : non-numeric argument to binary operator

as.numeric("2") + as.numeric("2") # after coercion the addition works
# [1] 4
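Coercion works between the other types too. The sketch below runs through a few of R's as.*() conversion functions, including a case where the conversion is not possible; suppressWarnings() is used only to hide the warning that R would otherwise print.

```r
as.character(12)        # number to string
# [1] "12"

as.integer("7") + 1L    # string to integer
# [1] 8

as.numeric(TRUE)        # Boolean to number: TRUE becomes 1, FALSE becomes 0
# [1] 1

as.logical("TRUE")      # string to Boolean
# [1] TRUE

as.integer(3.9)         # the decimal part is dropped, not rounded
# [1] 3

suppressWarnings(as.numeric("abc"))   # not a number: the result is NA
# [1] NA
```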

3.8 Logical operators

Logical operations can be performed on variables.

# define five separate variables on one line
a <- 3; b <- 4; y <- TRUE; n <- FALSE; d <- 1:4;

a >= b # is a greater than or equal to b
# [1] FALSE

a == b # is a equal to b
# [1] FALSE

a != b # is a not equal to b
# [1] TRUE

a < b # is a less than b
# [1] TRUE

y && n # are both y and n true
# [1] FALSE

y || n # either y or n is true
# [1] TRUE

y != n # y is not equal to n
# [1] TRUE

!n # not n
# [1] TRUE

!y # not y
# [1] FALSE

!!y # not(not y)
# [1] TRUE

1 %in% d # is 1 contained in d
# [1] TRUE
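Comparisons are vectorised in the same way as the arithmetic operators, returning one result per element. The sketch below applies them to the vector d, using the element-wise operators & and | rather than && and ||, which expect single values. any() and all() then reduce the results to a single Boolean.

```r
d <- 1:4

d > 2                 # element-wise comparison
# [1] FALSE FALSE  TRUE  TRUE

d > 1 & d < 4         # element-wise AND
# [1] FALSE  TRUE  TRUE FALSE

d < 2 | d > 3         # element-wise OR
# [1]  TRUE FALSE FALSE  TRUE

any(d > 3)            # is at least one element TRUE?
# [1] TRUE

all(d > 3)            # are all elements TRUE?
# [1] FALSE
```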