3 Fundamentals I
This section (and the next) lays out the fundamental elements of a programming language. By understanding these core components, one can learn a new programming language by looking up the equivalent syntax in the new language; in some cases the syntax may even be identical.
3.2 Assignment
Programming languages allow values, such as numbers or text, to be stored in objects called variables. In most languages the operation of assignment is performed by the = symbol. However, this has a different meaning from its algebraic counterpart, as the assignment operation is not symmetric. The = sign sets the value of the variable on the left to the value of the expression on the right.
a = 3 # define a variable 'a' and give it the value of 3
3 = a # The command is not symmetric
# Error in 3 = a : invalid (do_set) left-hand side to assignment
a # the previous command does not return the value of a, we need an explicit call
# [1] 3
Some languages resolve this apparent algebraic paradox by using a more meaningful symbol for assignment, like <- in R. This is particularly useful because a variable can be assigned the value of another variable, or an altered value of itself.
a = 3 # set the value of a to 3
a = a + 1 # set the value of a equal to the value of itself plus 1
b <- 3 # set the value of b to 3
b <- b + 1 # set the value of b equal to the value of b plus 1
a - b
# [1] 0 # a is equal to b
The expression a = a + 1 above would appear false from an algebraic point of view; the value of a cannot equal the value of a plus one. However, as shown at the end of the code, the = and <- expressions are equivalent here: a and b end up with the same value.
3.3 Reserved words
Programming languages have a set of words and symbols that are reserved for particular purposes; their meaning is fixed and cannot be changed within a script. A common example is if.
if <- "Hello World"
# Error: unexpected assignment in "if <-"
As expected, trying to assign a value to a reserved word gives us an error. Most symbols in programming languages are reserved.
; <- a
# Error: unexpected ';' in ";"
3.4 Line breaks
Most programming languages use line breaks to distinguish a command from the next one.
"hello"
# [1] "hello"
"world"
# [1] "world"
Trying to write two commands on the same line will give us an error.
"hello" "world"
# Error: unexpected string constant in ""hello" "world""
However, in most programming languages it is possible to simulate a line break with a semicolon.
"hello" ; "world"
# [1] "hello"
# [1] "world"
It is also common to spread long expressions over several lines, e.g. using a new line for each input to a function. Note that in VBA this is enabled by ending a line with a space followed by an underscore (_), which explicitly tells the interpreter to join the line with the one that follows.
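In R, by contrast, no continuation character is needed. As a minimal sketch, an expression carries on to the next line for as long as it is syntactically incomplete, for example inside an open bracket or after a trailing operator. The variable names below are purely illustrative.

```r
# An open parenthesis keeps the expression going until it is closed
total <- sum(
  1,
  2,
  3
)

# A trailing operator also signals that more input follows
long_sum <- 1 +
  2 +
  3

total == long_sum
# [1] TRUE
```

Placing the operator at the start of the next line instead would end the first expression early, so the trailing-operator style is the safer habit.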
3.5 White space
Typically, the inclusion of white space does not affect the code, and it is a matter of preference/convention as to where white spaces are included.
if( 3 > 2 ){ print("3 is greater than 2") }
# [1] "3 is greater than 2"
if(3>2){print("3 is greater than 2")}
# [1] "3 is greater than 2"
# Both statements return the same result
However, white space inserted inside a reserved word or a multi-character operator does change the code, because the word or operator is no longer recognised as such.
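A small R sketch of this effect, using an illustrative variable x: splitting the <- operator with a space turns an assignment into a comparison, and splitting the reserved word if makes the line unparseable. The parse() and tryCatch() calls are only there so that the failure can be shown without stopping the script.

```r
x <- 5     # assignment: x takes the value 5
x < - 3    # a space inside <- gives a comparison: is x less than -3?
# [1] FALSE
x          # x still holds 5
# [1] 5

# "i f" is no longer the reserved word "if", so the text cannot be parsed
outcome <- tryCatch(
  parse(text = "i f (x > 2) x"),
  error = function(e) "could not be parsed"
)
outcome
# [1] "could not be parsed"
```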
A notable exception to the rule that white space is insignificant is the Python programming language, where the indentation of lines of code has meaning and is used to define the code within a control structure.
# Python
if 3 > 2:
    print("3 is greater than 2")
# 3 is greater than 2
3.6 Arithmetic operators
Arithmetic can be performed in code using the standard operators: addition, subtraction, multiplication, division, exponentiation, modulo and integer division.
2 + 3 # add
# [1] 5
3 - 2 # subtract
# [1] 1
3 * 2 # multiply
# [1] 6
3/2 # divide
# [1] 1.5
2^3 # exponentiation I
# [1] 8
2**3 # exponentiation II
# [1] 8
3 %% 2 # modulo
# [1] 1
7 %/% 2 # integer division
# [1] 3
Variable assignment and arithmetic operations can be combined into more complex expressions, like the following.
i <- 0.03 # give the variable i the value 3%
n <- 5 # give the variable n the value 5
(1-(1+i)**-n)/i
# [1] 4.579707 # value of an annuity certain over 5 years at 3%
3.7 Variable types
Variables have a type. If a programming language is ‘dynamically typed’ (sometimes informally called ‘loosely typed’), like R or Python, then the variable type does not have to be declared, and the type can change over the course of a script. If a programming language is ‘statically typed’ (informally, ‘strongly typed’), then the variable type generally has to be declared and cannot change.
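As a brief illustration in R, the built-in class() function reports the current type of a variable, and the same name can be reused for values of different types. The variable name v is hypothetical.

```r
v <- 3L          # an integer value
class(v)
# [1] "integer"
v <- "three"     # the same variable now holds a string
class(v)
# [1] "character"
v <- TRUE        # and now a Boolean
class(v)
# [1] "logical"
```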
The main types of variables are:
3.7.1 Boolean
This variable type is either TRUE or FALSE. In R, TRUE and FALSE can also be abbreviated as T and F.
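One caveat worth sketching: in R, TRUE and FALSE are reserved words, whereas T and F are ordinary variables that merely start out holding those values, so they can be overwritten. The example below reassigns T purely for demonstration; this is not something to do in real code.

```r
T == TRUE          # T starts out as a shorthand for TRUE
# [1] TRUE

T <- FALSE         # legal, because T is just a variable
T == TRUE
# [1] FALSE
rm(T)              # remove our copy so the built-in T is visible again

# TRUE itself is reserved, so assigning to it fails
outcome <- tryCatch({
  TRUE <- FALSE
  "assigned"
}, error = function(e) "error")
outcome
# [1] "error"
```

For this reason, spelling out TRUE and FALSE in full is the more robust choice.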
3.7.2 Numeric
This variable type can store a number. There are typically further subdivisions of this type, namely:
- Integer which can store a whole number within a certain range,
- Float or Double which can store a decimal number within a certain range, and/or
- Long which can store a greater range of numbers.
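In R, for instance, these subdivisions show up as follows. This is a minimal sketch using base R only; R has no separate Long type, and whole numbers are stored as doubles unless the L suffix is used.

```r
class(2L)                # the L suffix requests an integer
# [1] "integer"
class(2)                 # without it, a number is stored as a double
# [1] "numeric"
typeof(2)
# [1] "double"
.Machine$integer.max     # largest value an integer can hold
# [1] 2147483647
class(5L / 2L)           # division always returns a double
# [1] "numeric"
```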
3.7.3 String
This variable type can store a set of characters. Note that numbers can also be stored as a string, but they need to be converted back to numbers before arithmetic operations can be applied. Strings are easily identifiable in R because they are enclosed in single or double quotes.
a <- "12" # A string data type
b <- '13' # Another string
a + b
# Error in a + b : non-numeric argument to binary operator
When dealing with strings, the data type is normally accompanied by a host of useful functions such as:
- Determining the length of the string, nchar() in R
- Searching for the presence of a substring, grepl() in R
- Subsetting a string, substr() in R
- Splitting the string with a certain delimiter, strsplit() in R
- Replacing substrings with new substrings, gsub() in R
- Concatenating strings, paste() and paste0() in R
- Changing character casing, tolower() and toupper() in R
nchar("string") # Determining the length of the string
# [1] 6
grepl(pattern = "ri", "string") # Searching for the presence of a substring
# [1] TRUE
substr("string", start = 3, stop = 6) # Subsetting a string
# [1] "ring"
strsplit("string", split = "r") # Splitting the string with a certain delimiter (returns a list)
# [[1]]
# [1] "st"  "ing"
gsub(pattern = "s", replacement = "S", "string") # Replacing substrings with new substrings
# [1] "String"
paste("st","ring"); paste0("st", "ring") # Concatenating strings
# [1] "st ring"
# [1] "string"
tolower("StRiNg"); toupper("string") # Changing character casing
# [1] "string"
# [1] "STRING"
3.7.4 Arrays
This variable type can store a set of variables. This more sophisticated data type can take the form of:
- a one dimensional array, like a vector,
- a multi-dimensional array, like a table, and/or
- an associative array, where variables are stored in key value pairs, like names in a phonebook.
R has a host of array implementations, depending on the number of dimensions and on whether the data is homogeneous or heterogeneous, i.e. whether or not all the variables in the array have the same type. The basic one-dimensional array is called an atomic vector, the two-dimensional version a matrix and the n-dimensional version an array. Vectors in R are declared through concatenation with the c() function, or with the : symbol, which generates the sequence of integers from the number on its left to the one on its right.
c(1,2,3,4) # Atomic vector defined by c()
# [1] 1 2 3 4
1:4 # Atomic vector defined by :
# [1] 1 2 3 4
matrix(1:6, nrow = 2) # matrix
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
c("a" = 2, "b" = 3, "c" = 4) # named vector
# a b c
# 2 3 4
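A named vector can serve as a simple associative array. The sketch below, with made-up phonebook entries, looks values up by key and adds a new key; the names phonebook, Alice, Bob and Carol are purely illustrative.

```r
phonebook <- c("Alice" = "555-0100", "Bob" = "555-0199")

phonebook[["Bob"]]                 # look up a value by its key
# [1] "555-0199"
names(phonebook)                   # list the keys
# [1] "Alice" "Bob"
phonebook["Carol"] <- "555-0123"   # add a new key-value pair
length(phonebook)
# [1] 3
"Carol" %in% names(phonebook)      # check whether a key exists
# [1] TRUE
```

For heterogeneous values, a named list created with list() supports the same style of lookup.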
A detailed discussion of the different types of arrays in R can be found here.
R includes native support for vector/matrix operations, as per the simple numeric arithmetic examples above and more.
a <- 1:4; b <- 4:1
a + b # vectorised addition
# [1] 5 5 5 5
a - b # vectorised subtraction
# [1] -3 -1  1  3
a * b # vectorised multiply
# [1] 4 6 6 4
a/b # vectorised divide
# [1] 0.2500000 0.6666667 1.5000000 4.0000000
a %% b # vectorised modulo
# [1] 1 2 1 0
a %/% b # integer division
# [1] 0 0 1 4
a %*% b # matrix (inner) product
#      [,1]
# [1,]   20
a %o% b # outer product
#      [,1] [,2] [,3] [,4]
# [1,]    4    3    2    1
# [2,]    8    6    4    2
# [3,]   12    9    6    3
# [4,]   16   12    8    4
3.7.5 Coercion
Even though variables can be of a certain type, they can be coerced into another type.
"2" + "2"
# Error in "2" + "2" : non-numeric argument to binary operator
as.numeric("2") + as.numeric("2") # now you can
# [1] 4
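Coercion can also happen implicitly. The following base R sketch shows a few explicit conversions alongside two implicit ones; the values are chosen only for illustration.

```r
as.character(12)          # number to string
# [1] "12"
as.integer(3.9)           # decimal to integer: truncated, not rounded
# [1] 3
as.numeric(TRUE)          # Boolean to number
# [1] 1
TRUE + TRUE               # implicit: Booleans are treated as 1 and 0
# [1] 2
c(1, "a", TRUE)           # implicit: mixed values in a vector become strings
# [1] "1"    "a"    "TRUE"
```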
3.8 Logical operators
Logical operations can be performed on variables.
# define five separate variables on one line
a <- 3; b <- 4; y <- TRUE; n <- FALSE; d <- 1:4;
a >= b # is a greater than or equal to b
# [1] FALSE
a == b # is a equal to b
# [1] FALSE
a != b # is a not equal to b
# [1] TRUE
a < b # is a less than b
# [1] TRUE
y && n # are both y and n true
# [1] FALSE
y || n # either y or n is true
# [1] TRUE
y != n # y is not equal to n
# [1] TRUE
!n # not n
# [1] TRUE
!y # not y
# [1] FALSE
!!y # not(not y)
# [1] TRUE
1 %in% d # is contained in
# [1] TRUE
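Comparisons also work element by element on vectors such as d. As a hedged sketch in base R: the single-character operators & and | combine logical vectors element-wise, while any() and all() reduce a logical vector to a single value.

```r
d <- 1:4
d > 2                # element-wise comparison
# [1] FALSE FALSE  TRUE  TRUE
d > 1 & d < 4        # element-wise "and"
# [1] FALSE  TRUE  TRUE FALSE
d < 2 | d > 3        # element-wise "or"
# [1]  TRUE FALSE FALSE  TRUE
any(d > 3)           # is at least one element greater than 3?
# [1] TRUE
all(d > 3)           # are all elements greater than 3?
# [1] FALSE
```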
3.1 Comments
Comments are pieces of text within the code that are ignored when the code is run. Comments can be single-line or span multiple lines. It is good practice to include comments that enable a user to understand the intention of the code and that flag its limitations.
Different languages have different comment syntax. In R, a comment cannot span more than a single line and is introduced by the reserved # symbol. In the sections that follow, comments will be used to detail the output of the sample code that precedes them.
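A short R sketch of the comment styles described above; the variable name and value are illustrative.

```r
# A comment on a line of its own

rate <- 0.03   # a comment after code on the same line

# A longer explanation is written as several
# consecutive lines, each starting with its own #
rate
# [1] 0.03
```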