Basics of R

The main secret to learning R well is to make good use of online resources. For example, you can easily find many online R tutorials, such as Quick-R or R tutorial. Following their chapters will introduce you to the variables and functions of R and show you how to assemble these pieces into your own functions to reach your goals. Searching the web is just as valuable. Whenever you meet something you don't know while using R, google it, and always learn from (or imitate) other people's code.

Variables

In R, as in other programming languages, a variable corresponds to a region of memory (RAM) used for storing data. To make a variable easy to recognize and remember, we normally give it a meaningful name. The code below declares the variables x and y and assigns a value to each of them. You can type these two lines in the script pane. To run the code, highlight the lines you want to run, or place the cursor on the line you want to run, and then press Ctrl+Enter. The two variables x and y will then appear in the Environment pane, and the code will be echoed in the console pane.

x<-10
y<-"Hello"

Now x is the name of a variable that stores the scalar 10. The symbol <- (a < followed by a -) assigns the value on the right to the variable on the left. Similarly, y is a variable that has been assigned the string “Hello”. You can use the function mode( ) to check the types of these two variables. The mode of x is numeric, which means x can be added to, subtracted from, multiplied by, or divided by another number. However, y is a string, not a number, so it cannot be used in arithmetic.

mode(x)
## [1] "numeric"
mode(y)
## [1] "character"

In addition to assigning a value directly to a variable, we can create a new variable by assigning an existing variable to it. For example, the code below creates a new variable a, which also stores the value 10. A variable name can be imagined as a label stamped on the data. That is, the value 10 occupies a certain region of RAM whether it has a name or not. When we give it the name x, it is as if we stamp the label x on the value 10. Thereafter, we can retrieve the value 10 by calling the variable name. Similarly, when we run a<-x, it is as if we stamp a second label, a, on the value 10. Accordingly, no matter which name you call (x or a), R retrieves the same value 10.

a<-x
a
## [1] 10
b<-y
b
## [1] "Hello"
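
The label picture can be checked with a short sketch. The names p and q below are hypothetical and are chosen only so that x, y, a, and b are not overwritten. After p receives a new value, q should still retrieve the old one.

```r
# p and q are illustrative names, not part of the examples above
p <- 10    # stamp the label p on the value 10
q <- p     # stamp a second label, q, on the same value
p <- 20    # move the label p to a new value
q          # q still retrieves the original value
## [1] 10
p
## [1] 20
```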

Vector and Matrix

Unlike in many lower-level programming languages, it is very easy to process vectors and matrices in R. For example, the first line of the code below declares a variable a, which is a numeric vector. Now a is no longer 10 but a vector of 5 numbers. Note that a piece of data can carry many labels, but a label can point to only one piece of data at a time. Therefore, when a new value is assigned to an existing variable, the label is no longer attached to the previous value. In the code below, c( ) is a function in the {base} package that combines values into a vector. The vectors used here come in two modes, numeric and character (R has other modes as well, such as logical). Note that in R, character data must always be enclosed in quotation marks.

a<-c(1,2,4,8,10)
a
## [1]  1  2  4  8 10
# ?c  Type ?c in the console pane to see the help file for the function c
b<-c("John","Mary","Apple","Smith","David")
b
## [1] "John"  "Mary"  "Apple" "Smith" "David"

A numeric vector behaves like a single numeric value in arithmetic: each operation is applied to every element. See the results of running the following code.

a+10
## [1] 11 12 14 18 20
a-10
## [1] -9 -8 -6 -2  0
a*2
## [1]  2  4  8 16 20
a/2
## [1] 0.5 1.0 2.0 4.0 5.0
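
Arithmetic also works between two numeric vectors of the same length, element by element. The following is a minimal sketch; the names u and w are hypothetical and are used only to leave a and b untouched.

```r
# u and w are illustrative names
u <- c(1, 2, 4, 8, 10)
w <- c(10, 20, 30, 40, 50)
u + w   # element-by-element sums
## [1] 11 22 34 48 60
u * w   # element-by-element products
## [1]  10  40 120 320 500
```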

A matrix arranges values in rows and columns, so it can be viewed as a set of column (or row) vectors placed side by side. We can declare a matrix with matrix( ). If you don’t know the usage of this function, you can type ?matrix in the console pane to check the help file. The code below declares a variable called v, which is a \(2 \times 3\) matrix containing the values 1, 2, 3, 4, 5, and 6. By default, the values fill the matrix column by column. You can transpose a matrix with the function t( ).

v<-matrix(c(1,2,3,4,5,6),2,3)
v
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
t(v)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
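
If you are unsure about the shape of a matrix, the base functions dim( ), nrow( ), and ncol( ) report it. The sketch below uses a hypothetical name m for a matrix built the same way as v.

```r
# m is an illustrative name for a 2 x 3 matrix like v
m <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3)
dim(m)      # number of rows, then number of columns
## [1] 2 3
dim(t(m))   # transposing swaps the two dimensions
## [1] 3 2
nrow(m)
## [1] 2
ncol(m)
## [1] 3
```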

You can also declare a \(3\times2\) matrix directly. Run the code below and see whether you understand the usage of matrix( ), in particular the effect of the argument byrow.

v1<-matrix(c(1,2,3,4,5,6),3,2)
v2<-matrix(c(1,2,3,4,5,6),3,2,byrow=TRUE)
v1
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
v2
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6

You can retrieve the value in a particular cell of a matrix by giving its coordinates. For example, v1[2,1] retrieves the value stored in the cell at the second row and the first column. Similarly, v1[3,2] should be 6.

v1[2,1]
## [1] 2
v1[3,2]
## [1] 6
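
Leaving one coordinate empty retrieves a whole row or a whole column. A minimal sketch, using a hypothetical name m1 for a matrix built like v1:

```r
# m1 is an illustrative name for a 3 x 2 matrix like v1
m1 <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
m1[2, ]   # the whole second row
## [1] 2 5
m1[, 2]   # the whole second column
## [1] 4 5 6
```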

If you want to know the length of a vector, you can use the function length( ). Try checking the length of the matrix v as well: it is 6. Although a matrix is a two-dimensional data structure, R stores it as a one-dimensional vector with a dimension attribute. The values are stored column by column: down the rows of the first column, then down the rows of the second column, and so on. This is why v1[4] is 4 and v2[3] is 5.

length(a)
## [1] 5
length(v)
## [1] 6
v1[4]
## [1] 4
v2[3]
## [1] 5
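
The storage order can be made visible by flattening a matrix with as.vector( ). The sketch below uses the hypothetical names m1 and m2 for matrices built like v1 and v2.

```r
# m1 and m2 are illustrative names for matrices like v1 and v2
m1 <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
m2 <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2, byrow = TRUE)
as.vector(m1)   # first column, then second column
## [1] 1 2 3 4 5 6
as.vector(m2)   # filled by row, but still read out by column
## [1] 1 3 5 2 4 6
```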

From the previous demonstrations, you might have noticed a convention of R. A variable name is written without parentheses, whereas a function name must be followed by ( ). For example, rep( ) is used to repeat values. Check the code below to understand the usage of this function.

rep(1,3)
## [1] 1 1 1
rep(c(1,2),3)
## [1] 1 2 1 2 1 2
rep(c(1,2),each=3)
## [1] 1 1 1 2 2 2

A function similar to rep( ) is seq( ). Again, you can look up its usage by typing ?seq in the console pane. The code below creates three series of numbers. The first starts at 1 and goes up in steps of 2, ending at 9, the largest value in the series that does not exceed 10. The second starts at 2 and ends exactly at 10. The step can also be negative, which creates a descending vector, as in the third.

seq(1,10,2)
## [1] 1 3 5 7 9
seq(2,10,2)
## [1]  2  4  6  8 10
seq(10,1,-1)
##  [1] 10  9  8  7  6  5  4  3  2  1

Suppose you want to declare a series from 1 to 100. What will you do? You can call seq(1,100). Note that if you don’t specify the size of the step, it is +1 by default. Another, easier way to create a series from 1 to 100 is the colon operator :, which steps by 1 from the first number to the second (and steps by -1 when the first number is larger, as in 10:1). This operator is especially useful when you are creating a loop.

1:100
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
10:1
##  [1] 10  9  8  7  6  5  4  3  2  1
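
As a minimal sketch of the loop use mentioned above, the following for loop visits every value of 1:100 and adds it to a running sum. The names total and i are hypothetical.

```r
# total and i are illustrative names
total <- 0
for (i in 1:100) {       # i takes the values 1, 2, ..., 100 in turn
  total <- total + i     # add the current value to the running sum
}
total
## [1] 5050
```

The same result is returned by sum(1:100), which is usually the more idiomatic choice in R.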

Matrix Arithmetic

Since a matrix is a two-dimensional data structure, a vector can be viewed as a one-dimensional matrix. Because dimensions must be taken into account when dealing with matrices and vectors, matrix arithmetic differs from scalar arithmetic. Declare two matrices, each containing the numbers from 1 to 6, as in the code below. If you try to add these two matrices, you will get an error. You cannot subtract one from the other either, because they have different numbers of rows and columns. To add two matrices, you must make sure they have the same number of rows and the same number of columns. Thus, you can first transpose one of them, say y, and then add the two. Similarly, you need to make two matrices conformable before you subtract.

x<-matrix(1:6,2,3)
y<-matrix(1:6,3,2)
# x+y is invalid due to the fact that x and y are non-conformable matrices
x+t(y)
##      [,1] [,2] [,3]
## [1,]    2    5    8
## [2,]    6    9   12
t(x)-y
##      [,1] [,2]
## [1,]    0   -2
## [2,]    1   -1
## [3,]    2    0
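
If you want to see the error without stopping a script, one option is to wrap the addition in tryCatch( ) and compare the dimensions first. This is only a sketch: the names m23, m32, msg, and same_shape are hypothetical, and the exact wording of the error message may vary.

```r
# m23, m32, msg, and same_shape are illustrative names
m23 <- matrix(1:6, 2, 3)
m32 <- matrix(1:6, 3, 2)
# tryCatch() returns the error message as a string instead of stopping
msg <- tryCatch(m23 + m32, error = function(e) conditionMessage(e))
msg
# after transposing, both matrices are 2 x 3, so addition is allowed
same_shape <- identical(dim(m23), dim(t(m32)))
same_shape
## [1] TRUE
```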

Multiplication of matrices is a bit different. In the code below, two matrices are declared. A matrix can be multiplied by a scalar directly with the multiplication operator *, for example a * 2, which multiplies every element of the matrix by 2. Likewise, you can divide every element of a matrix by a scalar by writing the matrix name followed by the division operator / and the divisor. For example, run b/2 in the console pane and see what you get.

a<-matrix(1,2,2)
b<-matrix(1,2,3)
a
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    1
b
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
a*2
##      [,1] [,2]
## [1,]    2    2
## [2,]    2    2
b/2
##      [,1] [,2] [,3]
## [1,]  0.5  0.5  0.5
## [2,]  0.5  0.5  0.5

Besides multiplying or dividing a matrix by a scalar, you can perform element-wise multiplication and division of two matrices in the same way, as long as the two matrices have the same dimensions. See the code below and its results. The results are exactly the same as the original matrices, because all elements of these two matrices are 1. Note that an element-wise operation cannot be run if the two matrices have different dimensions.

a*a
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    1
b/b
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
# a*b will cause an error and so will a/b. You can try it yourself.

Unlike the element-wise operation, matrix multiplication is a two-step operation. First, multiply the elements of a row of the first matrix by the corresponding elements of a column of the second matrix, one by one. Second, sum these products to obtain a single element of the new matrix. See the code below. The operator %*% performs matrix multiplication. The outcome is a new matrix that has as many rows as the first matrix and as many columns as the second matrix. For this multiplication, the two matrices need not have the same numbers of rows and columns. However, the first matrix must have as many columns as the second matrix has rows.

a%*%b
##      [,1] [,2] [,3]
## [1,]    2    2    2
## [2,]    2    2    2
# Try b%*%a. You will get an error.
t(b)%*%a
##      [,1] [,2]
## [1,]    2    2
## [2,]    2    2
## [3,]    2    2
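
Because a and b contain only 1's, the two steps are hard to see in the results above. The sketch below repeats the idea with hypothetical matrices P and Q and computes one element of the product by hand.

```r
# P, Q, PQ, and by_hand are illustrative names
P <- matrix(1:4, 2, 2)   # rows: (1, 3) and (2, 4)
Q <- matrix(1:6, 2, 3)   # columns: (1, 2), (3, 4), and (5, 6)
PQ <- P %*% Q
# element [1,2]: row 1 of P times column 2 of Q, then summed
# 1*3 + 3*4 = 15
by_hand <- sum(P[1, ] * Q[, 2])
by_hand
## [1] 15
PQ[1, 2]
## [1] 15
dim(PQ)   # rows of P, columns of Q
## [1] 2 3
```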

What about matrix division? It is not as straightforward as scalar division. Before we introduce it, you need to know what an identity matrix is. An identity matrix is an \(n \times n\) matrix (a square matrix) with 1 on the diagonal and 0 everywhere else. You can use diag( ) to declare an identity matrix. A square matrix \(A\) multiplied by an identity matrix \(I\) of the same size equals itself, \(AI=A\).

diag(3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
c<-matrix(1:9,3,3)
c
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
I<-diag(3)
c%*%I
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
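
The identity matrix also leaves a square matrix unchanged when it is placed on the left. A minimal sketch, using the hypothetical names M and Id:

```r
# M and Id are illustrative names
M <- matrix(1:9, 3, 3)
Id <- diag(3)
all(M %*% Id == M)   # identity on the right
## [1] TRUE
all(Id %*% M == M)   # identity on the left
## [1] TRUE
```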

How do we do matrix division? In ordinary algebra, \(a*1=a\), so \(1=a/a\) for any nonzero \(a\). We call \(1/a\) the inverse of \(a\). In linear algebra, the inverse of a matrix \(A\) is written \(A^{-1}\) and satisfies \(AA^{-1}=I\). Instead of dividing by a matrix, we solve the equation \(Ax=B\) for \(x\) by multiplying both sides on the left by the inverse of \(A\): \(A^{-1}Ax=A^{-1}B \Rightarrow x=A^{-1}B\). The code below first shows how to solve the equation \(Ix=I\) for \(x\). As you will see, the inverse of an identity matrix is the identity matrix itself. To get the inverse of a matrix \(A\), we can use the function solve( ) with an identity matrix as the right-hand side, which solves \(AB=I\) for \(B\). \(AB=I\) means that \(A\) and \(B\) are inverses of each other. Note that not every matrix has an inverse. To have one, a matrix must at least be square, and its determinant must be nonzero.

solve(I,I)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
A<-matrix(c(2,5,3,7),2,2)
A
##      [,1] [,2]
## [1,]    2    3
## [2,]    5    7
B<-solve(A,diag(2))
B
##      [,1] [,2]
## [1,]   -7    3
## [2,]    5   -2
solve(B,diag(2))
##      [,1] [,2]
## [1,]    2    3
## [2,]    5    7
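
The same function solves \(Ax=B\) when \(B\) is an ordinary right-hand side rather than an identity matrix. The sketch below uses the hypothetical names A2, rhs, and sol. Working by hand with the inverse shown above, \(x = A^{-1}B\) gives \(-7 \times 1 + 3 \times 2 = -1\) and \(5 \times 1 - 2 \times 2 = 1\). Because the computer works with floating-point numbers, the checks use all.equal( ) rather than exact equality.

```r
# A2, rhs, and sol are illustrative names
A2 <- matrix(c(2, 5, 3, 7), 2, 2)   # same values as A above
rhs <- c(1, 2)                      # right-hand side of A2 x = rhs
sol <- solve(A2, rhs)               # approximately -1 and 1
# multiplying back should recover the right-hand side
all.equal(as.vector(A2 %*% sol), rhs)
## [1] TRUE
# solve() with a single argument returns the inverse directly
all.equal(A2 %*% solve(A2), diag(2))
## [1] TRUE
```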