A loop repeats a process a fixed number of times. For example, what do we get if we start from 0 and add 1 to the running sum 10 times? There are several ways to solve this problem. Of course, you can simply write the same +1 command 10 times, which gives the result 10.
v<-0
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v<-v+1
v
## [1] 10
However, this is not a good solution. Repeating the command v<-v+1 10 times is tolerable, but what if you need to repeat it 100 times? Clearly, this approach does not scale. For this particular request, we can take advantage of built-in R functions such as rep( ). See the following code. Although this is quite neat, it is not general enough: it works only because every term in the sum is the same value.
sum(rep(1,10))
## [1] 10
For example, what can we do if we want to compute \(1+2+\dots+10\)? A simple way is to call sum(1:10). Alternatively, we can use cumsum( ), which returns every running total, and read off the final value.
sum(1:10)
## [1] 55
cumsum(1:10)
## [1] 1 3 6 10 15 21 28 36 45 55
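Since cumsum( ) returns all ten running totals, one way to report only the final value is to take the last element with tail( ). This is a minimal sketch; the names running and final are introduced here only for illustration.

```r
# Running totals of 1, 2, ..., 10
running <- cumsum(1:10)

# Keep only the last running total, which equals sum(1:10)
final <- tail(running, 1)
final
## [1] 55
```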
Of course, we can use a for loop to solve this problem. A for loop starts with the keyword for, followed by ( ) and { }. Inside the parentheses is a sequence whose elements label the rounds. For example, i in 1:10 means that the variable i takes the values 1 to 10 in turn. The commands between { and } are the process that we want to repeat. Here, the process is to update the value of the variable v by adding the current value of i to it.
v<-0
for(i in 1:10){
v<-v+i
}
v
## [1] 55
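One detail worth knowing about the sequence in the parentheses: 1:n counts downward when n is 0, so the loop body still runs. The sketch below, which assumes a hypothetical case where the number of rounds n happens to be 0, compares 1:n with seq_len(n). The names n, v1, and v2 are introduced only for this illustration.

```r
n <- 0                       # hypothetical case: zero rounds requested

v1 <- 0
for (i in 1:n) v1 <- v1 + 1  # 1:0 is c(1, 0), so the body runs twice

v2 <- 0
for (i in seq_len(n)) v2 <- v2 + 1  # seq_len(0) is empty, so the body never runs

c(v1, v2)
## [1] 2 0
```

When the number of rounds is computed rather than typed in, seq_len( ) is usually the safer choice.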
If the process to be repeated fits on one line, the braces can be dropped and the command written right after for( ).
v<-0
for(i in 1:10)v<-v+i
v
## [1] 55
Loops are useful for much more than arithmetic. The next example extracts the first, third, and fifth elements of the vector sentences. Of course, you can also extract these elements directly by their positions in sentences.
sentences<-c("This","is","a","wonderful","book")
for(i in seq(1,5,2))print(sentences[i])
## [1] "This"
## [1] "a"
## [1] "book"
sentences[c(1,3,5)]
## [1] "This" "a" "book"
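A for loop does not have to run over positions; it can also run over the elements themselves. The following sketch loops over the selected words directly and collects them into a new vector. The names w and picked are introduced here only for illustration.

```r
sentences <- c("This", "is", "a", "wonderful", "book")

picked <- character(0)            # empty vector to collect the words
for (w in sentences[seq(1, 5, 2)]) {
  picked <- c(picked, w)          # append the current word
}

paste(picked, collapse = " ")
## [1] "This a book"
```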
In R, a for loop can be relatively slow. Other functions do the same job in most cases and are often faster or more concise. For example, apply( ) is particularly useful when you want to repeat a process over the columns or rows of a matrix or a data frame. Suppose we have a data frame recording five students’ scores in English and Mathematics tests. We can compute the mean score of the five students for each test by calling apply( ) with the second argument set to 2, meaning that the function in the third argument is applied to each column. Similarly, when the second argument is set to 1, the function in the third argument is applied to each row, which gives the mean score across the two tests for each student.
dd<-data.frame(eng=c(100,78,34,67,89),
math=c(89,76,98,37,20))
dd
## eng math
## 1 100 89
## 2 78 76
## 3 34 98
## 4 67 37
## 5 89 20
apply(dd,2,mean)
## eng math
## 73.6 64.0
apply(dd,1,mean)
## [1] 94.5 77.0 66.0 52.0 54.5
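To see what apply( ) does when its second argument is 2, the following sketch spells out the same column means with an explicit for loop. The name col_means is introduced here only for illustration.

```r
dd <- data.frame(eng  = c(100, 78, 34, 67, 89),
                 math = c(89, 76, 98, 37, 20))

# One slot per column, labelled with the column names
col_means <- numeric(ncol(dd))
names(col_means) <- names(dd)

# Visit each column in turn and store its mean
for (j in seq_along(dd)) {
  col_means[j] <- mean(dd[[j]])
}

# Compare with the apply( ) result
all.equal(col_means, apply(dd, 2, mean))
## [1] TRUE
```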
Of course, if you only want to compute the means or sums by row or by column, you have other options: colSums( ), colMeans( ), rowSums( ), and rowMeans( ).
colMeans(dd)
## eng math
## 73.6 64.0
rowMeans(dd)
## [1] 94.5 77.0 66.0 52.0 54.5
Other functions can be used in apply( ). For example, we can compute the standard deviations of English and Mathematics scores.
apply(dd,2,sd)
## eng math
## 25.32390 33.87477
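The function passed to apply( ) does not have to be a built-in one. As a sketch, the code below passes a user-defined function that returns the gap between the highest and the lowest score in each test. The name spread is introduced here only for illustration.

```r
dd <- data.frame(eng  = c(100, 78, 34, 67, 89),
                 math = c(89, 76, 98, 37, 20))

# For each column x, compute the highest score minus the lowest score
spread <- apply(dd, 2, function(x) max(x) - min(x))
spread
##  eng math
##   66   78
```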
tapply( ) is another common member of the apply( ) family. It applies a function to a vector separately for each group defined by a second variable. To illustrate, we add the genders of the five students to the same data frame. If we want the mean English score of each gender, we can use tapply( ). The mean Mathematics scores can be computed in the same way. Note that with( ) evaluates an expression inside the data frame, so the columns can be referred to by name without the dd$ prefix.
dd$sex<-c("m","f","f","m","m")
tapply(dd$eng,dd$sex,mean)
## f m
## 56.00000 85.33333
with(dd,tapply(math,sex,mean))
## f m
## 87.00000 48.66667
In addition to mean( ), other functions can be used in tapply( ). Although a bit unnecessary, we can use length( ) to count the scores of each gender. We can also compute the standard deviation of the scores.
with(dd,tapply(math,sex,length))
## f m
## 2 3
with(dd,tapply(eng,sex,sd))
## f m
## 31.11270 16.80278
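It may help to think of tapply( ) as two steps: split the scores into one group per gender, then apply the function to each group. The sketch below reproduces the mean English scores with split( ) and sapply( ). The names groups and by_hand are introduced here only for illustration.

```r
dd <- data.frame(eng  = c(100, 78, 34, 67, 89),
                 math = c(89, 76, 98, 37, 20),
                 sex  = c("m", "f", "f", "m", "m"))

# Step 1: split the English scores into a list with one element per gender
groups <- split(dd$eng, dd$sex)

# Step 2: apply mean( ) to each element of the list
by_hand <- sapply(groups, mean)

# Same values as tapply( )
all.equal(unname(by_hand), unname(c(tapply(dd$eng, dd$sex, mean))))
## [1] TRUE
```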
In tapply( ), the second argument is the index variable, and it can be a list of more than one variable. For example, we can check the mean score of each gender in each test. To this end, we first reshape the data frame dd into a long format with one score per row. Then we can compute the mean score of each gender in each test.
dd1<-data.frame(score=c(dd$eng,dd$math),gender=c(dd$sex,dd$sex),
topic=rep(c("eng","math"),each=5))
dd1
## score gender topic
## 1 100 m eng
## 2 78 f eng
## 3 34 f eng
## 4 67 m eng
## 5 89 m eng
## 6 89 m math
## 7 76 f math
## 8 98 f math
## 9 37 m math
## 10 20 m math
with(dd1,tapply(score,list(gender,topic),mean))
## eng math
## f 56.00000 87.00000
## m 85.33333 48.66667
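If you prefer the result as a data frame with one row per combination rather than a two-way table, aggregate( ) with a formula is one alternative. This is only a sketch of that option; the name long_means is introduced here for illustration, and dd1 is rebuilt so the block runs on its own.

```r
dd1 <- data.frame(
  score  = c(100, 78, 34, 67, 89, 89, 76, 98, 37, 20),
  gender = rep(c("m", "f", "f", "m", "m"), times = 2),
  topic  = rep(c("eng", "math"), each = 5)
)

# Mean score for every gender-by-topic combination, one row each
long_means <- aggregate(score ~ gender + topic, data = dd1, FUN = mean)
long_means

# Pick out a single cell, e.g. the mean English score of the female students
with(long_means, score[gender == "f" & topic == "eng"])
## [1] 56
```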
The function sapply( ) is also a member of the apply( ) family. It plays much the same role as a for loop, is usually more concise, and in some cases is faster. See the code below, in which the first argument is a vector and the second is a user-defined function with x as its argument. The variable x stands for each element of the vector in turn. The body of the function is the process to repeat, which is x+10 here. When you run this code, you get the sequence from 11 to 20, because x varies from 1 to 10 and sapply( ) collects the value of x+10 for each x.
sapply(1:10,function(x)x+10)
## [1] 11 12 13 14 15 16 17 18 19 20
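Two close relatives are worth comparing with this call. vapply( ) works like sapply( ) but also checks that each result has the type and length you declare, and for simple arithmetic such as this R can operate on the whole vector at once. The sketch below shows that all three give the same values. The names a, b, and d are introduced here only for illustration.

```r
a <- sapply(1:10, function(x) x + 10)              # as in the text
b <- vapply(1:10, function(x) x + 10, numeric(1))  # each result must be one number
d <- 1:10 + 10                                     # vectorized: no explicit loop

all(a == b & b == d)
## [1] TRUE
```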
Note that an important difference between the for loop and the apply functions is whether the process can change data outside itself from one repetition to the next. A function passed to sapply( ) works on its own local copy of a variable, so an ordinary assignment inside it does not carry over to the next repetition, and the apply functions are a poor fit when each round must update a shared value. Check the code below. The first example uses sapply( ) to update the value of the variable temp, yet temp is still 0 in the end. The second example uses a for loop to update the value of the variable v, which clearly changes from 0 to 55, the sum of the sequence 1:10.
temp<-0
sapply(1:10,function(x){
  # this assignment creates a local temp; the outer temp is only read
  temp<-temp+x
  return(temp)
})
## [1] 1 2 3 4 5 6 7 8 9 10
temp
## [1] 0
v<-0
for(i in 1:10){
v<-v+i
}
v
## [1] 55
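When each round depends on the result of the previous one, a function that carries the accumulated value forward is a better match than sapply( ). As a sketch, Reduce( ) with accumulate = TRUE passes the running result from one step to the next, while the sapply( ) version leaves the outer variable untouched. The names res and running are introduced here only for illustration.

```r
temp <- 0
res <- sapply(1:10, function(x) {
  temp <- temp + x   # local copy: starts from the outer temp (0) every time
  temp
})
temp                 # the outer temp was never modified
## [1] 0

# Reduce( ) feeds each partial sum into the next addition
running <- Reduce(`+`, 1:10, accumulate = TRUE)
tail(running, 1)
## [1] 55
```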
A loop can be nested inside another loop. Suppose we have 8 cities in an x-y plane. We would like to know the distance between every pair of cities, including the distance from each city to itself. Here the distance is the sum of the absolute differences in the x and y coordinates (the Manhattan distance). We can use two for loops to do this calculation. First, we create the coordinates of these 8 cities and plot them. Second, we compute the distance between every pair of cities. You can also use nested sapply( ) calls to get the same result.
cities<-data.frame(y=c(0.4,0.6,0.2,0.8,0.2,0.8,0.4,0.6),
x=c(0.2,0.2,0.4,0.4,0.6,0.6,0.8,0.8))
library(ggplot2)
ggplot(cities,aes(x,y))+
geom_point()
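# 8 x 8 matrix filled with -1 as a placeholder
dist<-matrix(-1,8,8)
for(j in 1:8){
  for(i in 1:8){
    # sum of absolute coordinate differences between city i and city j
    dist[i,j]<-sum(abs(cities[i,]-cities[j,]))
  }
}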
dist
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 0.0 0.2 0.4 0.6 0.6 0.8 0.6 0.8
## [2,] 0.2 0.0 0.6 0.4 0.8 0.6 0.8 0.6
## [3,] 0.4 0.6 0.0 0.6 0.2 0.8 0.6 0.8
## [4,] 0.6 0.4 0.6 0.0 0.8 0.2 0.8 0.6
## [5,] 0.6 0.8 0.2 0.8 0.0 0.6 0.4 0.6
## [6,] 0.8 0.6 0.8 0.2 0.6 0.0 0.6 0.4
## [7,] 0.6 0.8 0.6 0.8 0.4 0.6 0.0 0.2
## [8,] 0.8 0.6 0.8 0.6 0.6 0.4 0.2 0.0
Dist<-sapply(1:8,function(j){
  # column j: distances from every city i to city j
  p<-sapply(1:8,function(i){
    return(sum(abs(cities[i,]-cities[j,])))
  })
  return(p)
})
Dist
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 0.0 0.2 0.4 0.6 0.6 0.8 0.6 0.8
## [2,] 0.2 0.0 0.6 0.4 0.8 0.6 0.8 0.6
## [3,] 0.4 0.6 0.0 0.6 0.2 0.8 0.6 0.8
## [4,] 0.6 0.4 0.6 0.0 0.8 0.2 0.8 0.6
## [5,] 0.6 0.8 0.2 0.8 0.0 0.6 0.4 0.6
## [6,] 0.8 0.6 0.8 0.2 0.6 0.0 0.6 0.4
## [7,] 0.6 0.8 0.6 0.8 0.4 0.6 0.0 0.2
## [8,] 0.8 0.6 0.8 0.6 0.6 0.4 0.2 0.0
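Because the quantity computed above is the sum of absolute coordinate differences, the same matrix can also be obtained from the built-in function dist( ) in the stats package with method = "manhattan". The sketch below checks this against the nested loops. The names D_loop and D_builtin are introduced here only for illustration, and stats::dist is written in full so that it cannot be confused with a variable named dist.

```r
cities <- data.frame(y = c(0.4, 0.6, 0.2, 0.8, 0.2, 0.8, 0.4, 0.6),
                     x = c(0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8))

# Nested for loops, as in the text
D_loop <- matrix(0, 8, 8)
for (j in 1:8) {
  for (i in 1:8) {
    D_loop[i, j] <- sum(abs(cities[i, ] - cities[j, ]))
  }
}

# Built-in pairwise distances, converted to a full square matrix
D_builtin <- as.matrix(stats::dist(cities, method = "manhattan"))

# Compare the values, ignoring row and column labels
all.equal(D_loop, unname(D_builtin))
## [1] TRUE
```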