List comprehension is an alternative way of writing for loops or lapply (and similar) functions that is readable, quick, and easy to use. Many newcomers to R struggle with the lapply family of functions and default to using for loops. However, for loops in R are often slower than their apply-style counterparts, particularly when results are grown inside the loop, and are best avoided where possible. Comprehensions in the eList package bridge the gap by allowing users to write lapply or sapply calls using for loop syntax.
This vignette describes how to use list and other comprehensions. It also explores comprehensions with multiple variables, if-else conditions, nested comprehensions, and parallel comprehensions.
Let us start with a simple example: we have a vector of fruit names and want to create a sequence for each, based on its number of characters. One common method is to write a for statement that loops across the fruit names, creates a sequence from 1 to the number of characters in each name, and places it in a list.
x <- c("apples", "bananas", "pears")
fruit_chars <- vector("list", length(x))
for (i in seq_along(x)){
fruit_chars[[i]] <- 1:nchar(x[i])
}
fruit_chars
#> [[1]]
#> [1] 1 2 3 4 5 6
#>
#> [[2]]
#> [1] 1 2 3 4 5 6 7
#>
#> [[3]]
#> [1] 1 2 3 4 5
Alternatively, we could use a more concise, apply-style function such as lapply. This is similar to the for loop, except that a specified function is applied to each fruit name and the results are collected in a list.
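As a minimal sketch of that approach, using base R only and the same vector x as above, the loop can be rewritten with lapply:

```r
# Same fruit vector as in the for-loop example
x <- c("apples", "bananas", "pears")

# Apply an anonymous function to each fruit name; lapply collects
# the results in a list, so no pre-allocation is needed
fruit_chars <- lapply(x, function(fruit) 1:nchar(fruit))
fruit_chars
```

The result has the same shape as the list built by the for loop: one integer sequence per fruit name.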
The eList package allows for a hybrid of the two through the List() function. The for loop inside List() evaluates the expression for each item and collects the results in a list. Behind the scenes, List parses the statement into an lapply call and evaluates it. As we will see, though, List offers additional features.
library(eList)
#>
#> Attaching package: 'eList'
#> The following object is masked from 'package:utils':
#>
#> zip
fruit_chars <- List(for (i in x) 1:nchar(i))
If you are coming from Python or another language that uses list comprehension, you will notice that this syntax is a bit different; it more closely resembles a standard for loop in R. The for statement comes first, followed by the variable name and sequence in parentheses, and finally the expression. The expression can be wrapped in braces {} if desired.
The unnamed lists of numbers in the previous examples can be hard to read. Luckily, List and the other comprehension functions allow us to assign a name to each result using the notation name = expr.
List(for (i in x) i = 1:nchar(i))
#> $apples
#> [1] 1 2 3 4 5 6
#>
#> $bananas
#> [1] 1 2 3 4 5 6 7
#>
#> $pears
#> [1] 1 2 3 4 5
The name can be any type of expression, though complex expressions should be wrapped in parentheses so that the parser does not confuse which side of the = sign they belong to.
The results of a comprehension can be filtered by placing a standard if statement after the variables and sequence, making sure that the condition is placed in parentheses. For example, the condition if (i == "bananas") keeps only the sequence for "bananas".
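The filtering behaviour can be sketched in base R. This is only an approximation of what the text describes, not eList's actual implementation: entries that fail the condition come back as NULL and are then dropped.

```r
x <- c("apples", "bananas", "pears")

# An if without else returns NULL when the condition is FALSE
res <- lapply(x, function(i) if (i == "bananas") 1:nchar(i))

# Drop the NULL entries, mimicking the filtering of a comprehension
res <- Filter(Negate(is.null), res)
res
```

Only one entry survives, the sequence for "bananas"; the other two fruit names produce NULL and are removed.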
Any statement that evaluates to NULL is automatically filtered from the results. An else statement can also be included so that results are not filtered out (unless the else statement itself evaluates to NULL).
List(for (i in x) if (i == "bananas") "delicious" else "ewww")
#> [[1]]
#> [1] "ewww"
#>
#> [[2]]
#> [1] "delicious"
#>
#> [[3]]
#> [1] "ewww"
Each entry can still be assigned a name. Furthermore, the expression can be as complex as necessary for the task, with any number of else if checks.
Comprehensions can have multiple variables if the variables are separated using "." within a single name. To see how this works, let us use the enum function in the eList package. When enum is used on a vector, each item becomes a pair: the first value is the item's index in the loop and the second is the value of the vector at that index. Now, when we use (i.j in enum(x)), i is the index number of each item in x, while j is the value of the item in x (the name of the fruit).
Let us inspect enum(x) to see what is going on beneath the surface. enum took the vector x and created a new list at each index. The first of these lists contains two elements: the first is 1 (the index number) and the second is "apples" (the first value in x).
enum(x)
#> [[1]]
#> [[1]][[1]]
#> [1] 1
#>
#> [[1]][[2]]
#> [1] "apples"
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 2
#>
#> [[2]][[2]]
#> [1] "bananas"
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 3
#>
#> [[3]][[2]]
#> [1] "pears"
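The unpacking of such pairs can be pictured in base R. The sketch below is an illustration only: the pairs are built by hand rather than with enum, and the names pairs and labels are hypothetical.

```r
x <- c("apples", "bananas", "pears")

# Build (index, value) pairs by hand, similar in shape to the enum(x) output
pairs <- lapply(seq_along(x), function(k) list(k, x[[k]]))

# Unpack each pair into i (index) and j (value), as (i.j in enum(x)) would
labels <- vapply(pairs, function(p) {
  i <- p[[1]]
  j <- p[[2]]
  paste(i, j)
}, character(1))
labels
```

Each pair is split by position, which is the same idea as binding i to the first element and j to the second.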
When multiple variables like i.j are passed to the for loop, i is assigned the first element and j the second element of each item in the sequence. There can be any number of variables as long as they are separated by ".". Not every element needs a variable, but every variable needs a corresponding element, or else an “out-of-bounds” error will be produced.
y <- list(a = 1:3, b = 4:6, c = 7:9)
List(for (i.j.k in y) (i+j)/k)
#> $a
#> [1] 1
#>
#> $b
#> [1] 1.5
#>
#> $c
#> [1] 1.666667
Similarly, variables can be skipped, for example to extract only the first and third elements of each item in the list.
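In base-R terms, skipping a variable simply means not reading that position. The sketch below is an equivalent written with lapply, not eList syntax, and the name first_third is hypothetical:

```r
y <- list(a = 1:3, b = 4:6, c = 7:9)

# Keep the first and third element of each item; the second is ignored
first_third <- lapply(y, function(item) c(item[[1]], item[[3]]))
first_third
```

Because lapply preserves names, the result is still indexed by a, b, and c.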
If only the first or second item is needed, though, you should use two dots: i.. for the first, ..i for the second.
Another convenient function that can be used on the sequence is items(). It is similar to enum, except that the name of each item is used instead of its index number. See the documentation for eList for other helper functions.
List(for (i.j in items(y)) paste0(i, j))
#> [[1]]
#> [1] "a1" "a2" "a3"
#>
#> [[2]]
#> [1] "b4" "b5" "b6"
#>
#> [[3]]
#> [1] "c7" "c8" "c9"
Variables can also be separated using a comma, as long as the variables are surrounded by backticks.
All of the examples so far have returned a list. However, eList supports a variety of other comprehension types. For example, Num returns a numeric vector, while Chr returns a character vector. Env can be used to produce an environment, similar to dictionary comprehensions in Python. Note that each entry in an environment must have a unique name. These functions work by coercing the result into a particular type and produce an error if the coercion fails.
Num(for (i.j.k in y) (i+j)/k)
#>        a        b        c
#> 1.000000 1.500000 1.666667
Chr(for (i.j.k in y) (i+j)/k)
#>                  a                  b                  c
#>                "1"              "1.5" "1.66666666666667"
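The idea of a typed result can be pictured with base R's vapply, which requires a declared result type and stops with an error when a result does not match. This is an analogy rather than a description of how Num or Chr are implemented; in particular, vapply checks the type instead of coercing to it.

```r
y <- list(a = 1:3, b = 4:6, c = 7:9)

# Numeric result: each call must return exactly one double
nums <- vapply(y, function(v) (v[[1]] + v[[2]]) / v[[3]], numeric(1))

# Character result: convert the same values to strings
chrs <- vapply(nums, as.character, character(1))

# A mismatch between declared and actual type raises an error
bad <- tryCatch(
  vapply(y, function(v) "text", numeric(1)),
  error = function(e) "error"
)
```

Both nums and chrs keep the names a, b, and c, much like the named vectors printed above.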
One convenience function is .. (two dots). It can be used as either ..[for ...] or ..(for ...). By default, it attempts to simplify the results into an array, but it can mimic any other type of comprehension.
Multiple for loops can be used within a comprehension. As long as each subsequent for statement immediately follows the variables and sequence of the previous one, or immediately follows an if-else statement, it will be parsed into a vectorized, lapply-style function. Nested loops should be avoided unless necessary, since they can be difficult to understand. The following are a couple of examples using matrix and numeric comprehension.
Mat(for (i in 1:3) for (j in 1:6) i*j)
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    2    4    6
#> [3,]    3    6    9
#> [4,]    4    8   12
#> [5,]    5   10   15
#> [6,]    6   12   18
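For comparison, the same nested structure can be written in base R with two sapply calls. This is assumed here as a plausible equivalent; the exact translation eList performs is not shown in this vignette.

```r
# The outer call over i builds the columns, the inner call over j builds the rows
m <- sapply(1:3, function(i) sapply(1:6, function(j) i * j))
m
dim(m)
```

The inner call returns a vector of length 6 for each i, and the outer call binds those vectors as columns, giving a matrix with 6 rows and 3 columns.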
Vector comprehensions can also be nested within each other; for example, a numeric comprehension can be nested within a character comprehension.
Vector comprehensions also allow for parallel computations using the parallel package. All comprehensions allow the user to supply a cluster, which activates the parallel version of sapply or lapply. eList comes with a function that quickly creates a cluster based on the user’s operating system and the auto-detected number of available cores. Users are advised to create a cluster explicitly through the functions in the parallel package unless only a quick parallel calculation is needed.
cluster <- auto_cluster(2)
Num(for (i in 1:100) sample(1:100, 1), clust=cluster)
close_cluster(cluster)
Note that the additional overhead involved with parallelization means that the gain in performance will be relatively small (and often negative) unless the sequence is sufficiently large. Large comprehensions, though, can experience significant gains by specifying a cluster.
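For readers who prefer to manage the cluster explicitly, the following sketch uses only the parallel package that ships with R. It shows the parallel counterpart of sapply on its own, without any eList function, and the choice of two workers is arbitrary.

```r
library(parallel)

# Explicitly create a cluster with two workers
cl <- makeCluster(2)

# Parallel counterpart of sapply: each element is processed on a worker
draws <- parSapply(cl, 1:100, function(i) sample(1:100, 1))

# Always release the workers when finished
stopCluster(cl)

length(draws)
```

For a sequence this short, starting the workers most likely costs more time than the computation itself, which is the overhead described above.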
The eList package contains a number of summary functions that accept vector comprehensions and/or normal vectors. These functions allow users to combine multiple comprehensions and other vectors in a single call, then apply a summary function to all entries. Each summary comprehension accepts a cluster for parallel computations and the na.rm argument. Some examples:
# Are no values TRUE? (combines comprehension with other values)
None(for (i in 1:10) i < 0, TRUE, FALSE)
#> [1] FALSE
# Summary statistics from a random draw of 1000 observations from normal distribution
Stats(for (i in rnorm(1000)) i)
#> $min
#> [1] -2.975354
#>
#> $q1
#> [1] -0.6803642
#>
#> $med
#> [1] 0.09313939
#>
#> $q3
#> [1] 0.809412
#>
#> $max
#> [1] 3.731964
#>
#> $mean
#> [1] 0.06918359
#>
#> $sd
#> [1] 1.04056
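The behaviour of a summary comprehension such as None can be approximated in base R by evaluating the comprehension, appending the extra values, and summarising the combined vector. The sketch below mirrors the first example above; it is not eList's implementation, and the variable names are hypothetical.

```r
# Comprehension part: is each of 1..10 negative?
from_loop <- vapply(1:10, function(i) i < 0, logical(1))

# Combine with the additional values passed alongside the comprehension
all_values <- c(from_loop, TRUE, FALSE)

# "None" holds only when no value is TRUE
none_true <- !any(all_values)
none_true
```

Because one of the appended values is TRUE, the result is FALSE, in line with the None output shown earlier.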