The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
queryparser translates SQL queries into lists of unevaluated R expressions.
⚠️ Most R users should not directly use queryparser. Instead, use it through tidyquery. |
---|
For an introduction to tidyquery and queryparser, watch the recording of the talk “Bridging the Gap between SQL and R” from rstudio::conf(2020).
Install the released version of queryparser from CRAN with:
install.packages("queryparser")
Or install the development version from GitHub with:
# install.packages("remotes")
::install_github("ianmcook/queryparser") remotes
Call the function parse_query()
, passing a
SELECT
statement enclosed in quotes as the first
argument:
library(queryparser)
parse_query("SELECT DISTINCT carrier FROM flights WHERE dest = 'HNL'")
#> $select
#> $select[[1]]
#> carrier
#>
#> attr(,"distinct")
#> [1] TRUE
#>
#> $from
#> $from[[1]]
#> flights
#>
#>
#> $where
#> $where[[1]]
#> dest == "HNL"
Queries can include the clauses SELECT
,
FROM
, WHERE
, GROUP BY
,
HAVING
, ORDER BY
, and LIMIT
:
parse_query(
" SELECT origin, dest,
COUNT(flight) AS num_flts,
round(SUM(seats)) AS num_seats,
round(AVG(arr_delay)) AS avg_delay
FROM flights f LEFT OUTER JOIN planes p
ON f.tailnum = p.tailnum
WHERE distance BETWEEN 200 AND 300
AND air_time IS NOT NULL
GROUP BY origin, dest
HAVING num_flts > 3000
ORDER BY num_seats DESC, avg_delay ASC
LIMIT 2;"
)#> $select
#> $select[[1]]
#> origin
#>
#> $select[[2]]
#> dest
#>
#> $select$num_flts
#> sum(!is.na(flight), na.rm = TRUE)
#>
#> $select$num_seats
#> round(sum(seats, na.rm = TRUE))
#>
#> $select$avg_delay
#> round(mean(arr_delay, na.rm = TRUE))
#>
#> attr(,"aggregate")
#> num_flts num_seats avg_delay
#> FALSE FALSE TRUE TRUE TRUE
#>
#> $from
#> $from$f
#> flights
#>
#> $from$p
#> planes
#>
#> attr(,"join_types")
#> [1] "left outer join"
#> attr(,"join_conditions")
#> attr(,"join_conditions")[[1]]
#> f.tailnum == p.tailnum
#>
#>
#> $where
#> $where[[1]]
#> (distance >= 200 & distance <= 300) & !is.na(air_time)
#>
#>
#> $group_by
#> $group_by[[1]]
#> origin
#>
#> $group_by[[2]]
#> dest
#>
#>
#> $having
#> $having[[1]]
#> num_flts > 3000
#>
#>
#> $order_by
#> $order_by[[1]]
#> -xtfrm(num_seats)
#>
#> $order_by[[2]]
#> avg_delay
#>
#> attr(,"aggregate")
#> [1] FALSE FALSE
#>
#> $limit
#> $limit[[1]]
#> [1] 2
#>
#>
#> attr(,"aggregate")
#> [1] TRUE
Set the argument tidyverse
to TRUE
to use
functions from tidyverse
packages including dplyr, stringr, and lubridate in the R
expressions:
parse_query("SELECT COUNT(*) AS n FROM t WHERE x BETWEEN y AND z ORDER BY n DESC", tidyverse = TRUE)
#> $select
#> $select$n
#> dplyr::n()
#>
#> attr(,"aggregate")
#> n
#> TRUE
#>
#> $from
#> $from[[1]]
#> t
#>
#>
#> $where
#> $where[[1]]
#> dplyr::between(x, y, z)
#>
#>
#> $order_by
#> $order_by[[1]]
#> dplyr::desc(n)
#>
#> attr(,"aggregate")
#> [1] FALSE
#>
#> attr(,"aggregate")
#> [1] TRUE
queryparser will translate only explicitly allowed functions and operators, preventing injection of malicious code:
parse_query("SELECT x FROM y WHERE system('rm -rf /')")
#> Error: Unrecognized function or operator: system
queryparser does not currently support:
WITH
clause (common table expressions)OVER
expressions (window or analytic functions)queryparser currently has the following known limitations:
tidyverse
is set to TRUE
. An example of this is
COUNT(DISTINCT )
expressions with multiple arguments.IS NULL
) have
unparenthesized expressions as their operands, R will interpret the
resulting code using a different order of operations than a SQL engine
would. When using an expression as the operand to a logical operator,
always enclose the expression in parentheses.tidyverse
is set to TRUE
, SQL
expressions that use CASE
or coalesce()
with
NULL
s in the arguments can return expressions that throw
data type errors when evaluated. This is because NULL
translates to NA
, which is by default a logical constant
(not a numeric, integer, or character constant). To work around this,
cast NULL
to the expected data type in the SQL
expression.queryparser is not intended to:
INSERT
or UPDATE
)SELECT
statements
passed to itThe sqlparseR package (CRAN) provides a wrapper around the Python module sqlparse.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.