The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
This is a very brief place-holder example (not executed, as
SQLite
does not have the needed window functions). For more
details please see the fuller note: rquery README
.
The primary purpose of rquery
is SQL
query
generation. We demonstrate this below.
Note: in examples we use rq_copy_to()
to create data.
This is only for the purpose of having easy portable examples. With big
data the data is usually already in the remote database or Spark system.
The task is almost always to connect and work with this pre-existing
remote data and the method to do this is db_td()
,
which builds a reference to a remote table given the table name.
# produce a hande to existing table
d <- db_td(my_db, "d")
scale <- 0.237
dq <- d %.>%
extend(.,
one = 1) %.>%
extend(.,
probability :=
exp(assessmentTotal * scale)/
sum(exp(assessmentTotal * scale)),
count := sum(one),
partitionby = 'subjectID') %.>%
extend(.,
rank := cumsum(one),
partitionby = 'subjectID',
orderby = c('probability', 'surveyCategory')) %.>%
rename_columns(., 'diagnosis' := 'surveyCategory') %.>%
select_rows(., rank == count) %.>%
select_columns(., c('subjectID',
'diagnosis',
'probability')) %.>%
orderby(., 'subjectID')
class(my_db)
#> [1] "SQLiteConnection"
#> attr(,"package")
#> [1] "RSQLite"
Presentation format (see also op_diagram()
):
mk_td("d", c(
"subjectID",
"surveyCategory",
"assessmentTotal",
"irrelevantCol1",
"irrelevantCol2")) %.>%
extend(.,
one := 1) %.>%
extend(.,
probability := exp(assessmentTotal * 0.237) / sum(exp(assessmentTotal * 0.237)),
count := sum(one),
partitionby = c('subjectID'),
orderby = c(),
reverse = c()) %.>%
extend(.,
rank := cumsum(one),
partitionby = c('subjectID'),
orderby = c('probability', 'surveyCategory'),
reverse = c()) %.>%
rename_columns(.,
c('diagnosis' = 'surveyCategory')) %.>%
select_rows(.,
rank == count) %.>%
select_columns(.,
c('subjectID', 'diagnosis', 'probability')) %.>%
order_rows(.,
c('subjectID'),
reverse = c(),
limit = NULL)
to_sql()
SQL (see also materialize()
):
SELECT * FROM (
SELECT
`subjectID`,
`diagnosis`,
`probability`
FROM (
SELECT * FROM (
SELECT
`subjectID` AS `subjectID`,
`surveyCategory` AS `diagnosis`,
`probability` AS `probability`,
`count` AS `count`,
`rank` AS `rank`
FROM (
SELECT
`subjectID`,
`surveyCategory`,
`probability`,
`count`,
SUM ( `one` ) OVER ( PARTITION BY `subjectID` ORDER BY `probability`, `surveyCategory` ) AS `rank`
FROM (
SELECT
`subjectID`,
`surveyCategory`,
`one`,
exp ( `assessmentTotal` * 0.237 ) / sum ( exp ( `assessmentTotal` * 0.237 ) ) OVER ( PARTITION BY `subjectID` ) AS `probability`,
sum ( `one` ) OVER ( PARTITION BY `subjectID` ) AS `count`
FROM (
SELECT
`subjectID`,
`surveyCategory`,
`assessmentTotal`,
1 AS `one`
FROM (
SELECT
`subjectID`,
`surveyCategory`,
`assessmentTotal`
FROM
`d` LIMIT 1000
) tsql_80987133311344751488_0000000000
) tsql_80987133311344751488_0000000001
) tsql_80987133311344751488_0000000002
) tsql_80987133311344751488_0000000003
) tsql_80987133311344751488_0000000004
WHERE `rank` = `count`
) tsql_80987133311344751488_0000000005
) tsql_80987133311344751488_0000000006 ORDER BY `subjectID`
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.