adaR is a wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++.
It implements several auxiliary functions to work with URLs:
utils::URLdecode (~40x speedup)
More general information on URL parsing can be found in the introductory vignette via vignette("adaR").
adaR is part of a series of R packages to analyse webtracking data:
You can install the development version of adaR from GitHub with:
# install.packages("devtools")
devtools::install_github("gesistsa/adaR")
The version on CRAN can be installed with:
install.packages("adaR")
This is a basic example which shows all the returned components of a URL.
library(adaR)
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
#> href protocol username password
#> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 password_1
#> host hostname port pathname search hash
#> 1 example.org:8080 example.org 8080 /api ?q=1 #frag
/*
 * https://user:pass@example.com:1234/foo/bar?baz#quux
 *       |     |    |          | ^^^^|       |   |
 *       |     |    |          | |   |       |   `----- hash_start
 *       |     |    |          | |   |       `--------- search_start
 *       |     |    |          | |   `----------------- pathname_start
 *       |     |    |          | `--------------------- port
 *       |     |    |          `----------------------- host_end
 *       |     |    `---------------------------------- host_start
 *       |     `--------------------------------------- username_end
 *       `--------------------------------------------- protocol_end
 */
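The offsets in the diagram can be read as cut points into the URL string. The following is a minimal base R sketch of that idea, not adaR's or ada-url's actual implementation. The numeric offsets are assumptions: they were worked out by hand for this one example string, counted from zero, and the helper `slice()` is hypothetical.

```r
url <- "https://user:pass@example.com:1234/foo/bar?baz#quux"

# Hand-derived, 0-based offsets for this one string (assumed semantics,
# not values returned by adaR).
off <- list(
  protocol_end = 6, username_end = 12, host_start = 17, host_end = 29,
  pathname_start = 34, search_start = 42, hash_start = 46
)

# Hypothetical helper: cut the half-open, 0-based range [from, to) out of x.
slice <- function(x, from, to) substr(x, from + 1, to)

parts <- c(
  protocol = slice(url, 0, off$protocol_end),
  # skip the "//" that follows the protocol
  username = slice(url, off$protocol_end + 2, off$username_end),
  # skip the ":" between username and password
  password = slice(url, off$username_end + 1, off$host_start),
  # skip the "@" in front of the host
  hostname = slice(url, off$host_start + 1, off$host_end),
  # skip the ":" in front of the port
  port     = slice(url, off$host_end + 1, off$pathname_start),
  pathname = slice(url, off$pathname_start, off$search_start),
  search   = slice(url, off$search_start, off$hash_start),
  hash     = slice(url, off$hash_start, nchar(url))
)
print(parts)
```

Storing only a handful of offsets alongside the original string means each component can be cut out on demand without copying the URL into separate fields up front.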
It solves some problems of urltools with more complex URLs.
urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.
 7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
#> scheme domain port
#> 1 https 40.7519848,-74.0015045,14.\n 7z <NA>
#> path
#> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#> parameter fragment
#> 1 <NA> <NA>
ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m
5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
#> href
#> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#> protocol username password host hostname port
#> 1 https: www.google.com www.google.com
#> pathname
#> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#> search hash
#> 1
A “raw” URL parse using ada is extremely fast (see ada-url.com), but carrying this speed over to R is tricky. The performance is still comparable to urltools::url_parse, with the noted advantage in accuracy in some practical circumstances.
bench::mark(
  ada = ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE),
  urltools = urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag"),
  iterations = 1, check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 ada 3.32ms 3.32ms 301. 0B 0
#> 2 urltools 566.68µs 566.68µs 1765. 0B 0
For further benchmark results, see benchmark.md in data_raw.
There are four more groups of functions available to work with URL parsing:
ada_get_*() get a specific component
ada_has_*() check if a specific component is present
ada_set_*() set a specific component of URLs
ada_clear_*() remove a specific component from URLs
public_suffix() extracts the top-level domain of URLs using the public suffix list, excluding private domains.
urls <- c(
  "https://subsub.sub.domain.co.uk",
  "https://domain.api.gov.uk",
  "https://thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
public_suffix(urls)
#> [1] "co.uk" "gov.uk"
#> [3] "butthisispartoftheps.kawasaki.jp"
If you are wondering about the last URL: the list also contains wildcard suffixes such as *.kawasaki.jp, which need to be matched.
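To illustrate how a wildcard rule changes the result, here is a minimal base R sketch of longest-suffix matching. It is not how adaR implements public_suffix(). The `rules` vector is a tiny hypothetical stand-in for the real public suffix list, `match_suffix()` is a hypothetical helper, and the sketch ignores the list's exception rules.

```r
# Tiny hypothetical stand-in for the public suffix list (the real list is far larger).
rules <- c("uk", "co.uk", "gov.uk", "jp", "kawasaki.jp", "*.kawasaki.jp")

# Return the longest suffix of `host` that matches a rule.
# A leading "*" in a rule stands for exactly one label.
match_suffix <- function(host, rules) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  n <- length(labels)
  # Walk from the longest candidate to the shortest, so the first hit wins.
  for (start in seq_len(n)) {
    candidate <- labels[start:n]
    exact <- paste(candidate, collapse = ".")
    # Replace the leftmost label with "*" to test against wildcard rules.
    wildcard <- paste(c("*", candidate[-1]), collapse = ".")
    if (exact %in% rules || (length(candidate) > 1 && wildcard %in% rules)) {
      return(exact)
    }
  }
  NA_character_
}

hosts <- c(
  "subsub.sub.domain.co.uk",
  "domain.api.gov.uk",
  "thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
result <- vapply(hosts, match_suffix, character(1), rules = rules, USE.NAMES = FALSE)
print(result)
```

For the last host, no exact rule is longer than kawasaki.jp, but the wildcard rule *.kawasaki.jp matches one extra label, which is why butthisispartoftheps is part of the suffix.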
The logo is created from this portrait of Ada Lovelace, a very early pioneer in Computer Science.