
adaR


adaR is a wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++.

It also implements several auxiliary functions for working with URLs, including public suffix extraction.

More general information on URL parsing can be found in the introductory vignette via vignette("adaR").

adaR is part of a series of R packages to analyse webtracking data.

Installation

You can install the development version of adaR from GitHub with:

# install.packages("devtools")
devtools::install_github("gesistsa/adaR")

The version on CRAN can be installed with:

install.packages("adaR")

Example

This basic example shows all the components returned for a URL.

library(adaR)
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
#>                                                      href
#> 1 https://user_1:password_1@example.org:8080/api?q=1#frag
#>   protocol username   password             host
#> 1   https:   user_1 password_1 example.org:8080
#>      hostname port pathname search  hash
#> 1 example.org 8080     /api   ?q=1 #frag
  /*
   * https://user:pass@example.com:1234/foo/bar?baz#quux
   *       |     |    |          | ^^^^|       |   |
   *       |     |    |          | |   |       |   `----- hash_start
   *       |     |    |          | |   |       `--------- search_start
   *       |     |    |          | |   `----------------- pathname_start
   *       |     |    |          | `--------------------- port
   *       |     |    |          `----------------------- host_end
   *       |     |    `---------------------------------- host_start
   *       |     `--------------------------------------- username_end
   *       `--------------------------------------------- protocol_end
   */
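
The printed result above looks like a regular data frame with one row per URL, so individual components can presumably be extracted with ordinary column access. The following is a minimal sketch, assuming the return value is a data frame with the column names shown in the output; the variable name parsed is only illustrative.

```r
library(adaR)

# Parse once and keep the result (assumed to be a data frame with one row per URL).
parsed <- ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")

# Pull out single components by column name, as printed in the output above.
parsed$hostname
parsed$pathname

# Keep only a subset of the components.
parsed[, c("hostname", "port", "pathname")]
```

Note that the pathname is already normalised in the output above: the dot segment in /dir/../api is resolved to /api.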

It correctly handles some more complex URLs that urltools parses incorrectly.

urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.
   7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
#>   scheme                            domain port
#> 1  https 40.7519848,-74.0015045,14.\n   7z <NA>
#>                                                                                 path
#> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#>   parameter fragment
#> 1      <NA>     <NA>

ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m
   5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
#>                                                                                                                                                                         href
#> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m   5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#>   protocol username password           host       hostname
#> 1   https:                   www.google.com www.google.com
#>   port
#> 1     
#>                                                                                                                                               pathname
#> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m   5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#>   search hash
#> 1

A “raw” URL parse using ada is extremely fast (see ada-url.com), but carrying that speed over to R is tricky. The performance is still at least comparable to that of urltools::url_parse, with the added advantage of better accuracy in some practical circumstances.

bench::mark(
  ada = ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE),
  urltools = urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag"),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ada          2.21µs   2.46µs   384709.        0B     38.5
#> 2 urltools   101.93µs 106.44µs     9258.        0B     61.9

For further benchmark results, see benchmark.md in data_raw.

Four more groups of functions are available for working with URLs; they are described in the package documentation.

Public Suffix extraction

public_suffix() extracts the top-level domain of each URL based on the public suffix list, excluding private domains.

urls <- c(
  "https://subsub.sub.domain.co.uk",
  "https://domain.api.gov.uk",
  "https://thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
public_suffix(urls)
#> [1] "co.uk"                           
#> [2] "gov.uk"                          
#> [3] "butthisispartoftheps.kawasaki.jp"

If you are wondering about the last URL: the list also contains wildcard suffixes such as *.kawasaki.jp, which need to be matched.
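
To illustrate why wildcard rules matter, here is a small base R sketch of longest-suffix matching against a toy rule set. It is not how adaR implements public_suffix(); the helper toy_public_suffix() and the five-entry rules vector are hypothetical, and exception rules in the real list are ignored.

```r
# Toy rule set (assumption); the real public suffix list is far larger.
rules <- c("uk", "co.uk", "gov.uk", "jp", "*.kawasaki.jp")

# Hypothetical helper: return the longest suffix of a hostname that matches a rule.
toy_public_suffix <- function(hostname, rules) {
  labels <- strsplit(hostname, ".", fixed = TRUE)[[1]]
  n <- length(labels)
  # Try candidates from longest to shortest, so the first hit is the longest match.
  for (i in seq_len(n)) {
    candidate <- paste(labels[i:n], collapse = ".")
    # A wildcard rule matches when the leftmost label of the candidate is replaced by "*".
    wildcard <- if (i < n) paste(c("*", labels[(i + 1):n]), collapse = ".") else "*"
    if (candidate %in% rules || wildcard %in% rules) {
      return(candidate)
    }
  }
  NA_character_
}

toy_public_suffix("thisisnotpart.butthisispartoftheps.kawasaki.jp", rules)
#> [1] "butthisispartoftheps.kawasaki.jp"
toy_public_suffix("subsub.sub.domain.co.uk", rules)
#> [1] "co.uk"
```

Without the wildcard entry, the first call would fall through to the plain jp rule and return only "jp".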

Acknowledgement

The logo was created from a portrait of Ada Lovelace, a very early pioneer of computer science.
