The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
unpivotr deals with non-tabular data, especially from spreadsheets. Use unpivotr when your source data has any of these ‘features’:
If that list makes your blood boil, you’ll enjoy the function names.
behead()
deals with multi-headered hydra tables one
layer of headers at a time, working from the edge of the table inwards.
It’s a bit like using header = TRUE
in
read.csv()
, but because it’s a function, you can apply it
to as many layers of headers as you need. You end up with all the
headers in columns.spatter()
is like tidyr::spread()
but
preserves mixed data types. You get into a mixed-data-type situation by
delaying type coercion until after the table is tidy (rather
than before, like read.csv()
et al). And yes, it usually
follows behead()
.More positive, corrective functions:
justify()
aligns column headers before
behead()
ing, and has deliberate moral overtones.enhead()
attaches a header to the body of the data,
a la Frankenstein. The effect is the same as
behead()
, but is more powerful because you can choose
exactly which header cells you want, paying attention to formatting
(which behead()
doesn’t understand).isolate_sentinels()
separates meaningful symbols like
"N/A"
or "confidential"
from the rest of the
data, giving them some time alone think about what they’ve done.partition()
takes a sheet with several tables on it,
and slashes into pieces that each contain one table. You can then
unpivot each table in turn with purrr::map()
or
similar.Unpivotr uses data where each cells is represented by one row in a dataframe. Like this.
What can you do with tidy cells? The best places to start are:
Otherwise the basic idea is:
devtools::install_github("tidyverse/readr#760")
.unpivotr::tidy_html()
unpivotr::as_cells()
– this should
be a last resort, because by the time the data is in a conventional data
frame, it is often too late – formatting has been lost, and most data
types have been coerced to strings.behead()
straight away, else
dplyr::filter()
separately for the header cells and the
data cells, and then recombine with enhead()
.spatter()
so that each column has one data type.library(unpivotr)
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
#> ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
#> ✔ tibble 3.1.8 ✔ dplyr 1.0.10
#> ✔ tidyr 1.2.1 ✔ stringr 1.4.1
#> ✔ readr 2.1.2 ✔ forcats 0.5.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tidyr::pack() masks unpivotr::pack()
#> ✖ tidyr::unpack() masks unpivotr::unpack()
<- purpose$`up-left left-up`
x # A pivot table in a conventional data frame. Four levels of headers, in two
x #> X2 X3 X4 X5 X6 X7
#> 1 <NA> <NA> Female <NA> Male <NA>
#> 2 <NA> <NA> 0 - 6 7 - 10 0 - 6 7 - 10
#> 3 Bachelor's degree 15 - 24 7000 27000 <NA> 13000
#> 4 <NA> 25 - 44 12000 137000 9000 81000
#> 5 <NA> 45 - 64 10000 64000 7000 66000
#> 6 <NA> 65+ <NA> 18000 7000 17000
#> 7 Certificate 15 - 24 29000 161000 30000 190000
#> 8 <NA> 25 - 44 34000 179000 31000 219000
#> 9 <NA> 45 - 64 30000 210000 23000 199000
#> 10 <NA> 65+ 12000 77000 8000 107000
#> 11 Diploma 15 - 24 <NA> 14000 9000 11000
#> 12 <NA> 25 - 44 10000 66000 8000 47000
#> 13 <NA> 45 - 64 6000 68000 5000 58000
#> 14 <NA> 65+ 5000 41000 1000 34000
#> 15 No Qualification 15 - 24 10000 43000 12000 37000
#> 16 <NA> 25 - 44 11000 36000 21000 50000
#> 17 <NA> 45 - 64 19000 91000 17000 75000
#> 18 <NA> 65+ 16000 118000 9000 66000
#> 19 Postgraduate qualification 15 - 24 <NA> 6000 <NA> <NA>
#> 20 <NA> 25 - 44 5000 86000 7000 60000
#> 21 <NA> 45 - 64 6000 55000 6000 68000
#> 22 <NA> 65+ <NA> 13000 <NA> 18000
# rows and two columns.
<- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y
y#> # A tibble: 132 × 4
#> row col data_type chr
#> <int> <int> <chr> <chr>
#> 1 1 1 chr <NA>
#> 2 2 1 chr <NA>
#> 3 3 1 chr Bachelor's degree
#> 4 4 1 chr <NA>
#> 5 5 1 chr <NA>
#> 6 6 1 chr <NA>
#> 7 7 1 chr Certificate
#> 8 8 1 chr <NA>
#> 9 9 1 chr <NA>
#> 10 10 1 chr <NA>
#> # … with 122 more rows
rectify(y) # useful for reviewing the melted form as though in a spreadsheet
#> # A tibble: 22 × 7
#> `row/col` `1(A)` `2(B)` `3(C)` `4(D)` `5(E)` `6(F)`
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 <NA> <NA> Female <NA> Male <NA>
#> 2 2 <NA> <NA> 0 - 6 7 - 10 0 - 6 7 - 10
#> 3 3 Bachelor's degree 15 - 24 7000 27000 <NA> 13000
#> 4 4 <NA> 25 - 44 12000 137000 9000 81000
#> 5 5 <NA> 45 - 64 10000 64000 7000 66000
#> 6 6 <NA> 65+ <NA> 18000 7000 17000
#> 7 7 Certificate 15 - 24 29000 161000 30000 190000
#> 8 8 <NA> 25 - 44 34000 179000 31000 219000
#> 9 9 <NA> 45 - 64 30000 210000 23000 199000
#> 10 10 <NA> 65+ 12000 77000 8000 107000
#> # … with 12 more rows
%>%
y behead("up-left", "sex") %>% # Strip headers
behead("up", "life-satisfication") %>% # one
behead("left-up", "qualification") %>% # by
behead("left", "age-band") %>% # one.
select(-row, -col, -data_type, count = chr) %>% # cleanup
mutate(count = as.integer(count))
#> # A tibble: 80 × 5
#> count sex `life-satisfication` qualification `age-band`
#> <int> <chr> <chr> <chr> <chr>
#> 1 7000 Female 0 - 6 Bachelor's degree 15 - 24
#> 2 12000 Female 0 - 6 Bachelor's degree 25 - 44
#> 3 10000 Female 0 - 6 Bachelor's degree 45 - 64
#> 4 NA Female 0 - 6 Bachelor's degree 65+
#> 5 27000 Female 7 - 10 Bachelor's degree 15 - 24
#> 6 137000 Female 7 - 10 Bachelor's degree 25 - 44
#> 7 64000 Female 7 - 10 Bachelor's degree 45 - 64
#> 8 18000 Female 7 - 10 Bachelor's degree 65+
#> 9 NA Male 0 - 6 Bachelor's degree 15 - 24
#> 10 9000 Male 0 - 6 Bachelor's degree 25 - 44
#> # … with 70 more rows
Note the compass directions in the code above, which hint to
behead()
where to find the header cell for each data
cell.
"up-left"
means the header (Female
,
Male
) is positioned up and to the left of the columns of
data cells it describes."up"
means the header (0 - 6
,
7 - 10
) is positioned directly above the columns of data
cells it describes."left-up"
means the header
(Bachelor's degree
, Certificate
, etc.) is
positioned to the left and upwards of the rows of data cells it
describes."left"
means the header (15 - 24
,
25 - 44
, etc.) is positioned directly to the left of the
rows of data cells it describes.# install.packages("devtools") # If you don't already have devtools
::install_github("nacnudus/unpivotr", build_vignettes = TRUE) devtools
The version 0.4.0 release had somee breaking changes. See
NEWS.md
for details. The previous version can be installed
as follow:
::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org") devtools
unpivotr is inspired by Databaker, a collaboration between the United Kingdom Office of National Statistics and The Sensible Code Company. unpivotr.
jailbreaker attempts to extract non-tabular data from spreadsheets into tabular structures automatically via some clever algorithms. unpivotr differs by being less magic, and equipping you to express what you want to do.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.