The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

unpivotr

Travis-CI Build Status AppVeyor Build Status Cran Status Cran Downloads codecov R-CMD-check

unpivotr deals with non-tabular data, especially from spreadsheets. Use unpivotr when your source data has any of these ‘features’:

If that list makes your blood boil, you’ll enjoy the function names.

More positive, corrective functions:

Make cells tidy

Unpivotr uses data where each cells is represented by one row in a dataframe. Like this.

Gif of tidyxl converting cells into a tidy representation of one row per cell

What can you do with tidy cells? The best places to start are:

Otherwise the basic idea is:

  1. Read the data with a specialist tool.
  2. Either behead() straight away, else dplyr::filter() separately for the header cells and the data cells, and then recombine with enhead().
  3. spatter() so that each column has one data type.
library(unpivotr)
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
#> ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
#> ✔ tibble  3.1.8      ✔ dplyr   1.0.10
#> ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
#> ✔ readr   2.1.2      ✔ forcats 0.5.2 
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ tidyr::pack()   masks unpivotr::pack()
#> ✖ tidyr::unpack() masks unpivotr::unpack()
x <- purpose$`up-left left-up`
x # A pivot table in a conventional data frame.  Four levels of headers, in two
#>                            X2      X3     X4     X5    X6     X7
#> 1                        <NA>    <NA> Female   <NA>  Male   <NA>
#> 2                        <NA>    <NA>  0 - 6 7 - 10 0 - 6 7 - 10
#> 3           Bachelor's degree 15 - 24   7000  27000  <NA>  13000
#> 4                        <NA> 25 - 44  12000 137000  9000  81000
#> 5                        <NA> 45 - 64  10000  64000  7000  66000
#> 6                        <NA>     65+   <NA>  18000  7000  17000
#> 7                 Certificate 15 - 24  29000 161000 30000 190000
#> 8                        <NA> 25 - 44  34000 179000 31000 219000
#> 9                        <NA> 45 - 64  30000 210000 23000 199000
#> 10                       <NA>     65+  12000  77000  8000 107000
#> 11                    Diploma 15 - 24   <NA>  14000  9000  11000
#> 12                       <NA> 25 - 44  10000  66000  8000  47000
#> 13                       <NA> 45 - 64   6000  68000  5000  58000
#> 14                       <NA>     65+   5000  41000  1000  34000
#> 15           No Qualification 15 - 24  10000  43000 12000  37000
#> 16                       <NA> 25 - 44  11000  36000 21000  50000
#> 17                       <NA> 45 - 64  19000  91000 17000  75000
#> 18                       <NA>     65+  16000 118000  9000  66000
#> 19 Postgraduate qualification 15 - 24   <NA>   6000  <NA>   <NA>
#> 20                       <NA> 25 - 44   5000  86000  7000  60000
#> 21                       <NA> 45 - 64   6000  55000  6000  68000
#> 22                       <NA>     65+   <NA>  13000  <NA>  18000
  # rows and two columns.

y <- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y
#> # A tibble: 132 × 4
#>      row   col data_type chr              
#>    <int> <int> <chr>     <chr>            
#>  1     1     1 chr       <NA>             
#>  2     2     1 chr       <NA>             
#>  3     3     1 chr       Bachelor's degree
#>  4     4     1 chr       <NA>             
#>  5     5     1 chr       <NA>             
#>  6     6     1 chr       <NA>             
#>  7     7     1 chr       Certificate      
#>  8     8     1 chr       <NA>             
#>  9     9     1 chr       <NA>             
#> 10    10     1 chr       <NA>             
#> # … with 122 more rows

rectify(y) # useful for reviewing the melted form as though in a spreadsheet
#> # A tibble: 22 × 7
#>    `row/col` `1(A)`            `2(B)`  `3(C)` `4(D)` `5(E)` `6(F)`
#>        <int> <chr>             <chr>   <chr>  <chr>  <chr>  <chr> 
#>  1         1 <NA>              <NA>    Female <NA>   Male   <NA>  
#>  2         2 <NA>              <NA>    0 - 6  7 - 10 0 - 6  7 - 10
#>  3         3 Bachelor's degree 15 - 24 7000   27000  <NA>   13000 
#>  4         4 <NA>              25 - 44 12000  137000 9000   81000 
#>  5         5 <NA>              45 - 64 10000  64000  7000   66000 
#>  6         6 <NA>              65+     <NA>   18000  7000   17000 
#>  7         7 Certificate       15 - 24 29000  161000 30000  190000
#>  8         8 <NA>              25 - 44 34000  179000 31000  219000
#>  9         9 <NA>              45 - 64 30000  210000 23000  199000
#> 10        10 <NA>              65+     12000  77000  8000   107000
#> # … with 12 more rows

y %>%
  behead("up-left", "sex") %>%               # Strip headers
  behead("up", "life-satisfication") %>%  # one
  behead("left-up", "qualification") %>%     # by
  behead("left", "age-band") %>%            # one.
  select(-row, -col, -data_type, count = chr) %>% # cleanup
  mutate(count = as.integer(count))
#> # A tibble: 80 × 5
#>     count sex    `life-satisfication` qualification     `age-band`
#>     <int> <chr>  <chr>                <chr>             <chr>     
#>  1   7000 Female 0 - 6                Bachelor's degree 15 - 24   
#>  2  12000 Female 0 - 6                Bachelor's degree 25 - 44   
#>  3  10000 Female 0 - 6                Bachelor's degree 45 - 64   
#>  4     NA Female 0 - 6                Bachelor's degree 65+       
#>  5  27000 Female 7 - 10               Bachelor's degree 15 - 24   
#>  6 137000 Female 7 - 10               Bachelor's degree 25 - 44   
#>  7  64000 Female 7 - 10               Bachelor's degree 45 - 64   
#>  8  18000 Female 7 - 10               Bachelor's degree 65+       
#>  9     NA Male   0 - 6                Bachelor's degree 15 - 24   
#> 10   9000 Male   0 - 6                Bachelor's degree 25 - 44   
#> # … with 70 more rows

Note the compass directions in the code above, which hint to behead() where to find the header cell for each data cell.

Installation

# install.packages("devtools") # If you don't already have devtools
devtools::install_github("nacnudus/unpivotr", build_vignettes = TRUE)

The version 0.4.0 release had somee breaking changes. See NEWS.md for details. The previous version can be installed as follow:

devtools::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org")

Similar projects

unpivotr is inspired by Databaker, a collaboration between the United Kingdom Office of National Statistics and The Sensible Code Company. unpivotr.

jailbreaker attempts to extract non-tabular data from spreadsheets into tabular structures automatically via some clever algorithms. unpivotr differs by being less magic, and equipping you to express what you want to do.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.