fauxnaif

License: MIT

faux-naïf (/ˌfoʊ.naɪˈif/): a person who pretends to be simple or innocent

fauxnaif: an R package for simplifying data by pretending values are NA

Overview

fauxnaif extends dplyr::na_if(). Unlike dplyr’s na_if(), fauxnaif’s na_if_in() lets you specify multiple values to replace with NA in a single call. fauxnaif also includes a complementary function, na_if_not(), which specifies the values to keep and replaces everything else with NA.

Installation

You can install fauxnaif from CRAN:

install.packages("fauxnaif")

Or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("rossellhayes/fauxnaif")

Usage

library(dplyr)
library(fauxnaif)

The basics

Let’s say we want to remove an unwanted negative value from a vector of numbers:

-1:10
#>  [1] -1  0  1  2  3  4  5  6  7  8  9 10

We can replace -1…

… explicitly:

na_if_in(-1:10, -1)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10

… by specifying values to keep:

na_if_not(-1:10, 0:10)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10

… using a formula:

na_if_in(-1:10, ~ . < 0)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10
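For intuition, the three calls above can be mirrored in base R. This is only a sketch of the resulting behavior, not fauxnaif's actual implementation; the variable names `explicit`, `keep`, and `predicate` are made up for illustration.

```r
x <- -1:10

# Explicit value: blank out elements equal to -1
explicit <- replace(x, x %in% -1, NA)

# Values to keep: blank out everything that is not in 0:10
keep <- replace(x, !(x %in% 0:10), NA)

# Predicate: blank out elements for which the condition is TRUE
predicate <- replace(x, x < 0, NA)

# All three approaches give the same vector
identical(explicit, keep) && identical(keep, predicate)
#> [1] TRUE
```

The base R versions repeat `x` in every call, which is the repetition the formula interface avoids.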

A little more complex

messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

We can replace unwanted values…

… one at a time:

na_if_in(messy_string, "")
#> [1] "abc"  NA     "def"  "NA"   "ghi"  "42"   "jkl"  "NULL" "mno"

… or all at once:

na_if_in(messy_string, "", "NA", "NULL", 1:100)
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
na_if_in(messy_string, c("", "NA", "NULL", 1:100))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
na_if_in(messy_string, list("", "NA", "NULL", 1:100))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"

… or using a clever formula:

grepl("[a-z]{3,}", messy_string)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
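Two details in these examples are easy to miss: the unquoted 42 becomes the string "42" when the vector is built, and mixing strings with 1:100 in c() turns the integers into strings too. The base R sketch below reproduces the same results under those coercion rules; it is an illustration of the idea, not how fauxnaif works internally, and `unwanted`, `by_value`, and `by_pattern` are hypothetical names.

```r
messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

# c() coerces the integers 1:100 to the strings "1", ..., "100",
# so the number-like entry "42" is found by %in% below.
unwanted <- c("", "NA", "NULL", 1:100)
by_value <- replace(messy_string, messy_string %in% unwanted, NA)

# Keep only elements that contain three or more lowercase letters in a row
by_pattern <- replace(messy_string, !grepl("[a-z]{3,}", messy_string), NA)

identical(by_value, by_pattern)
#> [1] TRUE
```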

With data frames

faux_census
#> # A tibble: 5 × 4
#>   state    age  income gender                      
#>   <chr>  <dbl>   <dbl> <chr>                       
#> 1 TX        57 9999999 Gender is a social construct
#> 2 Canada    49  149000 Male                        
#> 3 NY       557   90750 f                           
#> 4 LA         2   61000 Male                        
#> 5 TN        64 9999999 M

na_if_in() is particularly useful inside dplyr::mutate():

faux_census %>%
  mutate(
    income = na_if_in(income, 9999999),
    age    = na_if_in(age, ~ . < 18, ~ . > 120),
    state  = na_if_not(state, ~ grepl("^[A-Z]{2,}$", .)),
    gender = na_if_in(gender, ~ nchar(.) > 20)
  )
#> # A tibble: 5 × 4
#>   state   age income gender
#>   <chr> <dbl>  <dbl> <chr> 
#> 1 TX       57     NA <NA>  
#> 2 <NA>     49 149000 Male  
#> 3 NY       NA  90750 f     
#> 4 LA       NA  61000 Male  
#> 5 TN       64     NA M

Or you can use dplyr::across() on data frames:

faux_census %>%
  mutate(
    across(age, na_if_in, ~ . < 18, ~ . > 120),
    across(state, na_if_not, ~ grepl("^[A-Z]{2,}$", .)),
    across(where(is.character), na_if_in, ~ nchar(.) > 20),
    across(everything(), na_if_in, 9999999)
  )
#> # A tibble: 5 × 4
#>   state   age income gender
#>   <chr> <dbl>  <dbl> <chr> 
#> 1 TX       57     NA <NA>  
#> 2 <NA>     49 149000 Male  
#> 3 NY       NA  90750 f     
#> 4 LA       NA  61000 Male  
#> 5 TN       64     NA M
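For comparison, the same cleaning rules can be written in base R. The sketch below rebuilds the table by hand from the values printed above, using a plain data.frame rather than the tibble shown, so it runs without dplyr or fauxnaif. The names `census` and `cleaned` are made up for illustration.

```r
# Hand-built copy of the values printed above (assumed, not the package's data object)
census <- data.frame(
  state  = c("TX", "Canada", "NY", "LA", "TN"),
  age    = c(57, 49, 557, 2, 64),
  income = c(9999999, 149000, 90750, 61000, 9999999),
  gender = c("Gender is a social construct", "Male", "f", "Male", "M"),
  stringsAsFactors = FALSE
)

# Each rule names its column twice: once as the input and once in the condition
cleaned <- transform(
  census,
  income = replace(income, income == 9999999, NA),
  age    = replace(age, age < 18 | age > 120, NA),
  state  = replace(state, !grepl("^[A-Z]{2,}$", state), NA),
  gender = replace(gender, nchar(gender) > 20, NA)
)

cleaned$age
#> [1] 57 49 NA NA 64
```

The rules are identical to the mutate() example; the formula interface mainly saves repeating each column name.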

Hex sticker fonts are Bodoni* by indestructible type* and Source Code Pro by Adobe.

Image adapted from icon made by Freepik from flaticon.com.

Please note that fauxnaif is released with a Contributor Code of Conduct.
