retype: quick start your analysis
Getting data into R can be a hassle. And once you do, the data often have incorrect data types/classes. For instance, it is not uncommon for numeric variables or dates to be stored as characters.
Data conversion is cumbersome, and small coding mistakes can cause large problems. The hablar package facilitates correction of all data types directly after you import the data into R, so that you can avoid dangerous operations at later stages!
What does retype do?
retype provides an easy approach for quick and dirty data type conversion. It follows a strict simplification hierarchy for each column of your data frame. It only converts a column if it can assume that no important information is lost in the process. For example, the character vector c("1", "2") should rather be an integer vector. Similarly, the character "2010-06-04" should be a date. Factors have advantages, but they are never the simplest solution and hence are always converted to character, at least.
retype(x, ...)
where x is a data frame and ... are the names of the columns you want to apply retype to. x can also be a single vector.
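As a small illustration of the column arguments, the sketch below applies retype to a single column of a made-up data frame. The data frame and its column names a and b are assumptions for illustration only. Going by the description above, only the named column should be retyped, while a call without column names should consider every column.

```r
library(hablar)

# Hypothetical data frame: both columns hold numbers stored as characters.
df <- data.frame(a = c("1", "2"), b = c("3", "4"), stringsAsFactors = FALSE)

# Retype only column a; column b is not named, so it should be left alone.
out <- retype(df, a)
sapply(out, class)

# Retype every column of the data frame.
sapply(retype(df), class)
```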
library(hablar)
x <- as.numeric(3)
retype(x)
#> [1] 3
class(retype(x))
#> [1] "integer"
x <- as.character("2017-03-02")
retype(x)
#> [1] "2017-03-02"
class(retype(x))
#> [1] "Date"
x <- as.character(c("3,56", "0,78"))
retype(x)
#> [1] "3,56" "0,78"
class(retype(x))
#> [1] "character"
x <- as.factor(c(3, 4))
retype(x)
#> [1] 3 4
class(retype(x))
#> [1] "integer"
retype uses a procedure to determine which data type is the simplest without losing any vital information in your data.
The first thing to know about retype is that it always converts factors to character.
The second thing to know is that all logical columns are converted to integers.
Thirdly, complex and list columns are left unchanged.
From there it will test if the data could be coded as numeric. If true it converts the column to numeric.
If it is numeric it tests if it could be an integer instead. If true, it converts the column to integer.
If it is a character it tests if it could be a date column. If true, it converts it to a date column.
If it is a date time column it tests if it could be a date. If true, it converts it to a date column.
The above procedure could more intuitively be described in a diagram. The arrows imply a test of whether a column could be converted to another type without losing information in your data. The procedure continues until the column cannot be simplified further.
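The order of the tests described above can also be sketched in base R. This is only an illustration of the hierarchy, not hablar's actual implementation: simplify_sketch is a hypothetical helper, and it assumes that dates are written as yyyy-mm-dd and that "no information is lost" means every non-missing value survives the conversion.

```r
# Hypothetical helper, NOT hablar's implementation: a base-R sketch of the
# simplification order described in the text.
simplify_sketch <- function(x) {
  # Complex and list columns are left unchanged.
  if (is.list(x) || is.complex(x)) return(x)
  # Factors are always converted to character first.
  if (is.factor(x)) x <- as.character(x)
  # Logical values become integers.
  if (is.logical(x)) return(as.integer(x))
  # Date-times without a time-of-day component become dates.
  if (inherits(x, "POSIXct")) {
    has_time <- format(x, "%H:%M:%S") != "00:00:00"
    if (!any(has_time, na.rm = TRUE)) {
      return(as.Date(format(x, "%Y-%m-%d"), format = "%Y-%m-%d"))
    }
    return(x)
  }
  if (is.character(x)) {
    num <- suppressWarnings(as.numeric(x))
    if (!any(is.na(num) & !is.na(x))) {
      # Every non-missing value parsed as a number: continue as numeric.
      x <- num
    } else {
      # Otherwise test for dates (assumed format: yyyy-mm-dd).
      d <- as.Date(x, format = "%Y-%m-%d")
      if (!any(is.na(d) & !is.na(x))) return(d)
      return(x)
    }
  }
  # Numeric values that are all whole numbers become integers.
  if (is.numeric(x) && !is.integer(x)) {
    ok <- x[!is.na(x)]
    if (all(ok == round(ok)) && all(abs(ok) <= .Machine$integer.max)) {
      return(as.integer(x))
    }
  }
  x
}

class(simplify_sketch(c("1", "2")))        # "integer"
class(simplify_sketch("2010-06-04"))       # "Date"
class(simplify_sketch(c("3,56", "0,78")))  # "character"
class(simplify_sketch(factor(c(3, 4))))    # "integer"
```

Each branch mirrors one arrow of the hierarchy: a conversion is kept only when the test for the simpler type succeeds, otherwise the value is returned as it is.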
Examine the following dataset, starwars, from the package dplyr. First, we use convert on some columns to change them to new data types.
library(dplyr)
df <- starwars %>%
  select(1:4) %>%
  convert(fct(name),
          chr(height:mass),
          fct(hair_color)) %>%
  print()
#> # A tibble: 87 × 4
#>   name           height mass  hair_color
#>   <fct>          <chr>  <chr> <fct>
#> 1 Luke Skywalker 172    77    blond
#> 2 C-3PO          167    75    <NA>
#> 3 R2-D2          96     32    <NA>
#> 4 Darth Vader    202    136   none
#> # … with 83 more rows
#> # ℹ Use `print(n = ...)` to see more rows
We then apply retype on df:
df %>%
  retype()
#> # A tibble: 87 × 4
#>   name           height  mass hair_color
#>   <chr>           <int> <dbl> <chr>
#> 1 Luke Skywalker    172    77 blond
#> 2 C-3PO             167    75 <NA>
#> 3 R2-D2              96    32 <NA>
#> 4 Darth Vader       202   136 none
#> # … with 83 more rows
#> # ℹ Use `print(n = ...)` to see more rows
retype correctly guessed that height should preferably be an integer vector and that mass works better as a numeric column. The factors were converted to character columns.
retype in production code
Never use retype when you need your scripts to work in exactly the same way the next time they run. retype may change over time, it could guess wrong, and your data may change. Use hablar::convert instead, where you explicitly state which data type each column should have.
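A minimal sketch of the explicit alternative, reusing the starwars columns from the example above. The choice of target types here is an assumption made for illustration; chr, int and num are hablar's conversion helpers for character, integer and numeric columns.

```r
library(dplyr)
library(hablar)

# Start from columns with deliberately awkward types, as in the example above.
df <- starwars %>%
  select(1:4) %>%
  convert(fct(name, hair_color),
          chr(height:mass))

# State the target type of every column explicitly instead of guessing.
fixed <- df %>%
  convert(chr(name, hair_color),
          int(height),
          num(mass))

sapply(fixed, class)
```

Because each column's type is written out, the types requested do not depend on what the data happen to look like on a given day.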