The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Getting Started with unexcel

Hercules Freitas

2025-10-04

Introduction

Spreadsheets such as Microsoft Excel and LibreOffice Calc often auto-convert entries like 30.3 into dates. When imported into R, these values appear as Excel date serials (integers) instead of the intended numeric values.

For example: - In Excel, 30.3 might be stored as 45812, which represents 2025-03-30 in the 1900 date system. - What the user originally wanted was simply 30.3.

The unexcel package provides tools to safely detect these cases and reconstruct the originally intended day.month numerics. Non-serial values are preserved, and both the 1900 and 1904 date systems are supported.

Excel date systems

Excel represents dates as the number of days since an origin date. - 1900 system: Day 1 is 1900-01-01. Excel mistakenly treats 1900 as a leap year. To compensate, R and most packages use “1899-12-30” as the origin, so calculations align with Excel’s behavior. - 1904 system: Day 0 is 1904-01-01, historically used by older Mac versions of Excel.

Example:

as.Date(45812, origin = "1899-12-30")
## [1] "2025-06-04"
#> "2025-03-30"

Here, 45812 is the serial for March 30, 2025.

The 1900 leap-year bug

Excel was originally designed to be compatible with Lotus 1-2-3, which treated 1900 as a leap year (incorrectly). This means Excel includes the non-existent date 1900-02-29 in its calendar.

To correct for this, R uses an origin of “1899-12-30” rather than “1900-01-01”. This offset ensures that serial numbers map correctly to the dates displayed inside Excel.

Reconstructing day.month numerics

The function restore_day_month() converts Excel serials back into numerics of the form day.month.

library(unexcel)

# Mixed vector: two serials + one real number
x <- c(45812, 12.5, 44730)

restore_day_month(x, origin_mode = "1900")
## [1]  4.6 12.5 18.6
#> [1] 30.3 12.5 15.6

Explanation: - 45812 (2025-03-30) → 30.3 - 44730 (2022-06-15) → 15.6 - 12.5 (not a serial) → preserved as 12.5

Guardrails against false conversions

To avoid changing ordinary numbers, unexcel applies guardrails: - Only integer-like values are considered potential serials. - Values must fall in a plausible range (20000–65000, roughly 1954–2078). - The converted year must lie inside a configurable year_window (default: 1990–2035).

Anything outside these conditions is left unchanged.

Difference from formatted strings

By design, unexcel reconstructs numeric day.month values, not character strings: - Numeric output: 30.3 (easily compared or plotted). - Character output: “30/03” (typical formatted date).

If you prefer a string, you can use R’s base functions:

format(as.Date(45812, origin = "1899-12-30"), "%d/%m")
## [1] "04/06"
#> "30/03"

But the goal of unexcel is to give you back the numeric decimal you originally typed in Excel.

Working with data frames

The helper fix_serial_columns() scans an entire data frame and corrects only the columns that appear dominated by Excel serials.

df <- data.frame(
  a = c(45812, 44730, 45900),
  b = c(1.2, 3.4, 5.6)
)

fix_serial_columns(df, origin_mode = "1900")
##      a   b
## 1  4.6 1.2
## 2 18.6 3.4
## 3 31.8 5.6
#>      a   b
#> 1 30.3 1.2
#> 2 15.6 3.4
#> 3 19.4 5.6

Column a was fixed; column b was left untouched.

Summary

Session information

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.0.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] C/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
## 
## time zone: America/Sao_Paulo
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] unexcel_0.1.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        
##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
## [13] compiler_4.5.1    tools_4.5.1       evaluate_1.0.4    bslib_0.9.0      
## [17] yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.