The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
The l1rotation
package offers functionality to simplify
the loading matrix in factor models. It can find the most sparse
rotation of the loading matrix using the l1-rotation criterion of Freyaldenhoven
(2025).
Under the assumption of sparsity in the true loading matrix, it also solves the problem of rotational indeterminacy inherent to factor models. That is, suppose the data follows a factor model:
\[ X = F \Lambda^{*'} + e \] where
\(X\) is a \(T \times n\) data matrix, where there are \(T\) rows and \(n\) variables, or columns
\(F\) is a \(T \times r\) matrix of \(r\) factors that the data is decomposed into
\(\Lambda^{*T}\) is an \(r \times n\) matrix of true loadings
\(e\) is a \(T \times n\) error matrix.
Then, the assumption of sparsity in the loading matrix solves the problem of rotational indeterminacy inherent to factor models: \(\Lambda^{*}\) will be the most sparse rotation and can be identified using the l1-rotation criterion Freyaldenhoven (2025).
We will use the example_data
data that ships with the
package to show its basic functionality. This data is a matrix
containing numeric information with \(n =
224\), \(T = 207\). In general,
data.frames, tibbles, or other data types can also be used with
l1rotation
functions, as long as all columns are numeric.
Note that the package cannot handle missing values in the data
matrix.
To start, let’s look at the first seven columns of the example data:
head(example_data[,1:7])
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 2.973310 3.563021 -0.04496568 -0.1700720 0.3350881 0.05244918 -1.1922173
#> [2,] 4.648603 4.817172 0.11200137 -0.5456076 0.1992616 0.33365310 -1.1699467
#> [3,] 4.162014 3.169177 -1.37955943 -2.5864043 -2.9352645 -0.79856910 -1.7227167
#> [4,] 2.097454 0.833101 -2.72023404 -2.9061952 -2.7111563 -0.18925022 -1.5973617
#> [5,] 1.783429 0.364707 -2.24466997 -1.9397028 -0.2521484 1.14563662 -0.6416022
#> [6,] 5.699166 3.932394 0.46050753 -0.5615002 2.3014500 2.22261949 0.5748873
We assume that the number of underlying factors can be learned from the data (e.g., following the procedure in Bai and Ng (2002) or Ahn and Horenstein (2013)). Note that this package does not include functionality to calculate the number of factors - we simply take the number of factors as a user input.
For the example_data
we will use two factors. With just
the data, \(X\), and the number of
factors, \(r\), we can start estimating
the loadings with local_factors()
. This function estimates
\(\Lambda^{*}\) and provides helpful
diagnostics and figures.
Below is an example using example_data
(note that
estimation can also be run in parallel with a selected number of cores,
n_cores
):
set.seed(916)
lf <- local_factors(
X = example_data, r = 2,
parallel = FALSE, n_cores = NULL # Runs non-parallel by default
)
In the estimation, the only required arguments for
local_factors()
are the data to be decomposed and the
number of factors. We use the principal components estimator as the
initial estimate of the loadings, \(\Lambda^0\), which can be accessed via the
initial_loadings
item of the output. The function also
computes a quick diagnostic to check whether local factors are present
in the data which is given in the output item
has_local_factors
.
Additionally, there are several rotation diagnostics accessible via
rotation_diagnostics
: (1) The rotation matrix, \(R\), that when multiplied by \(\Lambda^0\) produces \(\hat{\Lambda}^*\), (2) the value of the l1
norm for each vector, and (3) the frequency with which the minimization
problem converges at each of the estimated loading vectors.
lf$rotation_diagnostics
#> $R
#> R
#> V1 0.7011144 0.8388980
#> V2 -0.7130488 0.5442886
#>
#> $fval
#> [1] 145.705 149.850
#>
#> $sol_frequency
#> [1] 259 241
For a visual interpretation of this rotation, we provide tile plots
contrasting the initial estimate \(\Lambda^0\), pc_plot
and the
rotated estimate, \(\hat{\Lambda}^*\),
rotated_plot
.
In the initial principal component estimate, recall that each factor is simply a principal component. Of the 207 variables in \(X\), the first factor loads most negatively along the variables between 90 and 120, and slightly negatively almost everywhere else. Along the second factor, there are positive loadings between variables 122 and 207 and slightly negative loadings elsewhere. However, it may be difficult to interpret the relationship between variables and factors when all loadings are nonzero.
The second estimate is a rotated version of the loading matrix optimized for sparsity using the l1-rotation criterion. This estimate is easier to interpret as most variables between 90-207 load negatively on the first factor, variables 0-120 load negatively on the second factor, and all other loadings are close to zero.
l1rotation
supplies two additional functions,
find_local_factors()
and test_local_factors()
which provide additional functionality to support the main
local_factors()
function.
find_local_factors()
This function takes the same inputs as local_factors()
,
X
and r
, and has an additional argument,
initial_loadings
, that allows the user to specify any
orthonormal basis of the loadings rather than defaulting to the
principal component estimator. Alternative initial estimates may include
Maximum Likelihood based estimation or Sparse Orthogonal Factor
Regression (Uematsu
et al. (2019)), for example.
test_local_factors()
This function tests for the presence of local factors given a sparse
basis of the loading space. It takes as input X
and
r
, and an additional optional argument,
loadings
, that allows the user to specify the loading
matrix that is to be tested. This argument is set to NULL
by default, which estimates \(\hat{\Lambda}^*\) (by maximizing the
l1-rotation criterion) and tests it for local factors.
To construct this diagnostic for a given loading matrix estimate, \(\hat{\Lambda}\), we find the column with the largest number of entries smaller than some threshold \(h_n\):
\[ \mathcal{L(\hat{\Lambda})} = \max_k\left(\sum_{i=1}^n 1\{\hat{|\lambda}_{ik}| < h_n \}\right) \] We can then check whether the number of these “small” loadings is larger than \(\gamma n\)
\[ \texttt{has_local_factors} = 1\{\mathcal{L}(\hat{\Lambda}) \geq \gamma n \}. \]
Returning to our lf
results, we can take a look at the
value of has_local_factors
.
This value is the result of test_local_factors()
. To
verify, we can call test_local_factors()
on two different
estimates: the principal components estimate,
initial_loadings
, and the l1rotation estimate,
rotated_loadings
.
# Check for local factors in PC estimate...
test_pc_estimate <- test_local_factors(X = example_data, r = 2, loadings = lf$initial_loadings)
# And rotated estimate
test_rot_estimate <- test_local_factors(X = example_data, r = 2, loadings = lf$rotated_loadings)
test_pc_estimate$has_local_factors
#> [1] FALSE
test_rot_estimate$has_local_factors
#> [1] TRUE
Thus, no local factors are detected using the principal components
estimate, while the rotated estimate reveals two local factors. We can
also visualize the number of small loadings using the
small_loadings_plot
element in our `lf’ results:
round_hn <- round(test_rot_estimate$h_n, digits = 3)
lf$small_loadings_plot +
ggplot2::labs(
title = 'Number of "small" loadings per factor',
caption = paste('"Small" is defined as loadings less than', round_hn)
)
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.