HFR is a regularized regression estimator that decomposes a least squares regression along a supervised hierarchical graph, and shrinks the edges of the estimated graph to regularize parameters. The algorithm leads to group shrinkage in the regression parameters and a reduction in the effective model degrees of freedom.
Arguments
- x
Input matrix or data.frame, of dimension \((N\times p)\); each row is an observation vector.
- y
Response variable.
- weights
An optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used for the level-specific regressions.
- kappa
A vector of target effective degrees of freedom of the regression.
- q
Thinning parameter representing the quantile cut-off (in terms of contributed variance) above which to consider levels in the hierarchy. This can be used to reduce the number of levels in high-dimensional problems. Default is no thinning.
- intercept
Should intercept be fitted. Default is intercept=TRUE.
- standardize
Logical flag for x variable standardization prior to fitting the model. The coefficients are always returned on the original scale. Default is standardize=TRUE.
- nfolds
The number of folds for k-fold cross validation. Default is nfolds=10.
- foldid
An optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.
- partial_method
Indicate whether to use pairwise partial correlations, or shrinkage partial correlations.
- l2_penalty
Optional penalty for level-specific regressions (useful in the high-dimensional case).
- ...
Additional arguments passed to hclust.
Details
This function fits an HFR to a grid of kappa hyperparameter values. The result is a
matrix of coefficients with one column for each hyperparameter. By evaluating all hyperparameters
in a single function, the speed of the cross-validation procedure is improved substantially (since
level-specific regressions are estimated only once).
When nfolds > 1, a cross validation is performed with shuffled data. Alternatively,
test slices can be passed to the function using the foldid argument. The result
of the cross validation is given by best_kappa in the output object.
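As an illustration of the foldid mechanism described above, the following sketch assigns observations to contiguous test slices instead of relying on shuffled folds, which may be preferable when the rows are ordered (for example in time). The simulated data, the choice of five folds, and the access pattern fit$best_kappa (assuming the output object is a list) are assumptions made for this example; the model call is skipped if the hfr package is not installed.

```r
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Assumption for this sketch: 5 contiguous folds of equal size,
# i.e. rows 1-20 form fold 1, rows 21-40 form fold 2, and so on.
nfolds <- 5
foldid <- rep(seq_len(nfolds), each = n / nfolds)

# Fit only if the hfr package is available.
if (requireNamespace("hfr", quietly = TRUE)) {
  fit <- hfr::cv.hfr(x, y, kappa = seq(0, 1, by = 0.1), foldid = foldid)
  # Assumed accessor: best_kappa stored as a list element of the output.
  print(fit$best_kappa)
}
```

Because foldid fully determines the slices, nfolds does not need to be passed in the call.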
References
Pfitzinger, Johann (2024). Cluster Regularization via a Hierarchical Feature Regression. _Econometrics and Statistics_ (in press). URL https://doi.org/10.1016/j.ecosta.2024.01.003.
Examples
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
coef(fit)
#> s1 s2 s3 s4 s5
#> (Intercept) -6.513300e-03 0.004424144 0.001414333 -0.0009662082 -0.001769557
#> V1 4.105750e-09 0.025055671 -0.002506424 -0.0155201046 -0.018773187
#> V2 -3.599427e-09 -0.035760516 -0.044661249 -0.0517428983 -0.057890643
#> V3 -4.194673e-09 -0.042116781 -0.052599577 -0.0523164708 -0.046914950
#> V4 -4.290485e-09 -0.057353963 -0.091465608 -0.0978371857 -0.097473308
#> V5 4.290625e-09 0.025636044 0.024558659 0.0327080488 0.033026981
#> V6 -3.776804e-09 -0.037621347 -0.046985238 -0.0510931197 -0.054571958
#> V7 -4.072737e-09 -0.040609966 -0.050717719 -0.0545243406 -0.058876777
#> V8 -4.390128e-09 -0.026613412 0.002662252 0.0100788419 0.006776848
#> V9 4.024101e-09 0.071225939 0.103287508 0.1109869630 0.111927547
#> V10 -3.850645e-09 -0.038356886 -0.047903851 -0.0520920460 -0.053405652
#> V11 -3.708088e-09 -0.037231204 -0.046497988 -0.0462477223 -0.044455759
#> V12 -3.651565e-09 -0.036410390 -0.045472875 -0.0488858445 -0.049654564
#> V13 -4.462772e-09 -0.044499075 -0.055574821 -0.0597459923 -0.058590136
#> V14 4.175371e-09 0.073376452 0.106406051 0.1169081467 0.121215023
#> V15 -3.970051e-09 -0.052800086 -0.084203281 -0.0933701270 -0.097025477
#> V16 -4.239701e-09 -0.042120703 -0.052604475 -0.0530436555 -0.050943424
#> V17 -4.055030e-09 -0.024581236 0.002458965 0.0192377218 0.027660931
#> V18 -4.265462e-09 -0.025857673 0.002586652 0.0097926336 0.006584406
#> V19 4.212937e-09 0.025171725 0.024113853 0.0321156411 0.031820120
#> V20 4.254742e-09 0.075308029 0.109207106 0.1165667870 0.115978572
#> s6 s7 s8 s9 s10
#> (Intercept) -0.002149329 -0.002658990 -0.003168651 -0.003678312 -0.004187974
#> V1 -0.020054996 -0.020951491 -0.021847986 -0.022744480 -0.023640975
#> V2 -0.063050170 -0.067759024 -0.072467878 -0.077176732 -0.081885586
#> V3 -0.040303293 -0.034079987 -0.027856680 -0.021633373 -0.015410067
#> V4 -0.097094363 -0.096859832 -0.096625301 -0.096390770 -0.096156239
#> V5 0.034348144 0.036090127 0.037832109 0.039574092 0.041316074
#> V6 -0.058970601 -0.062894621 -0.066818641 -0.070742662 -0.074666682
#> V7 -0.065831174 -0.072374832 -0.078918489 -0.085462147 -0.092005804
#> V8 0.004506273 0.005269777 0.006033281 0.006796785 0.007560289
#> V9 0.112929444 0.114084110 0.115238776 0.116393442 0.117548108
#> V10 -0.053327597 -0.053043440 -0.052759282 -0.052475124 -0.052190966
#> V11 -0.044654621 -0.044444357 -0.044234094 -0.044023830 -0.043813566
#> V12 -0.049517851 -0.049629665 -0.049741479 -0.049853293 -0.049965107
#> V13 -0.054036262 -0.049253604 -0.044470947 -0.039688289 -0.034905631
#> V14 0.125150603 0.129024915 0.132899227 0.136773540 0.140647852
#> V15 -0.099760197 -0.102091384 -0.104422571 -0.106753759 -0.109084946
#> V16 -0.050006469 -0.049327889 -0.048649310 -0.047970731 -0.047292151
#> V17 0.033505917 0.038710534 0.043915152 0.049119770 0.054324387
#> V18 0.003607763 -0.002195856 -0.007999474 -0.013803093 -0.019606712
#> V19 0.031679941 0.030575928 0.029471915 0.028367902 0.027263889
#> V20 0.115008480 0.114442837 0.113877195 0.113311552 0.112745909
#> s11
#> (Intercept) -0.004697653
#> V1 -0.024537419
#> V2 -0.086594381
#> V3 -0.009186821
#> V4 -0.095921738
#> V5 0.043058117
#> V6 -0.078590641
#> V7 -0.098549408
#> V8 0.008324208
#> V9 0.118702805
#> V10 -0.051906786
#> V11 -0.043603251
#> V12 -0.050076960
#> V13 -0.030122951
#> V14 0.144522166
#> V15 -0.111416087
#> V16 -0.046613613
#> V17 0.059528917
#> V18 -0.025410714
#> V19 0.026159747
#> V20 0.112180333
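The coefficient matrix above has one column per value in the kappa grid (s1 to s11 for the eleven values of seq(0, 1, by = 0.1)). A minimal sketch for relabelling the columns with their kappa values is given below. It assumes that the columns follow the order of the kappa grid and that coef() returns an object coercible with as.matrix(); neither is confirmed by the text above, so the labels are only applied when the column count matches.

```r
kappa_grid <- seq(0, 1, by = 0.1)
# Human-readable column labels, e.g. "kappa=0.0", "kappa=0.1", ...
col_labels <- paste0("kappa=", format(kappa_grid, nsmall = 1))

# Fit only if the hfr package is available.
if (requireNamespace("hfr", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- rnorm(100)
  fit <- hfr::cv.hfr(x, y, kappa = kappa_grid)

  b <- as.matrix(coef(fit))
  # Assumption: column order matches the order of kappa_grid.
  if (ncol(b) == length(col_labels)) {
    colnames(b) <- col_labels
    # Compare the first rows at the smallest, middle, and largest kappa.
    print(round(b[1:4, c(1, 6, 11)], 3))
  }
}
```

In the printed output above, the slope coefficients in the first column (kappa = 0) are numerically zero, so the relabelled matrix makes it easier to see how estimates change as kappa increases.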