Help for reghdfe_programming

Using reghdfe with other commands

This help file describes how to use reghdfe within other programs, either in Stata or Mata. It discusses three types of tools that might be useful for developers:

1. Ancillary commands from ftools that are used by reghdfe, such as ms_get_version.

2. Undocumented options of reghdfe.

3. The FixedEffects Mata class behind reghdfe (see viewsource reghdfe.mata), which can be used to build efficient Mata estimation programs.

These tools are listed in order of increasing integration with reghdfe. Someone writing a command independent of reghdfe might still benefit from #1. Someone writing a command that calls reghdfe a few times, such as ivreghdfe, sumhdfe, or did_imputation, might benefit from #2. And someone writing a command that calls reghdfe many times, such as ppmlhdfe, might also be interested in #3, due to the gains in efficiency and the tighter Mata integration.

1. Ancillary commands

It's possible your command will depend on other user-written commands, in the same way as reghdfe depends on ftools. To ensure compatibility and reproducibility, you can use the ms_get_version command to check that users are not running versions of these programs that are too old. For instance, reghdfe version 6.12.0 requires ftools of at least version 2.46.0.

The syntax of ms_get_version is:

ms_get_version command [, min_version(str) min_date(str)]

Options             Description
min_version(str)    minimum version. Only supports semantic versioning of the form x.y.z. Note that versions are defined in the first line of an ado-file, and can be verified by typing which <command>.
min_date(str)       (less used) minimum date of the program. Supports dates of the form 1jan2018 or 01Jan2018.

ms_get_version also stores the following local variables:

`version_number' version of the requested program
`version_date' date of the requested program
`package_version' concatenation of `version_number' and `version_string'

A sample usage of ms_get_version, currently used by reghdfe, is:

ms_get_version ftools, min_version("2.46.0")
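
As a minimal sketch of how the check might sit inside your own command, the example below places it at the top of a program and then displays the stored locals. The program name mycommand is hypothetical, and the sketch assumes ftools is installed.

```stata
program define mycommand
    // Check that the installed ftools is at least version 2.46.0
    ms_get_version ftools, min_version("2.46.0")

    // Locals stored by ms_get_version (see the list above)
    display as text "ftools version: `version_number'"
    display as text "ftools date:    `version_date'"
end
```

Running the check at the start of the program means a version problem surfaces before any real work is done.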

2. Undocumented reghdfe options

Sometimes you might not want to run the entire reghdfe command, but stop at some point and only compute certain objects. There are several options that allow this.

A) Compute HDFE Mata object but stop before partialling out variables

reghdfe ... , nopartialout [options]

This step will parse all inputs and initialize the HDFE object of the FixedEffects class. Note that although the regression variables (depvar and indepvar) are not processed, if they have missing values the sample will reflect that.

For instance, the sample ado-file below is enough to create a program that reports the number of singletons in a regression, without having to actually run the regression:

show_singletons.ado

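prog show_singletons
    // Parse the inputs and build the HDFE object, but skip partialling out
    qui reghdfe `0' nopartialout
    noi ereturn list
    // Copy the singleton count from the Mata object into a local macro
    mata: st_local("n", strofreal(HDFE.num_singletons))
    di as text "there are `n' singletons"
end

// Load the reghdfe Mata functions and classes
qui include "reghdfe.mata", adopath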
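
Assuming the file above is saved as show_singletons.ado somewhere on the adopath, a call might look like the sketch below. The dataset and variables are only illustrative. Because the program passes its arguments to reghdfe unchanged and appends one more option, the call must already contain a comma, which the absorb() option provides here.

```stata
sysuse auto, clear

// Everything after the command name is forwarded to reghdfe
show_singletons price weight length, absorb(turn)
```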

B) Compute HDFE Mata object, partial out the variables, but stop before regressing

reghdfe ... , noregress [options]

This step is as A), but will also partial out the variables with respect to the fixed effects and save the resulting information in the HDFE.solution object. For instance, HDFE.solution.data will contain the partialled-out data, and HDFE.solution.depvar will contain the name of the dependent variable.

This option can be used to (amongst other things) partial out all the variables only once, and then run regressions on the same sample and same regressors but with multiple left-hand-side variables (useful with very large datasets).
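
A minimal sketch of inspecting the stored results after noregress follows. It assumes the object is left in Mata under the name HDFE, as in the show_singletons example, and uses only the two HDFE.solution members named above. The dataset and variables are illustrative, and the column order of HDFE.solution.data is not documented here, so check it before selecting columns.

```stata
sysuse auto, clear

// Partial out all listed variables with respect to the turn fixed effects, then stop
reghdfe price mpg weight, absorb(turn) noregress

// Name of the dependent variable, as stored by reghdfe
mata: HDFE.solution.depvar

// Dimensions of the partialled-out data (rows, columns)
mata: rows(HDFE.solution.data), cols(HDFE.solution.data)
```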

C) Run regression but keep the HDFE Mata object

reghdfe ... , keepmata [options]

Saving the HDFE object allows further manipulation of the fixed-effects data, although the data corresponding to the partialled-out variables is not preserved.
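
As a sketch of how keepmata might be used, the example below runs a regression, reads one property from the retained object, and then drops it. It assumes the object is kept under the name HDFE and that num_singletons is available, as in the show_singletons example. The dataset and variables are illustrative.

```stata
sysuse auto, clear

// Run the full regression but keep the Mata object afterwards
reghdfe price weight length, absorb(turn) keepmata

// Read a property from the retained object
mata: HDFE.num_singletons

// Free the object once it is no longer needed
mata: mata drop HDFE
```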

3. FixedEffects Mata class

In order to use reghdfe's Mata functions within your own ado-file, you need to add the following at the end of your file:

include "reghdfe.mata", adopath

This dynamically loads all the reghdfe Mata functions and classes, so they are accessible to the ado-file. This alternative is preferred to sharing precompiled Mata objects, which would require compilation for multiple versions of Stata/Mata (or for the lowest possible version of Stata/Mata).

To construct the object, you can do:

class FixedEffects HDFE // Optional declaration

HDFE = FixedEffects() // Note that you can replace "HDFE" with whatever name you choose

HDFE.absvars = "firm_id year"

...

HDFE.init()

...

For more information, see the code of the Estimate function of reghdfe.ado

Properties and Methods

TODO: update this list

properties (factors) Description
Integer N number of obs
Integer M Sum of all possible FE coefs
Factors factors
Vector sample
Varlist absvars
Varlist ivars
Varlist cvars
Boolean has_intercept
RowVector intercepts
RowVector num_slopes
Integer num_singletons
Boolean save_any_fe
Boolean save_all_fe
Varlist targets
RowVector save_fe
properties (optimization options) Description
Real tolerance
Integer maxiter
String transform Kaczmarz Cimmino Symmetric_kaczmarz (k c s)
String acceleration Acceleration method. None/No/Empty is none
Integer accel_start Iteration at which acceleration starts (set it at 6? 2? 3?)
String slope_method
Boolean prune Whether to recursively prune degree-1 edges
Boolean abort Raise error if convergence failed?
Integer accel_freq Specific to Aitken's acceleration
Boolean storing_alphas 1 if we should compute the alphas/fes
Real conlim specific to LSMR
Real btol specific to LSMR
properties (optimization objects) Description
BipartiteGraph bg Used when pruning 1-core vertices
Vector pruned_weight temp. weight for the factors that were pruned
Integer prune_g1 Factor 1/2 in the bipartite subgraph that gets pruned
Integer prune_g2 Factor 2/2 in the bipartite subgraph that gets pruned
Integer num_pruned Number of vertices (levels) that were pruned
properties (misc) Description
Integer verbose
Boolean timeit
Boolean store_sample
Real finite_condition
Real compute_rre Relative residual error: || e_k - e || / || e ||
Real rre_depvar_norm
Vector rre_varname
Vector rre_true_residual
properties (weight-specific) Description
Boolean has_weights
Variable weight unsorted weight
String weight_var Weighting variable
String weight_type Weight type (pw, fw, etc)
properties (absorbed degrees-of-freedom computations) Description
Integer G_extended Number of intercepts plus slopes
Integer df_a_redundant e(mobility)
Integer df_a_initial
Integer df_a df_a_initial - df_a_redundant
Vector doflist_M
Vector doflist_K
Vector doflist_M_is_exact
Vector doflist_M_is_nested
Vector is_slope
Integer df_a_nested Redundant due to being nested; used for: r2_a r2_a_within rmse
properties (VCE and cluster variables) Description
String vcetype
Integer num_clusters
Varlist clustervars
Varlist base_clustervars
String vceextra
properties (regression-specific) Description
String varlist y x1 x2 x3 x4 z1 z2 z3
String depvar y
String indepvars x1 x2
Boolean drop_singletons
String absorb contents of absorb()
String select_if If condition
String select_in In condition
String model ols, iv
String summarize_stats
Boolean summarize_quietly
StringRowVector dofadjustments firstpair pairwise cluster continuous
Varname groupvar
String residuals
RowVector kept 1 if the regressors are not deemed as omitted (by partial_out+cholsolve+invsym)
String diopts
properties (output) Description
String cmdline
String subcmd
String title
Boolean converged
Integer iteration_count e(ic)
Varlist extended_absvars
String notes
Integer df_r
Integer df_m
Integer N_clust
Integer N_clust_list
Real rss
Real rmse
Real F
Real tss
Real tss_within
Real sumweights
Real r2
Real r2_within
Real r2_a
Real r2_a_within
Real ll
Real ll_0
(run this if e.g. touse changes)
methods Description
Void update_sorted_weights()
Matrix partial_out()
Void _partial_out() in-place alternative to partial_out()
Variables project_one_fe()
Void prune_1core()
Void _expand_1core()
Void estimate_dof()
Void estimate_cond()
Void save_touse()
Void store_alphas()
Void save_variable()
Void post_footnote()
Void post()
Void reload(copy=0)
methods (LSMR-specific) Description
Real lsmr_norm()
Vector lsmr_A_mult()
Vector lsmr_At_mult()

Additional functions

Several useful Mata functions are included. For instance,

void reghdfe_solve_ols(HDFE , X, ... )

Example: OLS regression

TODO: Update this example

sysuse auto, clear
local depvar price
local indepvars weight gear
mata: HDFE = fixed_effects("turn", "", "fweight", "trunk", 0, 2)
mata: HDFE.varlist = "`depvar' `indepvars'"
mata: HDFE.indepvars = "`indepvars'"
mata: data = HDFE.partial_out("`depvar' `indepvars'")
mata: reghdfe_solve_ols(HDFE, data, b=., V=., N=., rank=., df_r=., resid=., kept=., "vce_none")
mata: b