Help for reghdfe_programming

Using reghdfe with other commands

This help file describes how to use reghdfe within other programs, either in Stata or Mata. It discusses three types of tools that might be useful for developers:

1. Ancillary commands from ftools that are used by reghdfe, such as ms_get_version.

2. Undocumented options of reghdfe.

3. The FixedEffects Mata class behind reghdfe (its code can be browsed with `viewsource reghdfe.mata`), which can be used to build efficient Mata estimation programs.

These tools are ordered by how tightly they integrate with reghdfe. Someone writing a command independent of reghdfe might still benefit from #1. Someone writing a command that calls reghdfe a few times, such as ivreghdfe, sumhdfe, or did_imputation, might benefit from #2. And someone writing a command that relies on reghdfe many times, such as ppmlhdfe, might also be interested in #3, due to the gains in efficiency and the tighter Mata integration.

1. Ancillary commands

ms_get_version

It's possible your command will depend on other user-written commands, in the same way that reghdfe depends on ftools. To ensure compatibility and reproducibility, you can use the ms_get_version command to check that users are not running versions of these programs that are too old. For instance, reghdfe version 6.12.0 requires ftools version 2.46.0 or later.

The syntax of ms_get_version is:

ms_get_version command [, min_version(str) min_date(str)]

Options		Description
	`min_version(str)`	minimum version. Only supports semantic versioning of the form `x.y.z`. Note that versions are defined in the first line of an ado-file, and can be verified by typing `which <command>`.
	`min_date(str)`	(less used) minimum date of the program. Supports dates of the form 1jan2018 or 01Jan2018.

ms_get_version also stores the following local macros:

	`version_number'	version of the requested program
	`version_date'	date of the requested program
	`package_version'	concatenation of `version_number' and `version_date'

A sample usage of ms_get_version, currently used by reghdfe, is:

ms_get_version ftools, min_version("2.46.0")
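Because min_version() only accepts versions of the form x.y.z, the check has to compare the three parts as numbers, not as text. The Python sketch below illustrates that idea. It is not ms_get_version's actual Stata code, and the function names parse_semver and meets_min_version are hypothetical. Compared as plain strings, "2.5.1" would wrongly count as newer than "2.46.0".

```python
def parse_semver(version):
    """Split an 'x.y.z' string into a tuple of three integers."""
    parts = version.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected a version of the form x.y.z, got %r" % (version,))
    return tuple(int(p) for p in parts)


def meets_min_version(installed, minimum):
    """True if `installed` >= `minimum`; tuples compare component by component."""
    return parse_semver(installed) >= parse_semver(minimum)


# As text, "2.5.1" > "2.46.0"; as versions, 2.5.1 is older than 2.46.0.
print(meets_min_version("2.5.1", "2.46.0"))   # False
print(meets_min_version("2.49.1", "2.46.0"))  # True
```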

2. Undocumented reghdfe options

Sometimes you might not want to run the entire reghdfe command, but stop at some point and only compute certain objects. Several undocumented options allow this.

A) Compute the HDFE Mata object but stop before partialling out variables

reghdfe ... , nopartialout [options]

This step will parse all inputs and initialize HDFE, an object of the FixedEffects class. Note that although the regression variables (depvar and indepvars) are not partialled out, observations where they have missing values are still excluded from the sample.

For instance, the sample ado-file below is enough to create a program that reports the number of singletons in a regression, without having to actually run the regression:

show_singletons.ado

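prog show_singletons
	// build the HDFE object but stop before partialling out
	qui reghdfe `0' nopartial
	noi ereturn list // optional: list the stored results
	// copy the number of singletons from the Mata object into a local
	mata: st_local("n", strofreal(HDFE.num_singletons))
	di as text "there are `n' singletons"
end

// load reghdfe's Mata code at the end of the ado-file
qui include "reghdfe.mata", adopath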
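Singletons are dropped iteratively, because removing one singleton can leave another observation alone in its group. The Python sketch below shows this logic for any number of fixed effects. It is an illustration, not reghdfe's Mata implementation, and drop_singletons is a hypothetical name. In the example, observation 5 is the only one in year 3. Once it is dropped, observation 4 becomes the only one left in firm 3.

```python
from collections import Counter


def drop_singletons(*fe_columns):
    """Return a keep/drop mask after iteratively removing singleton observations.

    Each argument is one fixed effect: a list with the level of every observation.
    An observation is a singleton if it is the only kept observation in its level
    of some fixed effect. Dropping it can create new singletons, so repeat until stable.
    """
    n = len(fe_columns[0])
    keep = [True] * n
    changed = True
    while changed:
        changed = False
        for levels in fe_columns:
            counts = Counter(lvl for lvl, k in zip(levels, keep) if k)
            for i in range(n):
                if keep[i] and counts[levels[i]] == 1:
                    keep[i] = False
                    changed = True
    return keep


firm = [1, 1, 2, 2, 3, 3]
year = [1, 2, 1, 2, 2, 3]
keep = drop_singletons(firm, year)
print(keep)                              # [True, True, True, True, False, False]
print("singletons:", keep.count(False))  # singletons: 2
```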

B) Compute the HDFE Mata object, partial out the variables, but stop before regressing

reghdfe ... , noregress [options]

This step is like A), but will also partial out the variables with respect to the fixed effects and save the resulting information in the HDFE.solution object. For instance, HDFE.solution.data will contain the partialled-out data, and HDFE.solution.depvar will contain the name of the dependent variable.

This option can be used to (amongst other things) partial out all the variables only once, and then run regressions on the same sample and same regressors but with multiple left-hand-side variables (useful with very large datasets).
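This works because of the Frisch-Waugh-Lovell theorem. Regressing the partialled-out depvar on the partialled-out regressors gives the same coefficients as the full fixed-effects regression. The partialled-out data can therefore be reused for every outcome, although standard errors still need the absorbed degrees of freedom. The Python sketch below shows this with a single fixed effect and one regressor. It is an illustration only, and the helper names are hypothetical. y1 is built as 2*x plus a group effect, so its slope is 2. y2 is fully explained by the groups, so its slope is 0.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Partial out a single fixed effect: subtract each group's mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def ols_slope(y, x):
    """OLS slope of y on one regressor; no constant, since both are already demeaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(b * b for b in x)


groups = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
outcomes = {
    "y1": [2.0, 4.0, 6.0, 5.0, 7.0, 11.0],  # y1 = 2*x + group effect (0 or 3)
    "y2": [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],   # y2 is fully explained by the groups
}

# Partial out the regressor once, then reuse it for each outcome.
x_tilde = demean_by_group(x, groups)
y_tilde = {name: demean_by_group(y, groups) for name, y in outcomes.items()}
betas = {name: ols_slope(yt, x_tilde) for name, yt in y_tilde.items()}
print(betas)
```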

C) Run regression but keep the HDFE Mata object

reghdfe ... , keepmata [options]

Saving the HDFE object allows further manipulation of the fixed effects data, although the partialled-out variables themselves are not preserved.

3. FixedEffects Mata class

In order to use reghdfe's Mata functions within your own ado-file, you need to add the following at the end of your file:

include "reghdfe.mata", adopath

This dynamically loads all the reghdfe Mata functions and classes, so they are accessible to the ado-file. This alternative is preferred to sharing precompiled Mata objects, which would require compilation for multiple versions of Stata/Mata (or for the lowest possible version of Stata/Mata).

To construct the object, you can do:

class FixedEffects HDFE // Optional declaration
HDFE = FixedEffects() // Note that you can replace "HDFE" with whatever name you choose
HDFE.absvars = "firm_id year"
...
HDFE.init()
...

For more information, see the code of the Estimate function in reghdfe.ado.

Properties and Methods

TODO: update this list

properties (factors)		Description
	`Integer` `N`	number of obs
	`Integer` `M`	Sum of all possible FE coefs
	`Factors` `factors`
	`Vector` `sample`
	`Varlist` `absvars`
	`Varlist` `ivars`
	`Varlist` `cvars`
	`Boolean` `has_intercept`
	`RowVector` `intercepts`
	`RowVector` `num_slopes`
	`Integer` `num_singletons`
	`Boolean` `save_any_fe`
	`Boolean` `save_all_fe`
	`Varlist` `targets`
	`RowVector` `save_fe`

properties (optimization options)		Description
	`Real` `tolerance`
	`Integer` `maxiter`
	`String` `transform`	Kaczmarz, Cimmino, or Symmetric_kaczmarz (abbreviated k, c, s)
	`String` `acceleration`	Acceleration method ("none", "no", or empty for no acceleration)
	`Integer` `accel_start`	Iteration at which acceleration starts
	`String` `slope_method`
	`Boolean` `prune`	Whether to recursively prune degree-1 edges
	`Boolean` `abort`	Whether to raise an error if convergence fails
	`Integer` `accel_freq`	Specific to Aitken's acceleration
	`Boolean` `storing_alphas`	1 if we should compute the alphas/fes
	`Real` `conlim`	specific to LSMR
	`Real` `btol`	specific to LSMR
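The transform, tolerance, and maxiter options control an iterative partialling-out procedure. The Python sketch below shows the basic idea behind two of the transforms under simplifying assumptions: no weights, no acceleration, and a simple stopping rule, namely that no value changes by more than the tolerance in one iteration. Kaczmarz demeans by each fixed effect in turn, which is the method of alternating projections. Cimmino demeans by every fixed effect from the same starting point and averages the results. This is not reghdfe's implementation, and the function names are hypothetical. reghdfe also offers symmetric Kaczmarz and the acceleration methods listed above.

```python
from collections import defaultdict


def demean(values, groups):
    """Project out one fixed effect: subtract the group mean from each value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partial_out(y, fe_list, transform="kaczmarz", tolerance=1e-10, maxiter=10000):
    """Residualize y on several fixed effects by repeated demeaning.

    kaczmarz: demean by each fixed effect in turn (alternating projections).
    cimmino:  demean by every fixed effect from the same start, then average.
    Stops when no value changes by more than `tolerance` in one iteration.
    """
    y = list(y)
    for iteration in range(1, maxiter + 1):
        if transform == "kaczmarz":
            new = y
            for groups in fe_list:
                new = demean(new, groups)
        elif transform == "cimmino":
            projections = [demean(y, groups) for groups in fe_list]
            new = [sum(vals) / len(fe_list) for vals in zip(*projections)]
        else:
            raise ValueError("unknown transform: %r" % (transform,))
        change = max(abs(a - b) for a, b in zip(new, y))
        y = new
        if change < tolerance:
            return y, iteration
    raise RuntimeError("no convergence after %d iterations" % maxiter)


firm = [1, 1, 1, 2, 2, 3, 3]
year = [1, 2, 3, 1, 2, 2, 3]
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]

resid_k, iters_k = partial_out(y, [firm, year], "kaczmarz", tolerance=1e-12)
resid_c, iters_c = partial_out(y, [firm, year], "cimmino", tolerance=1e-12)
print("iterations:", iters_k, iters_c)
```

Both transforms converge to the same residuals. They differ only in how many iterations they need.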

properties (optimization objects)		Description

	`BipartiteGraph` `bg`	Used when pruning 1-core vertices
	`Vector` `pruned_weight`	temp. weight for the factors that were pruned
	`Integer` `prune_g1`	Factor 1/2 in the bipartite subgraph that gets pruned
	`Integer` `prune_g2`	Factor 2/2 in the bipartite subgraph that gets pruned
	`Integer` `num_pruned`	Number of vertices (levels) that were pruned
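The pruning properties above suggest the following approach. Two fixed effects form a bipartite graph, with their levels as vertices and observed combinations as edges. A level with a single edge has a coefficient that can be solved exactly once its neighbor's coefficient is known, so it can be removed before iterating and recovered afterwards, presumably via `_expand_1core()`. The Python sketch below performs the recursive removal only. It is an assumption-labeled illustration, not reghdfe's BipartiteGraph code, and prune_degree_one is a hypothetical name. In the example, pruning firm 4 leaves year 3 with one edge, and pruning year 3 does the same to firm 3.

```python
from collections import defaultdict


def prune_degree_one(edges):
    """Recursively remove degree-1 vertices from the bipartite graph of two fixed effects.

    `edges` is a set of (fe1_level, fe2_level) pairs observed in the data. Vertices are
    tagged ("g1", level) or ("g2", level) so both fixed effects may reuse level values.
    Returns the surviving edges and the pruned vertices in removal order.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[("g1", a)].add(("g2", b))
        adj[("g2", b)].add(("g1", a))

    pruned = []
    stack = sorted(v for v, nbrs in adj.items() if len(nbrs) == 1)
    while stack:
        v = stack.pop()
        if len(adj[v]) != 1:      # its degree changed after it was queued
            continue
        (u,) = adj[v]             # the single neighbour
        adj[v].clear()
        adj[u].discard(v)
        pruned.append(v)
        if len(adj[u]) == 1:      # removing v may leave u with degree 1
            stack.append(u)

    remaining = {(a, b) for a, b in edges if ("g2", b) in adj[("g1", a)]}
    return remaining, pruned


edges = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 2), (3, 3), (4, 3)}
remaining, pruned = prune_degree_one(edges)
print(sorted(remaining))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(pruned)             # [('g1', 4), ('g2', 3), ('g1', 3)]
```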

properties (misc)		Description
	`Integer` `verbose`
	`Boolean` `timeit`
	`Boolean` `store_sample`
	`Real` `finite_condition`
	`Real` `compute_rre`	Relative residual error: ||e_k - e|| / ||e||
	`Real` `rre_depvar_norm`
	`Vector` `rre_varname`
	`Vector` `rre_true_residual`

properties (weight-specific)		Description
	`Boolean` `has_weights`
	`Variable` `weight`	unsorted weight
	`String` `weight_var`	Weighting variable
	`String` `weight_type`	Weight type (pw, fw, etc.)

properties (absorbed degrees-of-freedom computations)		Description
	`Integer` `G_extended`	Number of intercepts plus slopes
	`Integer` `df_a_redundant`	e(mobility)
	`Integer` `df_a_initial`
	`Integer` `df_a`	df_a_initial - df_a_redundant
	`Vector` `doflist_M`
	`Vector` `doflist_K`
	`Vector` `doflist_M_is_exact`
	`Vector` `doflist_M_is_nested`
	`Vector` `is_slope`
	`Integer` `df_a_nested`	Redundant due to being nested; used for: r2_a r2_a_within rmse
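For two fixed effects with L1 and L2 levels, a standard result is that the stacked dummy matrix [D1 D2] has rank L1 + L2 - C. Here C is the number of connected components, or "mobility groups", of the bipartite graph linking the two sets of levels. The Python sketch below counts C with a union-find structure. Its outputs only loosely mirror df_a_initial, df_a_redundant, and df_a above. This is an illustration, not reghdfe's code, and it does not cover more general cases such as more than two fixed effects, slopes, or nesting.

```python
def count_components(pairs):
    """Connected components of the bipartite graph linking FE1 levels to FE2 levels."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in pairs:
        root_a, root_b = find(("g1", a)), find(("g2", b))
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(v) for v in list(parent)})


def absorbed_dof(fe1, fe2):
    """(initial, redundant, df_a) for two fixed effects, each given per observation."""
    initial = len(set(fe1)) + len(set(fe2))
    redundant = count_components(zip(fe1, fe2))
    return initial, redundant, initial - redundant


# Firms 1-2 and years 1-2 form one component; firm 3 and year 3 form another.
print(absorbed_dof([1, 1, 2, 2, 3], [1, 2, 1, 2, 3]))  # (6, 2, 4)
```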

properties (VCE and cluster variables)		Description
	`String` `vcetype`
	`Integer` `num_clusters`
	`Varlist` `clustervars`
	`Varlist` `base_clustervars`
	`String` `vceextra`

properties (regression-specific)		Description
	`String` `varlist`	y x1 x2 x3 x4 z1 z2 z3
	`String` `depvar`	y
	`String` `indepvars`	x1 x2
	`Boolean` `drop_singletons`
	`String` `absorb`	contents of absorb()
	`String` `select_if`	If condition
	`String` `select_in`	In condition
	`String` `model`	ols, iv
	`String` `summarize_stats`
	`Boolean` `summarize_quietly`
	`StringRowVector` `dofadjustments`	firstpair pairwise cluster continuous
	`Varname` `groupvar`
	`String` `residuals`
	`RowVector` `kept`	1 if the regressors are not deemed as omitted (by partial_out+cholsolve+invsym)
	`String` `diopts`

properties (output)		Description
	`String` `cmdline`
	`String` `subcmd`
	`String` `title`
	`Boolean` `converged`
	`Integer` `iteration_count`	e(ic)
	`Varlist` `extended_absvars`
	`String` `notes`
	`Integer` `df_r`
	`Integer` `df_m`
	`Integer` `N_clust`
	`Integer` `N_clust_list`
	`Real` `rss`
	`Real` `rmse`
	`Real` `F`
	`Real` `tss`
	`Real` `tss_within`
	`Real` `sumweights`
	`Real` `r2`
	`Real` `r2_within`
	`Real` `r2_a`
	`Real` `r2_a_within`
	`Real` `ll`
	`Real` `ll_0`
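Several of the output properties are linked by standard definitions. In these, tss is presumably the sum of squares of depvar around its mean, and tss_within is that of the partialled-out depvar. The exact degrees-of-freedom adjustments behind r2_a and r2_a_within, and the handling of weights, are in reghdfe's code and are not reproduced here.

```latex
R^2 = 1 - \frac{\mathrm{rss}}{\mathrm{tss}}, \qquad
R^2_{\mathrm{within}} = 1 - \frac{\mathrm{rss}}{\mathrm{tss\_within}}, \qquad
\mathrm{rmse} = \sqrt{\frac{\mathrm{rss}}{\mathrm{df\_r}}}
```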

methods		Description
	`Void` `update_sorted_weights`()	run this if, e.g., touse changes
	`Matrix` `partial_out`()
	`Void` `_partial_out`()	in-place alternative to `partial_out()`
	`Variables` `project_one_fe`()
	`Void` `prune_1core`()
	`Void` `_expand_1core`()
	`Void` `estimate_dof`()
	`Void` `estimate_cond`()
	`Void` `save_touse`()
	`Void` `store_alphas`()
	`Void` `save_variable`()
	`Void` `post_footnote`()
	`Void` `post`()
	`Void` `reload`(copy=0)

methods (LSMR-specific)		Description
	`Real` `lsmr_norm`()
	`Vector` `lsmr_A_mult`()
	`Vector` `lsmr_At_mult`()
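LSMR solves a least-squares problem while touching the design matrix A only through the products A*x and A'*y. With fixed effects, A is the matrix of stacked dummy variables, so neither product needs A to be stored. The Python sketch below shows both products for the unweighted case. The functions are stand-ins named after the methods above and are not reghdfe's implementation. A quick sanity check is the adjoint identity <A x, y> = <x, A' y>, which holds here with both sides equal to 7.

```python
def A_mult(coefs, fe_list):
    """A @ x: each observation receives the sum of the coefficients of its levels.

    `coefs` holds one dict per fixed effect, mapping level -> coefficient.
    """
    n = len(fe_list[0])
    return [sum(c[levels[i]] for c, levels in zip(coefs, fe_list)) for i in range(n)]


def At_mult(y, fe_list):
    """A.T @ y: for each fixed effect, sum y within every level."""
    out = []
    for levels in fe_list:
        sums = {}
        for v, lvl in zip(y, levels):
            sums[lvl] = sums.get(lvl, 0.0) + v
        out.append(sums)
    return out


fe_list = [[1, 1, 2], [10, 20, 20]]
x = [{1: 1.0, 2: 2.0}, {10: 3.0, 20: -1.0}]
y = [1.0, 2.0, 3.0]
print(A_mult(x, fe_list))   # [4.0, 0.0, 1.0]
print(At_mult(y, fe_list))  # [{1: 3.0, 2: 3.0}, {10: 1.0, 20: 5.0}]
```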

Additional functions

Several useful Mata functions are included. For instance,

void reghdfe_solve_ols(HDFE, X, ...)

Example: OLS regression

TODO: Update this example


sysuse auto, clear
local depvar price
local indepvars weight gear
mata: HDFE = fixed_effects("turn", "", "fweight", "trunk", 0, 2)
mata: HDFE.varlist = "`depvar' `indepvars'"
mata: HDFE.indepvars = "`indepvars'"
mata: data = HDFE.partial_out("`depvar' `indepvars'")
mata: reghdfe_solve_ols(HDFE, data, b=., V=., N=., rank=., df_r=., resid=., kept=., "vce_none")
mata: b
