Help for hdfe

Title

hdfe - Partial out variables with respect to multiple levels of fixed effects

Syntax

Replace current dataset:

hdfe varlist [weight] , absorb(absvars) clear [keepvars(varlist) keepids] [clustervars(varlist) options]

Keep current dataset and add new variables:

hdfe varlist [weight] , absorb(absvars) generate(stubname) [sample(newvarname)] [clustervars(varlist) options]

Options Description
HDFE-Specific
clear overwrites the dataset, leaving the transformed variables as well as some ancillary ones (such as the fixed effects, weights, and cluster variables).
If you use hdfe with factor variables, you may have trouble relating the old names (e.g. i.turn) to new names.
To look up the new name for a given term, run: mata: asarray(varlist_cache, "i.turn")
keepvars(varlist) keep additional variables
keepids keep the temporary variables for the fixed effects (useful if you set them up like id#year)
generate(stubname) does not overwrite the variables; instead, it creates new demeaned variables with the stubname prefix
sample(newvarname) saves the equivalent of e(sample) in this variable; useful when singletons are dropped. Used with the generate option.
clustervars(varlist) list of variables containing cluster categories. This is used to give a more accurate count of the degrees of freedom lost due to the fixed effects, as reported in e(df_a).
Diagnostic
verbose(#) amount of debugging information to show (0=None, 1=Some, 2=More, 3=Parsing/convergence details, 4=Every iteration)
timeit show elapsed times by stage of computation
Optimization
+ tolerance(#) criterion for convergence (default=1e-8)
maxiterations(#) maximum number of iterations (default=10,000); if set to missing (.) it will run for as long as it takes.
poolsize(#) apply the within algorithm in groups of # variables (default 10). A large poolsize is usually faster but uses more memory.
acceleration(str) acceleration method; options are conjugate_gradient (cg), steep_descent (sd), aitken (a), and none (no)
transform(str) transform operation that defines the type of alternating projection; options are Kaczmarz (kac), Cimmino (cim), Symmetric Kaczmarz (sym)
Degrees-of-Freedom Adjustments
dofadjustments(list) allows selecting the desired adjustments for degrees of freedom; rarely used
groupvar(newvar) unique identifier for the first mobility group
Reporting
version reports the version number and date of hdfe, and saves them in e(version). Standalone option.
Undocumented
keepsingletons do not drop singleton groups
* absorb(absvars) is required.
+ indicates a recommended or important option.
all variables may contain time-series operators and factor variables; see tsvarlist and fvvarlist.
fweights, aweights and pweights are allowed; see weight.
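
As a minimal sketch of the two calling modes listed under Syntax, the commands below use the auto dataset shipped with Stata. The stub R_ and the variable name smpl are illustrative choices, and the sketch assumes that sample() creates a 0/1 indicator in the same way e(sample) does.

```stata
* Mode 1: keep the dataset and add demeaned copies with the R_ prefix
sysuse auto, clear
hdfe price weight length, absorb(turn trunk) generate(R_) sample(smpl)
* smpl is assumed to flag the observations that were actually used
tabulate smpl

* Mode 2: overwrite the dataset with the transformed variables,
* keeping one extra variable and the fixed-effect identifiers
sysuse auto, clear
hdfe price weight length, absorb(turn trunk) clear keepvars(mpg) keepids
describe
```

The second mode discards the original data, so it is usually wrapped in preserve and restore, or run on a copy of the dataset.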

Absvar Syntax

absvar Description
i.varname categorical variable to be absorbed (the i. prefix is tacit)
i.var1#i.var2 absorb the interactions of multiple categorical variables
i.var1#c.var2 absorb heterogeneous slopes, where var2 has a different slope coef. depending on the category of var1
var1##c.var2 equivalent to "i.var1 i.var1#c.var2", but much faster
var1##c.(var2 var3) multiple heterogeneous slopes are allowed together. Alternative syntax: var1##(c.var2 c.var3)
v1#v2#v3##c.(v4 v5) factor operators can be combined
Using categorical interactions (e.g. x#z) is faster than running egen group(...) beforehand.
Singleton obs. are dropped iteratively until no more singletons are found (see ancillary article for details).
Slope-only absvars ("state#c.time") have poor numerical stability and slow convergence. If you need those, either i) increase tolerance or ii) use slope-and-intercept absvars ("state##c.time"), even if the intercept is redundant. For instance, if absvar is "i.zipcode i.state##c.time", then i.state is redundant given i.zipcode, but convergence will still be much faster.
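
The absvar forms above could be exercised as follows. This is only a sketch on the auto dataset: the stubs A_, B_ and C_ are arbitrary, and the choice of foreign, rep78 and length as absorbed variables is purely illustrative.

```stata
sysuse auto, clear
* Two separate sets of fixed effects
hdfe price weight, absorb(turn trunk) generate(A_)
* Interaction of two categorical variables as one set of fixed effects
* (rep78 has missing values, so some observations may be excluded)
hdfe price weight, absorb(foreign#rep78) generate(B_)
* Intercept and slope on length for each category of foreign
hdfe price weight, absorb(foreign##c.length) generate(C_)
```

The last call uses the slope-and-intercept form rather than foreign#c.length, in line with the stability note above.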

Description

hdfe computes the residuals of a set of variables with respect to multiple levels of fixed effects. It is a generalization of the within transformation done by areg and xtreg, fe for more than one fixed effect, also allowing for multiple heterogeneous slopes.
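
With a single fixed effect, the within transformation reduces to subtracting group means, which gives a simple sanity check. The sketch below assumes that generate(R_) names the new variable R_price (the stub followed by the original name) and that no observations are dropped; group_mean, manual and diff are illustrative names.

```stata
sysuse auto, clear
* Residualize price on one fixed effect with hdfe
hdfe price, absorb(foreign) generate(R_)
* Same transformation by hand: subtract the group mean
egen double group_mean = mean(price), by(foreign)
generate double manual = price - group_mean
* The two versions should agree up to numerical precision
generate double diff = R_price - manual
summarize diff
```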

hdfe is a programmers' routine that serves as a building block to other regression packages so they can support multiple fixed effects (see for instance binscatter, regife and poi2hdfe). It contains the same code underlying reghdfe and exposes most of its functionality and options.

It also computes the degrees-of-freedom absorbed by the fixed effects and stores them in e(df_a).

It works well with other building-block packages such as avar (from SSC).

Example Usage

Suppose you want to replicate reghdfe. Then, you would do:

sysuse auto, clear
* Benchmark
reghdfe price weight length, a(turn trunk)
* Demean variables
hdfe price weight length, a(turn trunk) gen(RESID_)
local df_a = e(df_a)
* Run regression
quietly regress RESID_*, nocons
* Fix degrees-of-freedom
local df_r = e(df_r) - `df_a'
matrix b = e(b)
matrix V = e(V) * e(df_r) / `df_r'
ereturn post b V, dep(price) obs(`e(N)') dof(`df_r')
ereturn display

Stored results

hdfe stores the following in e():

Scalars
e(df_a) degrees of freedom lost due to the fixed effects (taking into account the cluster structure and whether the FEs are nested within the clusters)
e(N_hdfe) number of sets of fixed effects
Macros
e(absvars) canonical expansion of the fixed effects
e(extended_absvars) expansion of the fixed effects separating heterogeneous slopes (e.g. y##c.z is expanded to y y#c.z)
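
To inspect these results after a call, something like the following could be used. It is a sketch on the auto dataset with arbitrary stubs A_ and B_; the second call adds clustervars(turn) only to show how the cluster structure can enter e(df_a).

```stata
sysuse auto, clear
* Without cluster information
hdfe price weight, absorb(turn) generate(A_)
display "df_a without clusters: " e(df_a)
* With the fixed effect nested within the cluster variable
hdfe price weight, absorb(turn) clustervars(turn) generate(B_)
display "df_a clustering on turn: " e(df_a)
* Remaining stored results
display "sets of fixed effects: " e(N_hdfe)
display "absvars: `e(absvars)'"
display "extended absvars: `e(extended_absvars)'"
```

Because e(df_a) is overwritten by the next estimation command, it is worth copying it into a local macro right after hdfe, as the example above does.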

Author

Sergio Correia
Fuqua School of Business, Duke University
Email: sergio.correia@duke.edu

User Guide

A copy of this help file, as well as a more in-depth user guide, is in development and will be available at "http://scorreia.com/reghdfe".

Latest Updates

hdfe is updated frequently, and upgrades or minor bug fixes may not be immediately available in SSC. To check or contribute to the latest version of hdfe, explore the GitHub repository. Bugs or missing features can be discussed through email or at the GitHub issue tracker.

To see your current version and installed dependencies, type hdfe, version

Acknowledgements

This package wouldn't have existed without the invaluable feedback and contributions of Paulo Guimaraes, Amine Ouazad, Mark Schaffer and Kit Baum. Also invaluable are the great bug-spotting abilities of many users.