ppmlhdfe - Poisson pseudo-likelihood regression with multiple levels of fixed effects
ppmlhdfe depvar [indepvars] [if] [in] [weight] , [absorb(absvars)] [options]
Options  Description  
Model  
absorb(absvars) 
categorical variables to be absorbed (fixed effects); individual slopes are also allowed  
absorb(..., savefe)
save all fixed effect estimates with __hdfe as prefix
exposure(varname) 
include ln(varname) in model with coefficient constrained to 1  
offset(varname) 
include varname in model with coefficient constrained to 1  
d(newvar)
save sum of fixed effects as newvar; mandatory if running predict afterwards (except for predict, xb)
d
as above, but the variable will be saved as _ppmlhdfe_d
separation(string) 
algorithm used to drop separated observations and their associated regressors. Valid methods are fe, ir, simplex, and mu (or any combination of those). Although ir (iterated rectifier) is the only one that can systematically correct separation arising from both regressors and fixed effects, by default three methods are applied (fe simplex ir). See the ppmlhdfe paper as well as this guide for more information.
SE/Robust  
vce(vcetype)
vcetype may be robust (default) or cluster fvvarlist (allowing two-way and multiway clustering)
Reporting  
eform
report exponentiated coefficients (incidence-rate ratios)
irr
synonym for eform
display_options
control many options of the regression table, such as confidence levels, number formats, etc.
Optimization  
tolerance(#) 
criterion for convergence (default: 1e-8)
guess(string) 
set rule for setting initial values; valid options are simple (default, almost always faster) and ols  
Diagnostic and undocumented  
verbose(#) 
amount of debugging information to show; use v(1) or higher to view additional information; secret option: v(-1) disables all messages
nolog
hide iteration log
keepsingletons 
do not drop singleton groups  
version 
reports the version number and date of ppmlhdfe, and the list of required packages; standalone option
Time-series operators and factor variables are allowed; the dependent variable cannot be of the form i.turn, but 42.turn works.
fweights and pweights are allowed; see weight.
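The exposure(), offset(), and eform options all follow from the exponential conditional mean of the Poisson model, E[y | x] = exp(xb + offset). The pure-Python sketch below uses hypothetical helper names (poisson_mean, mean_with_exposure) and made-up coefficients; it is not ppmlhdfe code. It checks two facts: exposure(t) acts like offset(ln t), so the predicted mean is proportional to t; and an exponentiated coefficient exp(b) is the ratio of predicted means when the regressor rises by one unit, which is why eform reports incidence-rate ratios.

```python
import math

def poisson_mean(xb, offset=0.0):
    # Poisson conditional mean: exp(linear index + offset); the offset's
    # coefficient is fixed at 1, as with offset(varname)
    return math.exp(xb + offset)

def mean_with_exposure(xb, exposure):
    # exposure(t) enters as an offset of ln(t), so the mean is proportional to t
    return poisson_mean(xb, offset=math.log(exposure))

# Hypothetical intercept b0 and slope b1 on a regressor x
b0, b1 = 0.3, 0.05

# Exposure: doubling t doubles the predicted mean
m1 = mean_with_exposure(b0 + b1 * 2.0, 10.0)
m2 = mean_with_exposure(b0 + b1 * 2.0, 20.0)
print(m2 / m1)

# eform/irr: exp(b1) equals the ratio of means when x rises by one unit
irr = math.exp(b1)
ratio = poisson_mean(b0 + b1 * 3.0) / poisson_mean(b0 + b1 * 2.0)
print(irr, ratio)
```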
ppmlhdfe implements Poisson pseudo-maximum likelihood regressions (PPML) with multiway fixed effects, as described by Correia, Guimarães, Zylkin (2019a). The estimator employed is robust to statistical separation and convergence issues, due to the procedures developed in Correia, Guimarães, Zylkin (2019b).
This package has four key advantages:
1. Allows any number and combination of fixed effects and individual slopes.
2. Correctly detects and drops separated observations (Correia, Guimarães, Zylkin 2019b). This issue would otherwise be particularly pernicious in regressions with many fixed effects, and can lead to lack of convergence or, even worse, incorrect estimates.
3. Allows two-way and multiway clustering, and can be used in combination with boottest to derive wild bootstrap inference.
4. Includes several algorithmic shortcuts and accelerations aimed at allowing its use with very large datasets.
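One source of separation mentioned in item 2 comes from the fixed effects alone, and it is what the fe method of separation() targets: if every observation in a fixed-effect category has y = 0, the Poisson likelihood keeps increasing as that category's fixed effect goes to minus infinity, so no finite estimate exists and those observations must be dropped. The sketch below is a simplified illustration of that rule only (drop_all_zero_groups is a hypothetical name, not ppmlhdfe's implementation). It does not detect separation caused by regressors, which the other methods, such as ir, are designed to handle.

```python
from collections import defaultdict

def drop_all_zero_groups(y, fes):
    """Return the sorted indices of observations kept after removing every
    observation that belongs to a fixed-effect category whose outcomes are all zero.

    y   : list of non-negative outcomes
    fes : list of fixed-effect columns, each a list of category labels
    """
    keep = set(range(len(y)))
    for fe in fes:
        # Sum the outcome within each category of this fixed effect
        totals = defaultdict(float)
        for i in keep:
            totals[fe[i]] += y[i]
        # With y >= 0, a zero sum means every outcome in the category is zero
        keep = {i for i in keep if totals[fe[i]] > 0}
    return sorted(keep)

y        = [0, 0, 3, 1, 0, 2]
exporter = ["A", "A", "B", "B", "C", "C"]
year     = [1, 2, 1, 2, 1, 2]
print(drop_all_zero_groups(y, [exporter, year]))  # → [2, 3, 4, 5]
```

Exporter A only has zero outcomes, so its two observations are dropped; exporter C keeps its zero at year 1 because the category also contains a positive outcome.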
PPML models are particularly useful for non-negative count (and non-count) outcome variables, where applying least-squares regressions to outcomes of the form log(y) would otherwise lead to inconsistent estimates in the presence of heteroskedasticity.
These models are thus important in trade economics (where log(exports) is a common outcome), labor economics (log wages), finance (log credit, log sales, etc.), innovation (log patents), etc. Further, they alleviate the issue of dealing with zero-valued outcome variables (as log(0) is minus infinity), and allow applied economists to jointly estimate effects at the intensive and extensive margins.
absvar  Description
varname
categorical variable to be absorbed (fixed effect)
i.varname
same as above; the i. prefix is always tacit
i.var1#i.var2
absorb interactions of two or more categorical variables (e.g. country-time fixed effects)
i.var1##c.var2
absorb fixed effects and individual slopes (e.g. "i.country##c.time" includes country FEs and a different time trend per country)
i.var1#c.var2
only absorbs individual slopes (advice: never run "i.id i.id#c.z", as it is slower and less accurate than running "i.id##c.z")
var1##c.(var2 var3)
multiple heterogeneous slopes are allowed together. Alternative syntax: var1##(c.var2 c.var3)
v1#v2#v3##c.(v4 v5)
factor operators can be combined
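Conceptually, an interaction such as i.var1#i.var2 defines one fixed-effect category per observed combination of the two variables, the same grouping that egen group(var1 var2) would produce beforehand. The sketch below (group_ids is a hypothetical helper, not ppmlhdfe or ftools code) assigns integer codes to observed combinations in order of first appearance; the numbering used internally may differ, but only the grouping matters for the fixed effects.

```python
def group_ids(*columns):
    """Map each observed combination of the input columns to an integer code,
    numbered 1, 2, ... in order of first appearance."""
    codes = {}
    ids = []
    for combo in zip(*columns):
        if combo not in codes:
            codes[combo] = len(codes) + 1
        ids.append(codes[combo])
    return ids

country = ["US", "US", "MX", "MX", "US"]
year    = [2000, 2001, 2000, 2000, 2000]
print(group_ids(country, year))  # → [1, 2, 3, 3, 1]
```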
To save the estimates of specific absvars, write newvar=absvar.
However, be aware that estimates for the fixed effects are generally inconsistent and not econometrically identified.
Using categorical interactions (e.g. x#z) is faster than running egen group(...) beforehand.
Singleton observations are dropped iteratively until no more singletons are found (see linked article for details).
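Singletons must be dropped iteratively because removing one observation can leave another observation alone in its category of a different fixed effect. The sketch below illustrates that loop for any number of fixed effects; drop_singletons is a hypothetical name, and this is a plain illustration of the idea rather than the faster routine ppmlhdfe inherits from reghdfe.

```python
from collections import Counter

def drop_singletons(fes):
    """Iteratively drop observations that are the only member of their category
    in any fixed effect, until no singletons remain. Returns kept indices."""
    keep = list(range(len(fes[0])))
    while True:
        singletons = set()
        for fe in fes:
            # Count category sizes among the observations still kept
            counts = Counter(fe[i] for i in keep)
            singletons.update(i for i in keep if counts[fe[i]] == 1)
        if not singletons:
            return keep
        keep = [i for i in keep if i not in singletons]

firm   = [1, 1, 2, 2, 3, 4, 4]
worker = ["a", "b", "a", "b", "c", "c", "d"]
print(drop_singletons([firm, worker]))  # → [0, 1, 2, 3]
```

Observation 5 is not a singleton at first; it becomes one only after observations 4 (firm 3) and 6 (worker d) are removed in the first pass.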
You can use all of the reghdfe optimization options. Particularly useful are itol(#) to set the tolerance used when partialling out fixed effects, as well as the accel(), transform(), and prune options to modify the partialling-out method.
You can also modify the parameters used internally for the IRLS iteration and for each separation method. For instance, standardize_data(0) will disable the standardization of variables (done to increase numerical accuracy), while use_exact_solver(1) will avoid using a faster version of the least-squares solver in the initial IRLS iterations.
More information is available here.
Convergence is decided based on the deviance (and thus the log-likelihood), not on coefficients or residuals. Thus, we declare convergence once relative changes of the deviance fall below tolerance(#).
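For a Poisson model the deviance is D = 2 Σ [y ln(y/mu) - (y - mu)], with y ln(y/mu) taken as 0 when y = 0; it equals twice the gap between the log-likelihood of a saturated model and that of the fitted model, so tracking it is equivalent to tracking the log-likelihood. The sketch below shows a deviance-based stopping rule inside a bare-bones IRLS loop for a regression on a constant and one regressor, without fixed effects. It is not ppmlhdfe's algorithm: the function names, the starting values mu = (y + mean(y))/2, and the relative-change formula |D_new - D_old| / max(D_new, 0.1) are all assumptions made for illustration.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance 2*sum[y*ln(y/mu) - (y - mu)], using y*ln(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(yi / mi)
        total -= yi - mi
    return 2.0 * total

def ppml_irls(y, x, tol=1e-8, max_iter=100):
    """Poisson regression of y on a constant and x by IRLS, stopping when the
    relative change in deviance falls below tol. Returns (a, b, iterations)."""
    ybar = sum(y) / len(y)
    mu = [(yi + ybar) / 2.0 for yi in y]  # assumed starting values
    eta = [math.log(m) for m in mu]
    old_dev = poisson_deviance(y, mu)
    for it in range(1, max_iter + 1):
        # Working response for the log link; the IRLS weights equal mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on [1, x] via the 2x2 normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        b = (sw * swxz - swx * swz) / (sw * swxx - swx * swx)
        a = (swz - b * swx) / sw
        eta = [a + b * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        dev = poisson_deviance(y, mu)
        # Convergence is judged on the deviance, not on the coefficients
        if abs(dev - old_dev) / max(dev, 0.1) < tol:
            return a, b, it
        old_dev = dev
    raise RuntimeError("IRLS did not converge")

y = [0, 1, 1, 3, 4, 8]
x = [0, 1, 2, 3, 4, 5]
a, b, iters = ppml_irls(y, x)
print(a, b, iters)
```

At convergence the Poisson score equations hold approximately: the residuals y - mu sum to zero and are orthogonal to x.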
Note that although continuing to iterate further should not improve the overall fit of the model, it could improve the quality of e.g. fixed effect estimates. For an example of this, see this do-file.
The predict, test, and margins postestimation commands are available after ppmlhdfe.
The three standard estat subcommands are also allowed: estat ic, estat summarize, and estat vce.
Sergio Correia
Board of Governors of the Federal Reserve
Email: sergio.correia@gmail.com
Paulo Guimarães
Banco de Portugal, Portugal
Email: pguimaraes2001@gmail.com
Thomas Zylkin
Economics Department, Robins School of Business, University of Richmond
Email: tzylkin@richmond.edu
Sergio Correia, Paulo Guimarães, Thomas Zylkin: "ppmlhdfe: Fast Poisson Estimation with High-Dimensional Fixed Effects", 2019; arXiv:1903.01690.
Sergio Correia, Paulo Guimarães, Thomas Zylkin: "Verifying the existence of maximum likelihood estimates for generalized linear models", 2019; arXiv:1903.01633.
ppmlhdfe requires the reghdfe and ftools packages.
To see your current version and the installed dependencies, type ppmlhdfe, version
To download the latest version, report any issues, or get additional support, please see the GitHub repo of the project.
ppmlhdfe stores the following in e():
Scalars  
e(N) 
number of observations  
e(num_singletons) 
number of dropped singleton observations  
e(num_separated) 
number of dropped separated observations  
e(N_full) 
number of observations, including dropped singleton and separated observations  
e(drop_singletons) 
whether singleton observations were searched for and dropped or not  
e(rank)
rank of e(V)
e(df) 
residual degrees of freedom  
e(df_m) 
model degrees of freedom  
e(df_a) 
degrees of freedom lost due to the fixed effects  
e(df_a_initial) 
number of categories in the fixed effects; same as e(df_a) but ignoring redundant categories  
e(df_a_redundant) 
number of redundant fixed effect categories  
e(N_hdfe)
number of absorbed fixed effects
e(N_hdfe_extended)
number of absorbed fixed effects plus fixed slopes
e(rss) 
residual sum of squares  
e(rmse) 
root mean squared error  
e(chi2)
chi-squared
e(r2_p)
pseudo-R-squared
e(ll)
log-likelihood
e(ll_0)
log-likelihood of fixed-effect-only regression
e(N_clustervars)
number of cluster variables, if vce() is set to use clustered standard errors
e(N_clust#)
number of clusters in the #th cluster variable
e(N_clust)
number of clusters; minimum of all the e(N_clust#)
e(ic) 
number of iterations  
e(ic2)
number of iterations when partialling out fixed effects
e(converged) 
1 if converged, 0 otherwise 
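The scalars above imply e(df_a) = e(df_a_initial) - e(df_a_redundant). For two sets of fixed effects, the number of redundant categories has an exact characterization: the two sets of dummies have rank G1 + G2 - C, where G1 and G2 are the category counts and C is the number of connected components of the graph that links each category of the first fixed effect to the categories of the second that appear with it. The sketch below (two_way_dof is a hypothetical helper) counts C with a union-find structure. It only illustrates this rank identity; it is not how ppmlhdfe computes e(df_a), which also depends on the method in e(dofmethod), on clustering, and on how three or more fixed effects are handled.

```python
def two_way_dof(fe1, fe2):
    """Return (initial, redundant, absorbed) degrees of freedom for two sets of
    fixed effects, using rank = G1 + G2 - (number of connected components)."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for g1, g2 in zip(fe1, fe2):
        # Tag nodes so categories of the two fixed effects never collide
        u, v = ("fe1", g1), ("fe2", g2)
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    initial = len(set(fe1)) + len(set(fe2))
    redundant = len({find(node) for node in parent})  # one per component
    return initial, redundant, initial - redundant

firm   = [1, 1, 2, 3, 3]
worker = ["a", "b", "b", "c", "d"]
print(two_way_dof(firm, worker))  # → (7, 2, 5)
```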
Macros  
e(cmd)
ppmlhdfe
e(cmdline) 
command as typed  
e(separation)
list of methods used to detect and drop separated observations: fe, simplex, ir, and mu
e(dofmethod) 
dofmethod employed in the regression  
e(depvar) 
name of dependent variable  
e(indepvars) 
names of independent variables  
e(absvars) 
names of the absorbed variables or interactions
e(extended_absvars) 
expanded absorbed variables or interactions  
e(title) 
title in estimation output  
e(clustvar) 
name of cluster variable  
e(clustvar#)
name of the #th cluster variable
e(vce)
vcetype specified in vce()
e(vcetype) 
title used to label Std. Err.  
e(chi2type)
Wald; type of model chi-squared test
e(offset)
linear offset variable
e(properties)
b V
e(predict)
ppmlhdfe_p; program used to implement predict
e(estat_cmd)
reghdfe_estat; program used to implement estat
e(marginsok)
predictions allowed by margins
e(marginsnotok)
predictions disallowed by margins
e(footnote)
reghdfe_footnote; program used to display the degrees-of-freedom table
Matrices  
e(b) 
coefficient vector  
e(V) 
variance-covariance matrix of the estimators
e(dof_table)
number of categories, redundant categories, and degrees of freedom absorbed by each set of fixed effects
Functions  
e(sample) 
marks estimation sample 