United States National Bank Balance Sheets (1867-1904)
By Sergio Correia and Stephan Luck
Starting in 1867, the Office of the Comptroller of the Currency has prepared Annual Reports to the U.S. Congress with the balance sheets of all U.S. National Banks (equivalent to our current Call Reports). Below we provide a brief explanation of the data, as well as download links for the data and documentation. Please also see our [Liberty Street Economics] blog post, where we provide a more detailed explanation of the data.
1. What are call reports?
Call Reports are a set of regulatory filings that commercial banks submit on their financial condition, including their balance sheets and income statements. They are one of the most essential data sources used in banking and finance, and have existed in various forms and iterations since the onset of the national banking system in the 1860s. Digitized data, however, are readily accessible to researchers only for the recent past. For instance, the most recent Call Report iteration (FFIEC forms 031, 041, and 051) is only available from 1984 onward, even though the underlying data exist in various forms for many more years.
2. Data source
The original source is the Office of the Comptroller of the Currency (OCC), which published national bank balance sheets in the appendix of its Annual Report to Congress (see an example of such a balance sheet below). While not as detailed as contemporary regulatory filings such as FFIEC 031 (for commercial banks) or FR Y-9C (for bank holding companies), the OCC’s Call Reports were surprisingly granular throughout this period. For instance, on the asset side they asked banks to report their outstanding loans and discounts, their holdings of cash and government bonds, and how much credit they provided to other banks via the interbank market. On the liability side, they included the different types of equity held by shareholders, outstanding deposits, and the amount of national bank notes issued by the bank. The OCC’s Call Reports also documented each bank’s location and the identity of its president and cashier, and assigned each bank a unique identifier, known as a charter number, which can be used to construct a panel dataset.
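As a minimal sketch of how charter numbers support a panel, the snippet below indexes toy records by (charter number, year) and rejects duplicate bank-years. The field names `charter`, `year`, and `loans_discounts`, and all values, are illustrative assumptions rather than the dataset's actual variable names or figures.

```python
# Toy records; field names and values are illustrative assumptions,
# not the dataset's actual variable names or figures.
records = [
    {"charter": 1, "year": 1867, "loans_discounts": 100.0},
    {"charter": 1, "year": 1868, "loans_discounts": 120.0},
    {"charter": 2, "year": 1867, "loans_discounts": 50.0},
]

def build_panel(rows):
    """Index rows by (charter, year), rejecting duplicate bank-years."""
    panel = {}
    for row in rows:
        key = (row["charter"], row["year"])
        if key in panel:
            raise ValueError(f"duplicate bank-year: {key}")
        panel[key] = row
    return panel

def bank_history(panel, charter):
    """Return one bank's rows sorted by year."""
    return [panel[key] for key in sorted(panel) if key[0] == charter]

panel = build_panel(records)
history = bank_history(panel, 1)
```

Keying on the pair rather than on the charter number alone makes it easy to catch accidental duplicates before any analysis is run.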
Additional variables
- Bank events: changes to bank charter numbers, voluntary liquidations, receiverships, and other materially important events. We digitize these data from tables in the OCC Annual Reports.
- Standardized bank president and bank cashier names, thanks to the work of the Society of Paper Money Collectors (SPMC) Bank Note History Project.
- City-level variables: we validate city names against the USGS Geographic Names Information System (GNIS) data of historic and current cities, geolocating about 99.9% of all observations. We further add city-level information for all geolocated cities, including latitude, longitude, city founding year, population at each decennial census, etc. To improve accuracy, we cross-reference several sources:
- Data dump of all Wikipedia pages, which include most U.S. cities and towns. Note: we used Ben Schmidt’s earlier work as a basis for this approach.
- Dataset collected and shared by Jacob Alperin-Sheriff, based on digitized U.S. Decennial Census data.
- Data collected and shared by James Feigenbaum, based on digitized U.S. Decennial Census data.
- Center for Spatial and Textual Analysis (CESTA) dataset for medium and large U.S. cities.
- Hand-collected data from the U.S. Decennial Census, in case of discrepancies between the sources above.
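The cross-referencing idea can be illustrated with a small sketch. This is not the authors' actual procedure: it simply accepts a figure when every source that reports one agrees, and flags the city-year for hand collection otherwise. The source labels and population numbers are hypothetical.

```python
def reconcile(values_by_source):
    """Return (value, needs_review) for one city-year population figure.

    values_by_source maps a source label to a reported value, or None
    when that source has no figure.
    """
    reported = {v for v in values_by_source.values() if v is not None}
    if len(reported) == 1:
        return reported.pop(), False
    # Sources disagree, or none reports a value: flag for hand collection.
    return None, True

# Hypothetical labels and numbers, for illustration only.
agreed = reconcile({"wikipedia": 5000, "census_a": 5000, "cesta": None})
conflict = reconcile({"wikipedia": 5000, "census_a": 5200, "cesta": None})
```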
Standardization/harmonization of data across years
The structure of the bank balance sheets produced by the OCC changed with time, as items were added or removed from the tables. For instance:
- The item “THREE PERCENT CERTIFICATES” was added to assets in 1868 and removed in 1873.
- The item “DUE FROM OTHER BANKS AND BANKERS” was split into “DUE FROM OTHER NATIONAL BANKS” and “DUE FROM STATE BANKS” in 1889.
- The liability item for paid-in capital with disbursement not yet certified had multiple names across years: “CAPITAL STOCK UNCERTIFIED”, “CAPITAL STOCK NOT CERTIFIED”, “CAPITAL STOCK PAID IN UNCERTIFIED”, etc.
To construct a panel data table that is comparable across years, we thus have to standardize variable names. For instance, in the last example above we combine all variants into a variable named capital_not_certif. The list of all the rules applied is available in the documentation file below.
Additionally, on rare occasions ad-hoc items were added. For instance, one bank in 1904 had an account for “HORSES AND BUGGIES ON HAND”. We add these ad-hoc items into the other_assets and other_liabs variables.
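The harmonization described in this section can be sketched as a lookup from raw item labels to standardized variable names, with unrecognized items pooled into a catch-all. The rule table below is hypothetical: the three capital-stock variants and the names capital_not_certif, other_assets, and other_liabs come from the text, while `three_pct_certif` and the function itself are assumptions made for illustration. The actual rules are in the documentation files and may be structured differently.

```python
# Hypothetical rule table: raw item label -> standardized variable name.
RULES = {
    "CAPITAL STOCK UNCERTIFIED": "capital_not_certif",
    "CAPITAL STOCK NOT CERTIFIED": "capital_not_certif",
    "CAPITAL STOCK PAID IN UNCERTIFIED": "capital_not_certif",
    "THREE PERCENT CERTIFICATES": "three_pct_certif",  # assumed name
}

def harmonize(items, side):
    """Map raw (label, amount) pairs to standardized variables.

    side is "assets" or "liabs"; unrecognized labels are pooled into
    other_assets or other_liabs respectively.
    """
    fallback = {"assets": "other_assets", "liabs": "other_liabs"}[side]
    out = {}
    for label, amount in items:
        name = RULES.get(label.strip().upper(), fallback)
        out[name] = out.get(name, 0.0) + amount
    return out

# Amounts are made up for the example.
row = harmonize(
    [("Three Percent Certificates", 10.0), ("HORSES AND BUGGIES ON HAND", 2.5)],
    side="assets",
)
```

Summing into the target variable means that several raw labels mapping to the same name, or several ad-hoc items for one bank, are combined instead of overwriting each other.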
3. Download
The data are available in two formats, and the documentation is provided separately:
- [Download data (Stata 16 .dta; compressed by 7zip)]
- [Download data (Tab-separated text; compressed by 7zip)]
- [Zipped file with documentation]: contains YAML files with the rules applied to create each variable, plus a Stata do-file with the variable labels used to create the Stata .dta file.
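Once the tab-separated archive has been extracted with a 7-Zip-compatible tool, the text file can be read with Python's standard library alone. The sketch below is a minimal illustration; the file path, encoding, and column names shown are assumptions, and the sample row is made up.

```python
import csv
import io

def read_tsv(handle):
    """Read tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

# In practice, after extracting the archive, one would pass an open file:
#     with open("path/to/extracted.tsv", newline="", encoding="utf-8") as f:
#         rows = read_tsv(f)
# The path, encoding, and column names here are assumptions.
sample = io.StringIO("charter\tyear\tother_assets\n1\t1867\t2.5\n")
rows = read_tsv(sample)
```

Note that `csv.DictReader` returns every field as a string, so numeric columns need to be converted explicitly before use.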
4. Citation
If you use this database in a paper or project, please cite its associated papers:
Carlson, Mark, Sergio Correia, and Stephan Luck. “The effects of banking competition on growth and financial stability: Evidence from the national banking era.” Journal of Political Economy 130, no. 2 (2022): 462-520.
Correia, Sergio, and Stephan Luck. “Digitizing historical balance sheet data: A practitioner’s guide.” Explorations in Economic History 87 (2023): 101475.
Their BibTeX entries are:
@article{BankingCompetition,
author = {Carlson, Mark and Correia, Sergio and Luck, Stephan},
title = {The Effects of Banking Competition on Growth and Financial Stability: Evidence from the National Banking Era},
journal = {Journal of Political Economy},
volume = {130},
number = {2},
pages = {462-520},
year = {2022},
doi = {10.1086/717453}
}
@article{DigitizingData,
title = {Digitizing historical balance sheet data: A practitioner's guide},
journal = {Explorations in Economic History},
volume = {87},
pages = {101475},
year = {2023},
issn = {0014-4983},
doi = {10.1016/j.eeh.2022.101475},
url = {https://www.sciencedirect.com/science/article/pii/S0014498322000535},
author = {Sergio Correia and Stephan Luck},
keywords = {OCR, Data extraction, Balance sheets}
}
Users might find it useful to review the data section of the first paper for more details on this data, and the introduction section of the second paper for an overview of the digitization process employed, including a discussion of how OCR errors were handled.
5. Acknowledgements
This dataset wouldn’t have been possible without help and advice from Peter Huntoon, Mark Drengson, Andrew Pollock, James Feigenbaum, Jacob Alperin-Sheriff, and Eugene White, as well as the encouragement of many other colleagues.
6. Bonus: how to digitize your own datasets…
Thanks to improvements in machine learning and OCR techniques, the approaches we used to digitize the data are much less daunting than most researchers expect. We definitely encourage researchers to try them for their own projects, and we have made our code available as a Python package here. See also our Explorations in Economic History article (arXiv link) for an introduction.