Findings from Big Data on Income Inequality and Income Uncertainty, by Fatih Guvenen
Fatih Guvenen is Professor of Economics at the University of Minnesota. His research is concerned with the income risk and the income distribution of households.
My research program focuses on income inequality and income uncertainty, two economic phenomena that are distinct from each other, yet also closely related. Both inequality and uncertainty are central to a broad range of social issues. Many questions of current policy debate are inherently about their distributional consequences. For example, heated disagreements about major budget issues–such as reforming the healthcare or the Social Security system–often revolve around the distributional effects of such changes. Similarly, a crucial aspect of the policy debate on taxation is “who should pay what?” It is therefore not surprising that inequality and uncertainty have garnered significant attention from economists (and social scientists more broadly) as well as from policy makers and the broader public.
The substantial rise in income inequality in the United States starting in the late 1970s has given additional urgency to questions surrounding inequality. The vast academic literature that has developed to understand inequality has produced a wealth of insights and ideas about possible mechanisms and proposed a range of policy remedies. As is often the case, the same research also raised more questions and uncovered further puzzling facts that need to be explained themselves.
My research explores a set of interrelated questions in this broad area. Three key themes run through the different projects I work on: (i) using rich data sets (some of which have become recently available) from both administrative and public sources, (ii) emphasis on higher-order moments–skewness, kurtosis, and tail behavior–of the data, and (iii) working non-parametrically so as not to assume away important nonlinearities that may be present in the data. The substantive questions I explore can be categorized under three headings:
1. Short-term (business cycle) phenomena:
   a. Variation in income volatility (risk) over the business cycle
   b. Variation in firm volatility over the business cycle
2. Long-term trends:
   a. Long-run trends in inequality
   b. Long-run trends in inequality and mobility of top earners
3. Life-cycle trends:
   a. Deviations from lognormality of earnings shocks
   b. Variation in the higher-order moments over the lifecycle
   c. Variation in higher-order moments across income groups
1. The Data Sets
One data set my coauthors and I have used in several of these projects comes from the Master Earnings File (MEF) of the U.S. Social Security Administration. The MEF is a population sample of all US individuals with a Social Security Number. It currently covers years 1978 to 2013 and contains information on labor earnings (salary and wage earnings from W-2 forms), employers (unique employer ID for each job held in a given year), and 4-digit SIC (industry) codes of the employer. We draw various subsamples from the MEF ranging from 1% to 10% of the US population. The substantial sample size of more than 600 million individual-year observations (in the 10% sample) allows us to employ a fully nonparametric approach and take what amounts to high-resolution pictures of individual earnings histories. The relaxation of parametric assumptions is a key part of this research agenda.
In addition, we also use data from Swedish and German administrative records (i.e., LINDA and IAB) as well as from various surveys (PSID for the United States and GSOEP for Germany) and firm-level datasets (Compustat Global, OSIRIS, and ORBIS) as explained below.
2. Business Cycle Variation in Risk
2.1 Earnings Risk
A central question in business cycle analysis concerns what happens to idiosyncratic risk in recessions. Two types of idiosyncratic shocks have received special attention: (i) individual earnings shocks, and (ii) firm-level shocks. The conventional wisdom is that both types of shocks become much larger in recessions, and this property was typically captured by a rise in the variance of such shocks. In a first set of papers, my coauthors and I revisit this conclusion, using new data and a fully non-parametric approach, and reach some surprising conclusions.
How Does Earnings Risk Vary Over the Business Cycle?
The conventional wisdom in the earnings dynamics literature has long been that earnings shocks have countercyclical variance–or equivalently, the variance of shocks becomes larger in recessions. While this view is consistent with the plausible idea that many individuals experience large negative shocks in recessions, it also implies, perhaps less plausibly, that, with a larger variance, many more individuals experience larger positive shocks in recessions than in expansions.
In Guvenen, Ozkan, and Song (2014), we documented two sets of results. First, we revisited the question of countercyclicality and found that the variance of idiosyncratic earnings shocks is not countercyclical at all–in fact, it is virtually flat over the business cycle. Instead, it is the left-skewness of shocks that is strongly countercyclical: that is, during recessions, the upper end of the shock distribution collapses–large earnings increases become less likely–whereas the bottom end expands–large drops in earnings become more likely. Thus, while the dispersion of shocks does not increase, shocks become more left skewed and, hence, riskier during recessions.
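The distinction between dispersion and asymmetry can be made concrete with quantile-based statistics. The sketch below is illustrative only: it assumes a percentile-based dispersion measure (P90 minus P10) and Kelley's skewness measure, neither of which is named in the text above, and the two toy samples are hypothetical rather than data from the paper. They are built so that dispersion is identical while the upper tail shrinks and the lower tail stretches.

```python
def percentile(sorted_xs, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def p90_p10(xs):
    """Dispersion measured as the gap between the 90th and 10th percentiles."""
    s = sorted(xs)
    return percentile(s, 90) - percentile(s, 10)


def kelley_skewness(xs):
    """Kelley's measure: ((P90 - P50) - (P50 - P10)) / (P90 - P10).

    Zero for a symmetric sample, negative when the lower tail is longer.
    """
    s = sorted(xs)
    p10, p50, p90 = (percentile(s, q) for q in (10, 50, 90))
    return ((p90 - p50) - (p50 - p10)) / (p90 - p10)


# Hypothetical earnings-growth samples (not from the paper).
expansion = [-2.0, -1.0, 0.0, 1.0, 2.0]   # symmetric
recession = [-3.0, -1.5, 0.0, 0.5, 1.0]   # upper tail compressed, lower tail stretched

print(p90_p10(expansion), p90_p10(recession))                    # same dispersion
print(kelley_skewness(expansion), kelley_skewness(recession))    # skewness differs
```

In this toy example a statistic that looks only at dispersion cannot tell the two samples apart, while the skewness measure turns negative for the second one.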
A second question we address in this paper is whether there are any observable characteristics that can be measured prior to the recession that predict a worker’s fortunes during a recession. This would represent a different kind of risk than the purely idiosyncratic kind that receives most of the attention in macro/labor work. This is because such risks can be thought of as a “factor structure” whereby an aggregate shock translates differently to workers with different characteristics. Because we have panel data on individuals we can construct observable variables based on the work history of each worker and see if they predict his fortunes during a recession (and similarly during an expansion). We found that one such variable–the 5-year average earnings of a worker immediately prior to a recession–strongly predicts how much the worker will suffer during the recession. For example, prime-age workers that enter a recession with high earnings suffer substantially less compared with those who enter with low earnings. During the Great Recession, workers who were at the 10th percentile before the recession lost 18% more in earnings than workers who were at the 90th percentile before the recession. Interestingly, the Great Recession was not unique: the 1980-83 double dip recession displayed just as strong a factor structure. This implies a large expansion of inequality during the recession that results from a predictable factor structure.
Although this pattern is monotonic between the 10th and 95th percentiles (i.e., higher pre-recession earnings, less earnings loss), it reverses inside the top 5% and even more strongly inside the top 1%. For example, workers who entered the Great Recession in the top 1% (as of 2006) on average lost 30% of their income between 2007 and 2009. Furthermore, those in the top 0.1% as of 2006 lost 50% of their earnings between 2006 and 2011 (a much longer horizon). As surprising as this may sound, the Great Recession was not the most severe recession for very top earners: earnings losses for the top 1% and 0.1% were more severe during the 2000-2001 recession and just as bad during the 1989-1994 period. (Clearly, earnings do not include capital income, but do include bonuses, restricted stock units at time of vesting, and exercised stock options.)
The findings in this paper owe much to the non-parametric nature of the analysis, which reveals facts that could have been obscured or hidden with parametric formulations.
Social Insurance Policy
The analysis in the preceding paper raises as many questions as it answers. Two questions are especially pressing. First, are the facts regarding the business cycle variation in higher-order moments (the acyclicality of variance, the procyclicality of skewness, and the factor structure) specific to the United States, or are they more broadly features of business cycles in developed economies? Second, how robust are these results (i) to considering household earnings instead of male earnings (as was done in Guvenen-Ozkan-Song (2014)) and (ii) to the introduction of social insurance policies, in the form of unemployment benefits, the welfare system, and the tax system?
To provide a broad perspective on these questions, in Busch, Domeij, Guvenen, and Madera (2015) we study panel data on individuals and households from the United States, Germany, and Sweden, covering more than three decades of data for each country. The data for the U.S. are from the PSID, and for Germany and Sweden they come from IAB (admin), GSOEP (survey), and LINDA (admin), and include earnings information on households as well as detailed tax and benefits information.
The answer to the first question is that the cyclicality of higher-order risk is remarkably similar across these countries that differ in many details of their labor markets. In particular, in all three countries, the variance of earnings shocks is virtually constant over the business cycle, whereas the skewness becomes much more negative in recessions. (For some variables–such as female earnings in Sweden–we actually find the variance to be procyclical, not countercyclical. This happens because the top end of the shock distribution collapses more than the bottom end expands. Skewness is procyclical in all such cases.) Perhaps surprisingly, the skewness of shocks is even more strongly procyclical in Germany and Sweden compared with the United States. Therefore, the fundamental forces driving skewness over the cycle seem to be pervasive across developed economies.
Second, moving from individual earnings to household earnings makes only a small difference to the results. However, government-provided insurance–in the form of unemployment insurance, welfare benefits, aid to low-income households, and the like–plays a more important role in reducing downside risk in all three countries; the effectiveness is weakest in the United States and strongest in Germany. We calculate that the welfare benefits of social insurance policies for stabilizing higher-order income risk over the business cycle range from 1% of annual consumption for the United States to 5% of annual consumption for Germany.
2.2 Idiosyncratic Firm-level Risk
Just as idiosyncratic shocks to earnings can affect individuals’ decisions, shocks to firm-level variables can affect (or reflect) firm-level choices and outcomes. With this in mind, a series of papers have investigated the business cycle variation in firm-level variables and have shown that they display countercyclical variance. For example, going back to at least Schwert (1989), it is well-known that stock returns become more volatile in recessions; an important paper by Bloom (2009) has drawn attention to the fact that firm sales and profit growth variance is countercyclical in the US data; and Berger and Vavra (2011), among others, have shown that product price dispersion is countercyclical.
Given the evidence above about the strong cyclicality of skewness in earnings shocks, it seems natural to ask if firm-level variables also have cyclical third moments.
In Bloom, Guvenen, and Salgado (2015) we use various datasets from the United States and others to examine the business-cycle variation in the higher-order moments of the growth rates of firm-level variables (sales, profit, and employment). For US publicly listed firms, we use Compustat from 1962 to 2013, and for firms in other countries, we use Compustat Global, OSIRIS, and ORBIS, which contain very rich data on sales, employment, profits, and so on. For many developed countries, the data is quite comprehensive, going back to the 1980s, and including both public and private firms. For others, the time horizon is shorter and only public firms are included.
A robust finding across countries and firm-level variables is that skewness is, again, strongly procyclical. In fact, this pattern–of the lower tail greatly expanding during recessions–is also the main driver behind the countercyclicality of variance. Overall, procyclical skewness of firm growth variables holds across a broader set of countries and time periods than countercyclical variance does: variance is countercyclical in some countries but acyclical or even procyclical in other countries and sub-periods. These results are robust to different selection criteria, across firm size categories, and across industries.
To summarize, the results of these three papers draw attention to fluctuations in skewness over the business cycle as a robust feature–much more so than fluctuations in variance, especially for earnings risk–that can drive fluctuations in uncertainty over the business cycle.
3. Long-Term Trends in Earnings Inequality
3.1 Firming Up Inequality
Another set of questions is raised by long-run trends in earnings inequality. In particular, while it has been well documented that earnings inequality has increased rapidly in the United States over the last three decades, little is known about the role of employers (i.e., firms) in this trend. (Notable exceptions are Dunne et al (2004), Card et al (2013), and Barth et al (2014).)
To put this question in context, labor economists have considered a number of observable characteristics–such as education, gender, race, experience, and so on–and examined how much of the rise in inequality happened across groups of workers that differ in these characteristics. Using these variables, one can decompose the rise in inequality into their “proximate causes” so to speak: inequality could be rising because, say, the premium for skills, proxied by education, increased over time, leading to an increased gap between those with high skill and those with low skill. Alternatively, inequality could be increasing because, keeping the skill premium constant, the fraction of workers who are skilled has increased over time leading to more inequality. Similarly, because earnings inequality rises with age (i.e., more inequality among older workers than among younger workers), a shift in the labor force composition toward older workers will increase inequality. While these decompositions do not get to the fundamental determinants of rising inequality, they are useful in pointing to variables that are closely linked to those determinants.
One observable characteristic that has not received much attention is the employer that an individual works for. For example, we can ask: how much of the rise in earnings inequality can be attributed to rising dispersion between firms in the average wages they pay, and how much is due to rising wage dispersion within firms? Similarly, how did rising inequality affect the wage earnings of different types of workers working for the same employer–men vs. women, young vs. old, new hires vs. senior employees, and so on?
To address questions like these, in Song, Price, Guvenen, and Bloom (2015), we begin by constructing a matched employer-employee data set for the United States using administrative records. This is possible thanks to the fact that the MEF is a population sample and for each job it records the unique employer identification number. So we can use worker-side information to construct firm data on total wages, wage distribution by worker characteristics (age distribution, gender composition), employment, and so on. Using this matched dataset of all U.S. firms from 1978 to 2012, we show that virtually all of the rise in earnings dispersion between workers is accounted for by increasing dispersion in average wages paid by the employers of these individuals. In contrast, pay differences within employers have remained virtually unchanged. Remarkably, this result has a fractal-like quality: it holds true within 4-digit industries, within different geographical regions, for firms of different size classes, and so on. In cases where we do find a change in within-firm wage inequality, it is almost always a (small) decline over time.
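The between-firm versus within-firm split described above rests on a standard variance identity: the total variance of worker wages equals the employment-weighted variance of firm average wages plus the employment-weighted average of within-firm variances. The following is a minimal sketch of that identity, not the paper's procedure; the function name, the use of log wages, and the toy data are all assumptions made for illustration.

```python
from statistics import fmean


def variance_decomposition(log_wages_by_firm):
    """Split the population variance of worker log wages into a
    between-firm and a within-firm component (both employment-weighted).

    Returns (total, between, within) with total == between + within.
    """
    all_wages = [w for wages in log_wages_by_firm.values() for w in wages]
    n = len(all_wages)
    grand_mean = fmean(all_wages)
    total = sum((w - grand_mean) ** 2 for w in all_wages) / n

    between = 0.0
    within = 0.0
    for wages in log_wages_by_firm.values():
        firm_mean = fmean(wages)
        # Gap between the firm's average wage and the economy-wide average,
        # weighted by the firm's share of workers.
        between += len(wages) * (firm_mean - grand_mean) ** 2 / n
        # Dispersion of workers around their own firm's average wage.
        within += sum((w - firm_mean) ** 2 for w in wages) / n
    return total, between, within


# Hypothetical toy data: two firms with two workers each.
toy = {"firm_a": [1.0, 3.0], "firm_b": [5.0, 7.0]}
total, between, within = variance_decomposition(toy)
print(total, between, within)
```

Applying such a split year by year would show whether a rise in the total comes from the between-firm term or the within-firm term, which is the comparison the paragraph above describes.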
This finding may seem a bit surprising in the face of claims often made in the media that the rise in CEO and executive pay is driving the rise in inequality (see, for example, Piketty (2013, pp. 315, 332) and Mishel and Sabadish (2014), among others). Given the nature of our matched dataset, we can zoom in on the top of the earnings distribution within a firm and see if these claims are borne out in the data. Perhaps surprisingly, we find that the wage gap between the most highly paid employees and the average employee in a firm has increased by only a small amount. Specifically, whereas the earnings of workers in the 99.99th percentile are five times higher today than in 1982, their earnings relative to the average worker in their firm are only 20% higher. The flip side is that the average pay at the employers of these top earning workers is four times higher today than in 1982. Hence, even at the very top of the earnings distribution, the vast majority of rising inequality has occurred between rather than within firms.
3.2 Inequality and Mobility at the Top
The MEF provides a unique opportunity to study top earners, thanks to its uncapped earnings records and its panel structure that allows us to track individuals over long periods of time. Some of the earlier work on top earners relies on tracking “the share of earnings accruing to the top x%” each year to circumvent the lack of panel data on these individuals. While such analyses can provide useful insights, the changing composition of top earners from year to year can affect conclusions in ways that may be hard to predict.
In Guvenen, Kaplan, and Song (2014) we analyzed changes in the gender structure at the top of the earnings distribution in the United States over the last 30 years. We found that although females still constitute a small proportion of the top percentiles, they have made sustained gains throughout this period. Therefore, while the glass ceiling remains, it is thinner than before. A large proportion of the increased share of females among top earners is accounted for by the mending of what we refer to as the paper floor: the phenomenon whereby female top earners were much more likely than male top earners to drop out of the top percentiles. More generally, membership in the top earnings groups has become more stable for both genders: entry and exit rates have declined and top earners have become a more entrenched group in the population.
In ongoing work, we are estimating some parsimonious stochastic processes for earnings just for top earners, with the aim of providing input into quantitative research on top earners.
4. Lifecycle Earnings Risk
Another strand of my research is concerned with the evolution of earnings over the life cycle. This year about 4 million young Americans will enter the U.S. labor market for the first time. In the subsequent 40 or so years, each one of these individuals will experience a unique adventure involving surprises (finding an attractive career, being offered a dream job, getting promotions and salary raises, and so on) as well as disappointments (failing in one career and moving on to another, experiencing job losses, suffering health shocks, and the like). Workers’ perceptions of these unforeseeable events, which constitute idiosyncratic risks to labor income, are central to many personal economic decisions, as these risks are hard to insure. Therefore, such risks also lie at the heart of numerous economic policy questions: what determines the inequality of consumption and wealth? How effective is fiscal policy in alleviating effects of recessions? And what is the optimal way to tax earnings? Addressing these questions thus requires a sound understanding of the nature of earnings risk over a career.
For the most part economists have been content with modeling earnings dynamics in a rather parsimonious fashion. This usually involves a low order autoregressive process (e.g., AR(1)), an i.i.d. shock, and Gaussian innovations. It is also assumed that the parameters governing these stochastic elements are constant over the life cycle and across the population. (Notable exceptions of course exist, but this description fits the typical calibration experiments.) This parsimony is often defended on the grounds that the panel data available to pin down the parameters of such processes (e.g., the PSID) is not rich enough to identify richer specifications and that more complex processes would introduce additional state variables into dynamic programming problems, making solutions much harder. While both points are correct, in the last few years, larger and richer panel datasets have become available, and with the increasing speed of computers the second issue is also becoming a less binding constraint.
In Guvenen, Karahan, Ozkan, and Song (2014), we pose and answer three questions about earnings dynamics over the life cycle. First, how good an approximation is lognormality for earnings shocks, a common assumption made out of convenience? Second, how do the properties of earnings shocks–especially deviations from lognormality–change over the life cycle? Third, how do these properties change across the population? In our paper we aim to answer these questions using no parametric assumptions on the distribution of income changes. Rather, we use robust and non-parametric statistics that are reported in the form of figures and tables. This visualization of the data allows us to see some very non-linear patterns easily. Some of these patterns would have been difficult to even predict beforehand, so imposing a priori parametric assumptions would have obscured the patterns documented in this paper.
One of the main findings is that changes over time in earnings (over both short and long time horizons) display large deviations from log-normality. In particular, relative to a normal distribution with the same median and standard deviation, the histogram of earnings changes in the data has a much sharper peak in the center, little mass on the shoulders (the region around ±1 standard deviation), and long and thick tails. These three features of an empirical density are best summarized by its kurtosis. A common measure of kurtosis is the fourth standardized central moment of the distribution. The empirical distribution of one-year earnings growth has a kurtosis of 18, much higher than a normal distribution, which has a kurtosis of 3.
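For reference, the standardized third and fourth central moments can be written as follows. The notation is an assumption introduced here for illustration: x denotes the change in log earnings, with mean \mu and standard deviation \sigma.

```latex
\[
  \text{skewness} = \frac{\mathbb{E}\left[(x-\mu)^{3}\right]}{\sigma^{3}},
  \qquad
  \text{kurtosis} = \frac{\mathbb{E}\left[(x-\mu)^{4}\right]}{\sigma^{4}},
  \qquad
  \sigma^{2} = \mathbb{E}\left[(x-\mu)^{2}\right].
\]
% For a normal distribution, skewness = 0 and kurtosis = 3.
```

Because both moments are scaled by powers of \sigma, they describe the shape of the distribution independently of its dispersion.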
To provide a more familiar interpretation of these kurtosis values, we calculate measures of concentration. If the data were drawn from a normal density, only 8 percent of individuals would experience an annual change in earnings of less than 5 percent (of either sign). The corresponding number in the data is 35 percent, showing a much higher concentration of earnings changes near zero. Furthermore, the probability that a worker will receive a very large shock (a fivefold increase or an 80 percent drop) is 12 times higher in the data than under log-normality. To put it differently, in a given year, most individuals experience very small earnings shocks, and a small but non-negligible number experience very large shocks.
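The normal benchmark in this comparison is a one-line calculation once a standard deviation is fixed. The sketch below assumes earnings changes are measured in logs and uses a standard deviation of 0.5; that value is an assumption chosen only because it is roughly in line with the 8 percent figure quoted above, since the text does not report the value actually used.

```python
import math


def normal_mass_within(band, sigma):
    """P(|X| < band) for X ~ N(0, sigma^2), computed with the error function."""
    return math.erf(band / (sigma * math.sqrt(2.0)))


# ASSUMPTION: standard deviation of annual log earnings changes (not given in the text).
ASSUMED_SIGMA = 0.5

# Share of workers whose earnings change by less than 5 percent under normality.
share_small = normal_mass_within(0.05, ASSUMED_SIGMA)
print(round(share_small, 3))
```

Under this assumed standard deviation the normal density puts only a small share of workers inside the 5 percent band, far below the 35 percent observed in the data.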
Moreover, the average kurtosis masks significant heterogeneity across individuals by age and level of earnings, increasing with age and earnings: prime-aged males with recent earnings of $100,000 (in 2005 dollars) face earnings shocks with a kurtosis above 30, whereas young workers with recent earnings of $10,000 face a kurtosis of only 4.
A second important deviation from log-normality is that the distribution of earnings shocks is not symmetric: it displays large negative skewness. Specifically, large downward movements in earnings (disaster shocks) are more likely than large upward swings. Furthermore, shocks become more negatively skewed with higher earnings and with age. This worsening is due entirely to the fact that large upside earnings moves become less likely from age 25 to 45 and to the increasing disaster risk after age 45.
What do these deviations from lognormality mean for analyses of risk? A back-of-the-envelope calculation gives some idea. Consider the well-known thought experiment (Arrow (1965), Pratt (1964)) in which an individual is indifferent between (i) a gamble that changes his consumption level by a random proportion (1+δ), and (ii) a fixed payment π, the risk premium, to avoid the gamble. Let us compare two scenarios for the standard constant relative risk aversion utility function with a curvature of 10. In the first one, δ is drawn from a Gaussian distribution with zero mean and a standard deviation of 0.10. In the second, δ has the same mean and standard deviation but has a skewness coefficient of -2 and a kurtosis of 30 (consistent with our empirical findings for a 45-year-old male earning $100,000 in the previous year). An individual would be willing to pay 22.1 percent of his average consumption to avoid the non-normal bet compared to about 4.9 percent for the normal one, a risk premium roughly 4.5 times as large.
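The logic of this thought experiment can be sketched with discrete gambles. The code below is a simplified illustration, not the calculation behind the 4.9 and 22.1 percent figures: it assumes the premium π solves u(1 − π) = E[u(1 + δ)] with consumption normalized to one, and it uses two hypothetical gambles with zero mean and a standard deviation of 0.10, one symmetric and one left-skewed. The function name and both gambles are assumptions.

```python
import math


def crra_risk_premium(outcomes, probs, gamma):
    """Premium pi solving u(1 - pi) = E[u(1 + delta)] under CRRA utility
    with curvature gamma, for a discrete gamble delta."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    if any(1.0 + d <= 0.0 for d in outcomes):
        raise ValueError("consumption must stay positive")
    if gamma == 1.0:
        # Log utility: the certainty equivalent is the geometric mean.
        certainty_equiv = math.exp(
            sum(p * math.log(1.0 + d) for d, p in zip(outcomes, probs))
        )
    else:
        expo = 1.0 - gamma
        moment = sum(p * (1.0 + d) ** expo for d, p in zip(outcomes, probs))
        certainty_equiv = moment ** (1.0 / expo)
    return 1.0 - certainty_equiv


GAMMA = 10.0  # curvature used in the text

# Hypothetical symmetric gamble: +/-10 percent with equal probability.
symmetric = crra_risk_premium([-0.10, 0.10], [0.5, 0.5], GAMMA)

# Hypothetical left-skewed gamble with the same mean (0) and standard
# deviation (0.10): a rare 30 percent drop against a frequent small gain.
left_skewed = crra_risk_premium([-0.30, 1.0 / 30.0], [0.1, 0.9], GAMMA)

print(symmetric, left_skewed)
```

Even with the first two moments held fixed, the left-skewed gamble commands a noticeably larger premium than the symmetric one, which is the qualitative point of the comparison above.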
While this example is only intended to be suggestive, recent papers have found important effects of these deviations from lognormality in more realistic settings. For example, Constantinides and Ghosh (2014) show that an incomplete markets asset-pricing model with countercyclical (negative) skewness shocks generates plausible asset pricing implications. Schmidt (2015) goes one step further and considers both negative skewness and thick tails (targeting the moments documented in Guvenen-Ozkan-Song (2014)) and finds that the resulting model provides a plausible set of predictions for asset prices. Finally, turning to fiscal policy, Golosov, Troshkin and Tsyvinski (2014) show that using an earnings process with negative skewness and excess kurtosis implies a marginal tax rate on labor earnings for top earners that is substantially higher than under a traditional calibration with Gaussian shocks with the same variance.
Higher-order moments are gaining a more prominent place in recent work in monetary economics (e.g., Midrigan (2011) and Berger and Vavra (2011); see Nakamura and Steinsson (2013) for a survey) as well as in the firm dynamics literature (e.g., Bloom et al (2011) and Bachmann and Bayer (2014)).
To summarize, a broader message of this paper is a call for researchers to reconsider the standard approach in the literature to studying earnings dynamics. The covariance matrix approach that dominates current work (whereby the variance-covariance matrix of earnings changes is the only set of moments considered in pinning down parameters) is too opaque and a bit mysterious: it is difficult to judge the economic implications of matching or missing certain covariances. Furthermore, the standard model in the literature assumes lognormal shocks, whereas this analysis finds large deviations from log-normality, in the form of very high kurtosis and negative skewness. With the increasing availability of very large panel data sets, I believe that researchers’ priority in choosing methods needs to shift from efficiency concerns to transparency. The approach adopted here is an example of the latter, and we believe it allows economists to be better judges of what each moment implies for the economic questions they have at hand.
References
Arrow, Kenneth, 1965. Aspects of the Theory of Risk Bearing, Yrjö Jahnsson lectures, Yrjo Jahnssonin Saatio, Helsinki.
Bachmann, Rüdiger, and Christian Bayer, 2014. “Investment Dispersion and the Business Cycle,” American Economic Review, vol. 104(4), pages 1392-1416, April.
Barth, Erling, Alex Bryson, James C. Davis, and Richard Freeman, 2014. “It’s Where You Work: Increases in Earnings Dispersion across Establishments and Individuals in the U.S.,” NBER Working Paper 20447, September.
Berger, David, and Joseph Vavra, 2011. “Dynamics of the U.S. Price Distribution,” Working Paper, Yale University.
Bloom, Nicholas, 2009. “The Impact of Uncertainty Shocks,” Econometrica, vol. 77 (3), pages 623-685.
Bloom, Nicholas, Fatih Guvenen, and Sergio Salgado, 2015. “Firms over the Business Cycle: Fluctuations in Higher-Order Uncertainty,” Working Paper, University of Minnesota.
Busch, Christopher, David Domeij, Fatih Guvenen, and Rocio Madera, 2015. “Higher-Order Income Risk and Social Insurance Policy Over the Business Cycle,” Working Paper, University of Minnesota.
Card, David, Jörg Heining, and Patrick Kline, 2013. “Workplace Heterogeneity and the Rise of West German Wage Inequality,” The Quarterly Journal of Economics, vol. 128 (3), pages 967-1015.
Constantinides, George M. and Anisha Ghosh, 2014. “Asset Pricing with Counter-cyclical Household Consumption Risk,” Working Paper, University of Chicago.
Dunne, Timothy, Lucia Foster, John Haltiwanger, and Kenneth R. Troske, 2004. “Wage and Productivity Dispersion in United States Manufacturing: The Role of Computer Investment,” Journal of Labor Economics, vol. 22 (2), pages 397-430, April.
Golosov, Michael, Maxim Troshkin, and Aleh Tsyvinski, 2014. “Redistribution and Social Insurance,” Working Paper, Princeton University.
Guvenen, Fatih, Fatih Karahan, Serdar Ozkan, and Jae Song, 2014. “What Do Data on Millions of U.S. Workers Say About Labor Income Risk?” Working Paper, University of Minnesota.
Guvenen, Fatih, Greg Kaplan, and Jae Song, 2014. “The Glass Ceiling and The Paper Floor: Gender Differences Among Top Earners, 1981-2012,” Working Paper 716, Federal Reserve Bank of Minneapolis.
Guvenen, Fatih, Serdar Ozkan, and Jae Song, 2014. “The Nature of Countercyclical Income Risk,” Journal of Political Economy, vol. 122 (3), pages 621-660.
Midrigan, Virgiliu, 2011. “Menu Costs, Multiproduct Firms, and Aggregate Fluctuations,” Econometrica, vol. 79(4), pages 1139-1180, July.
Mishel, Lawrence, and Natalie Sabadish, 2014. “CEO Pay and the top 1%: How executive compensation and financial-sector pay have fueled income inequality,” EPI Issue Brief 331, Economic Policy Institute, May.
Nakamura, Emi, and Jón Steinsson, 2013. “Price Rigidity: Microeconomic Evidence and Macroeconomic Implications,” Annual Review of Economics, vol. 5(1), pages 133-163, May.
Piketty, Thomas, 2013. Capital in the Twenty-First Century, Harvard University Press.
Pratt, John W., 1964. “Risk Aversion in the Small and in the Large,” Econometrica, vol. 32 (1/2), pages 122-136.
Schmidt, Lawrence, 2015. “Climbing and Falling Off the Ladder: Asset Pricing Implications of Labor Market Event Risk,” Working Paper, University of California at San Diego.
Schwert, G. William, 1989. “Why Does Stock Market Volatility Change over Time?,” Journal of Finance, vol. 44 (5), pages 1115-1153.
Song, Jae, David Price, Fatih Guvenen, and Nicholas Bloom, 2015. “Firming Up Inequality,” Research Mimeo, University of Minnesota.