Laura Veldkamp on Modeling and Measuring the Data Economy
Laura Veldkamp is a Professor of Finance and Economics at Columbia University’s Graduate School of Business and a former co-editor of the Journal of Economic Theory. Her research focuses on how individuals, investors, and firms get their information, how that information affects the decisions they make, and how those decisions affect the macroeconomy and asset prices. Veldkamp’s RePEc/IDEAS profile.
The five largest American companies derive most of their value not from physical assets but from intangible ones, like data. Data and new data technologies are changing production, labor, and valuation. Does this transformation from an industrial economy to a data economy bring with it new economics? Is the accumulation of data contributing to big firms getting bigger (Davis and Haltiwanger, 2019)? Can changes in data technology explain some of the decline in the labor share (Karabarbounis and Neiman, 2013)? How should we measure or value data when it is often traded at zero price, bartered for a user’s access to a digital service? Finally, can the accumulation of data be an engine of growth? This research agenda deploys information economics tools to understand the data macroeconomy, derives new approaches to measuring data, and develops a standard, recursive macroeconomic framework that can be used for data policy evaluation.
The industrialization of knowledge production
New big data technologies are like the industrialization of knowledge production. To investigate this claim quantitatively, Simona Abis and I make use of novel labor market data in “The Changing Economics of Knowledge Production.” What does industrialization mean in this context? Traditional industrialization involved adopting new production technologies with a different factor mix: labor-intensive artisanal production was replaced with capital-intensive industrial production. Formally, economists typically represent such a shift as a change in the exponents of a Cobb-Douglas production function; a higher exponent on capital induces a more capital-intensive mode of production. To investigate changes in knowledge production, we take a parallel approach, replacing capital with data. Concretely, we estimate the parameters of a knowledge production function that uses data and labor as inputs. We do this separately for firms using old data technology and for firms using a suite of new data technologies that can make use of more data, or a greater variety of data, than before. For short, we refer to these new technologies as “AI.” The exponent on labor in a Cobb-Douglas production function represents the share of income paid to labor. Therefore, we can use hiring and wages to estimate the data analysts’ labor share of income for each of the two technologies.
Using textual analysis tools, Burning Glass job-posting data, and the postings’ descriptions of job skill requirements, we classify financial analysis jobs as “old tech” or “AI.” Using these job postings, and adjusting for separations and vacancy-filling rates, we cumulate hires to form stocks of each type of labor for each firm in our sample.
The next challenge is to infer how much data a firm owns or uses. Luckily, firms leave clear evidence of their data accumulation in their data-management hiring. Data needs to be organized, cleaned, and integrated into a firm’s database before it can add value. Even firms that buy data need workers to restructure it to fit their systems and their needs. Therefore, we impute data stocks by estimating a production function that takes data-management labor as an input and produces additions to the stock of structured data as an output. Thus, for each firm in each month, we have first-order conditions that relate firms’ wage and hiring choices to their initial conditions and production parameters. These moments (over-)identify our production function exponents.
We find that, in the finance sector, the exponent on data in the production function has risen from 0.56 to 0.73. That means that there are higher marginal and more slowly diminishing returns to data. Equivalently, analysts’ labor share of income has fallen from 0.44 to 0.27. New data technology has pushed the labor share of knowledge workers down by 17 percentage points.
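In symbols (a schematic version of the estimated production function; the notation is mine): with data stock $D$, analyst labor $L$, and data exponent $\theta$, knowledge output is

```latex
K = A\,D^{\theta}L^{1-\theta}, \qquad \text{labor share of income} = 1-\theta
% old data technology: \theta = 0.56 \;\Rightarrow\; 1-\theta = 0.44
% AI technologies:     \theta = 0.73 \;\Rightarrow\; 1-\theta = 0.27
```

so the estimated rise in the data exponent and the fall in the analyst labor share are two sides of the same coin.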
So far, we restrict our analysis to finance sector jobs because that industry is an early AI adopter and because we have a vocabulary to describe various types of work with data. But our procedure can be applied to other sectors. One could also relax the Cobb-Douglas assumption. Finally, a similar exercise in the research sector might reveal how data interacts with R&D, a key question for long-run growth.
Data as fuel for long-run growth
If new data technologies are increasing knowledge production, can this fuel long-run growth? We have long known that knowledge drives long-run economic growth. If data produces knowledge, data accumulation must be an engine of long-run growth as well, right?
“A Growth Model of the Data Economy,” joint work with Maryam Farboodi, disputes this world view. The new data technologies are, at their heart, prediction technologies. More data helps us predict unknown outcomes more accurately. But crucially, prediction errors have a natural lower bound. They cannot fall below zero. Holding all else constant, this bounds the benefits to data. Bounded benefits preclude long-run growth.
To see why, consider the alternative. If data alone can fuel long-run growth, then infinite data would have to achieve a zero prediction error, which, in turn, must produce infinite real output. Let’s consider that first condition. If infinite data can reduce prediction errors to zero, it means that everything is predictable, with enough data. That view implies that there is no fundamental randomness. In short, all relevant economic events would have to be deterministic and spanned by the set of events we have already seen. In that world view, data is like a crystal ball. If no such data crystal ball exists, then infinite data can only reduce forecast errors to a finite lower bound. Forecasts with errors cannot yield infinite value. Thus, there must be an upper bound to the value data can produce. The second condition, that perfect foresight produces infinite real output, is equally problematic.
Motivated by these insights, we build a model where data raises productivity by improving firms’ forecasts of an optimal production style or technique. Firms accumulate this data as a by-product of economic transactions. The more they produce, the more data they get. Since the optimal action changes over time, new data is more relevant than old data. The rate of change of the forecasting target determines the depreciation rate of data. Thus, data is a stock, like capital. It accumulates with inflows, but also depreciates. Like capital, data has diminishing marginal returns: after millions of data points predict something precisely, another million offer little improvement. Therefore, like capital, data accumulation alone cannot sustain growth.
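In stylized form (my notation, not the paper’s exact specification), the data stock $\Omega_t$ evolves like capital, and its forecasting value is bounded:

```latex
\Omega_{t+1} = (1-\delta)\,\Omega_t + z\,Y_t,
\qquad
\lim_{\Omega \to \infty} \mathbb{E}\!\left[(a - a^{*})^{2} \,\middle|\, \Omega\right]
= \underline{\sigma}^{2} \;\ge\; 0
```

Here $Y_t$ is output (the source of new data), $\delta$ is the depreciation rate set by how fast the forecasting target moves, $a^{*}$ is the optimal technique, and $\underline{\sigma}^{2}$ is the floor on prediction error. Because forecast errors cannot fall below that floor, the gains from accumulating data are bounded.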
Data lends itself to barter. One key difference between data and capital is that, unlike capital, data is a by-product of economic activity. Firms can reap data and sell all their goods for consumption. At the same time, the only way a firm can produce more data is to produce more goods. Therefore, firms might be willing to sell at prices lower than marginal cost because of the additional data they gain. This motive can rationalize the data barter we see in reality: Many digital goods and services (apps) are costly to develop but are offered at zero price. Why would any firm do this? Because the user trades their data for permission to use the digital service. This is a form of costly investment in data capital.
There is also a market for non-rival data: a firm can buy and sell data. What keeps a firm from selling all of its data is the assumption that once data is sold, it has a higher depreciation rate (reflecting that it is no longer private information). Firm growth takes on an S-shaped dynamic. Growth starts slow, accelerates when firms produce and buy, and then falls to zero as the firm converges to its data steady state. This growth dynamic looks like a growth model with a poverty trap. At low levels of data, firms produce low-quality goods, have few transactions, and therefore accumulate little data.
S-shaped firm dynamics also generate dispersion, then convergence, in firm size. When multiple firms start with different initial data stocks, the high-data firms grow faster and accumulate additional data faster. This increases the dispersion in firms’ data stocks and the dispersion in firm output and earnings. This dynamic qualitatively resembles the increase in firm size dispersion we see in U.S. data, with superstar, data-intensive firms pulling away from the rest. However, the concave part of the S-shaped growth curve predicts convergence. If the growing dispersion in firm size is due to data, it may be a transitory phenomenon, not a permanent feature of the economic landscape.
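The dispersion-then-convergence pattern can be illustrated with a toy simulation. The law of motion below is my own stylized stand-in for the model (output concave in the data stock, data accruing in proportion to output, constant depreciation), not the paper’s calibration:

```python
# Toy illustration of the firm dynamics described above: output is
# concave in the data stock, data arrives as a by-product of output,
# and data depreciates. (A stylized stand-in, not the paper's model.)

def simulate(omega0, alpha=0.6, z=1.0, delta=0.2, periods=200):
    """Simulate the path of a firm's data stock.

    omega0: initial data stock
    alpha:  concavity of output in data (diminishing returns, alpha < 1)
    z:      data generated per unit of output
    delta:  depreciation rate of data (old data loses relevance)
    """
    path = [omega0]
    for _ in range(periods):
        output = path[-1] ** alpha                        # concave in data
        path.append((1 - delta) * path[-1] + z * output)  # accumulate minus depreciation
    return path

data_poor = simulate(0.01)  # firm starting with almost no data
data_rich = simulate(5.0)   # firm starting with a large data stock

# Dispersion first: the gap between the firms widens in early periods...
initial_gap = data_rich[0] - data_poor[0]
early_gap = data_rich[10] - data_poor[10]

# ...then convergence: both approach the common steady state
# Omega* = (z / delta) ** (1 / (1 - alpha)), so the gap vanishes.
final_gap = data_rich[-1] - data_poor[-1]
```

With these parameters both paths end near the same steady state, so the simulated size dispersion is transitory, echoing the convergence prediction above.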
The fact that the five largest U.S. companies are data companies, and that they are systematically acquiring rivals, suggests that data and competition may well be related. Future work in this area could introduce imperfect competition or market power in a variety of ways. Another future direction could be to examine the possibility that data facilitates growth, not alone, but in conjunction with idea creation.
Data, firm size and investors’ changing trading strategies
Is data contributing to big firms getting bigger? There are a few different ways in which this force might operate. One of those mechanisms works through financial markets. Finance is one of the top three industries with the most widespread adoption of new data technologies (Acemoglu, Autor and Hazell, 2019). In “Big data in finance and the growth of large firms,” Juliane Begenau, Maryam Farboodi and I show how old firms, which are systematically bigger, can grow even larger because of new data technologies. Big firms are typically old firms. Old firms have long histories and therefore more data available to analyze. Because of their long data histories, these large, old firms are valuable targets for data analysis.
Investors use data to reduce uncertainty about a firm. Less uncertain firm cash flows reduce the cost of capital. A lower cost of capital makes firms’ real investments less costly and facilitates growth. Thus, if big, old firms have more available data, which is more eagerly analyzed by the financial sector, and this data reduces investment costs for these firms, which enables them to grow bigger, then financial data is one piece of the divergence in firm size.
Of course, one might rightly object that financial market participants don’t seem so focused on data about firms’ fundamental value. Instead, financial analysts are crunching data on others’ order flow, trying to extract from those orders what others know. In “Long-Run Growth of Financial Data Technology,” Maryam Farboodi and I explore investors’ choice of whether to learn about firm fundamentals or about the trades of others. We find that as technology improves, traders shift to mining order flow data, but that this activity still benefits firms, in the same way that learning about fundamentals does.
Measuring and Valuing Data in Market Prices
One of the most important challenges in data macroeconomics is measurement. How much data is there, and what is its value? It’s easy enough to get the price of one data set, to study one firm, or even to collect prices of many transactions. But figuring out how much data is present in the aggregate, for a whole economy, requires different tools. In “Where Has All the Big Data Gone?” Maryam Farboodi, Adrien Matray, Venky Venkateswaran and I use an equilibrium model to estimate how much total data equity traders are using across assets. Then we value this data for each type of asset.
When more data is used to forecast firms’ profits, and investors invest according to their forecasts, asset prices should covary more with future firm profits. That covariance is also called “price informativeness” (Bai, Philippon and Savov, 2016). Of course, that covariance can also be affected by firm size, growth, volatility, and other factors. Therefore, we estimate size, variance, and a sufficient statistic for market conditions, and then use our estimated model to tease out the information part of that covariance.
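Schematically, with $p_{i,t}$ the current price of firm $i$’s equity and $\pi_{i,t+k}$ its future profits, the measure starts from

```latex
\text{price informativeness}_{i} \;\sim\; \mathrm{Cov}\!\left(p_{i,t},\; \pi_{i,t+k}\right)
```

after conditioning out size, growth, volatility, and market conditions, as described above. This is a simplified rendering of the idea, not the papers’ exact estimating equation.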
Our main finding is data divergence: The rise in data processing is directed primarily at large, growth firms. Data about other firms appears to be no more abundant now than it was decades ago. Our structural estimation suggests a potential reason for the divergence: Large firms are growing larger, and growth amplifies the cross-sectional difference in firm size. Both size and growth make data acquisition about a firm more valuable.
The size effect comes from returns to scale in data. The average investor needs to hold a larger position in big firms for the big firm’s equity market to clear. One piece of data can be used to forecast the payoff of one share in an investor’s portfolio, and thus produce a small benefit. That same piece of data could instead be used to forecast the payoff of an asset the investor expects to hold in large quantity. The more of the asset the investor expects to hold, the greater the value of using data to hold the optimal amount. Therefore, data about big firms is especially valuable. Growing firms have more valuable data because growth increases the sensitivity of firm value to information.
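The scale logic can be made concrete in a standard mean-variance setting (an illustrative sketch, not the paper’s model): an investor holding $q$ shares of an asset with payoff variance $\sigma^2$ and risk aversion $\rho$ bears a risk penalty $\tfrac{\rho}{2}q^{2}\sigma^{2}$, so a data-driven variance reduction $\Delta\sigma^{2}$ is worth roughly

```latex
\underbrace{\tfrac{\rho}{2}\, q^{2}\, \Delta\sigma^{2}}_{\text{value of the data}}
```

a benefit that rises with the square of the position $q$. Since market clearing forces the average investor to hold more shares of big firms, the same piece of data is worth more when it concerns a big firm.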
Perhaps financial markets are special? The general principle, using the covariance between actions and the future realization of the forecasted state, could be applied to measure data in many other economic contexts.
Lots of data is collected not by producers or investors, but by intermediaries who harvest transactions data. Work in operations and in applied micro has studied platform strategies and competition. But we have few tools to think about the aggregate equilibrium effects of data sharing and optimal regulation in settings with uncertainty. “Taking Orders and Taking Notes,” joint work with Nina Boyarchenko and David Lucca, offers some tools to do this. While the paper is about primary dealers sharing information in Treasury auctions, the framework is applicable to any setting where intermediaries observe customers’ orders and use that aggregated information to inform all customers about the value or quality of the item being sold. In effect, customers are bartering their data for aggregate advice, like product ratings or customer reviews. We find that such information barter benefits the seller, who receives more money because the quality of the good is less uncertain to the customer. Reducing uncertainty is often good for buyers as well, but not always. Information barter can create a prisoners’ dilemma for consumers, where each prefers individually to share, but all make themselves collectively worse off when prices rise. Future work could use this framework to study the social value of data intermediaries, as well as the effects of data platforms on sellers’ choices of what goods to produce.
The digital macroeconomy is a new area, wide open for researchers to make novel and important contributions. Questions about pricing, externalities, growth, inequality, firm competition, labor demand, changes in matching efficiency, GDP measurement, and price stickiness are all affected by the new ways in which firms use data. We have begun to develop tools for this endeavor. But it will require many researchers, using these tools and others, for us to have a good sense of how data is transforming the macroeconomic landscape.
Abis, Simona, and Laura Veldkamp, 2020. “The Changing Economics of Knowledge Production”.
Acemoglu, Daron, David Autor, and Jonathan Hazell, 2019. “AI and Jobs: Evidence from Online Vacancies”, MIT Working Paper.
Bai, Jennie, Thomas Philippon, and Alexi Savov, 2016. “Have financial markets become more informative?”, Journal of Financial Economics, 122(3), 625-654.
Begenau, Juliane, Maryam Farboodi, and Laura Veldkamp, 2018. “Big data in finance and the growth of large firms”, Journal of Monetary Economics, 97, 71-87.
Boyarchenko, Nina, David Lucca, and Laura Veldkamp, 2019. “Taking Orders and Taking Notes: Dealer Information Sharing in Treasury Markets”, National Bureau of Economic Research Working Paper 22461.
Davis, Steven, and John Haltiwanger, 2019. “Dynamism Diminished: The Role of Housing Markets and Credit Conditions”, National Bureau of Economic Research Working Paper 25466.
Farboodi, Maryam, Adrien Matray, and Laura Veldkamp, 2018. “Where Has All the Big Data Gone?”, Available at SSRN 3164360.
Farboodi, Maryam, and Laura Veldkamp, 2019. “A Growth Model of the Data Economy”.
Farboodi, Maryam, and Laura Veldkamp, 2020. “Long-Run Growth of Financial Data Technology”, forthcoming, American Economic Review.
With the spread of coronavirus, we have been struggling to make forecasts about what the summer will bring. In fact, we delayed the mailing of decisions for the annual conference until April 15, hoping that we would have better information. We do. Travel bans and shelter-in-place orders will not be lifted in time for us to hold the annual conference at the originally scheduled time, which this year was to be June 21-23 in Barcelona, Spain.
After consultations with the local organizers—Jordi Caballe, Joan Llull, Albert Marcet, and Raul Santaeulalia-Llopis—and program co-chairs—Doireann Fitzgerald and Nir Jaimovich—we have decided to reschedule the conference for June 2021. Specific dates will be announced soon, once facilities at the Universitat Autonoma de Barcelona campus have been booked. Doireann and Nir will notify authors of accepted papers as planned on April 15, 2020. They have put together a strong program and, given that most seminars and conferences have been cancelled for spring and summer, many of the papers chosen will not have been presented much during the year. We are currently working on the timeline for registering and uploading new versions of papers if necessary.
We are also planning to expand the meeting next year and will have a second call for papers with a February 15, 2021 deadline. For papers that were not accepted in the first round, there will be an opportunity to try again. For those on the job market this fall with new papers, a year will not be lost.
I want to thank all of the organizers for their help and patience. I want to also thank the future organizers for their flexibility. We are now making plans to shift forward our original plans for the 2021 and 2022 summer meetings that were to take place in Taipei and Cartagena. Ping Wang of Washington University in St. Louis and hosts at the Institute of Economics at the Academia Sinica were originally scheduled for the summer meeting of 2021; David Perez-Reyna and hosts at the Universidad de los Andes were scheduled for 2022. We are now working on plans to host these meetings in 2022 and 2023 although we do not know at this point the dates or the ordering of cities. With so many cancellations, schedules have to be redone and conferences of our size require significant lead time.
In other news, we had three successful sessions at the 2020 ASSA Meetings in San Diego this past January. I want to thank Larry Christiano for overseeing our winter meeting sessions and Alessandro Dovis, Gerard Padro i Miquel, and Thomas Winberry for soliciting great papers.
Finally, I owe much gratitude to the SED Secretary, Marina Azzimonti, and the SED Treasurer, Erwan Quintin, who have been in on all of this year’s ups and downs.
I encourage everyone to check out the SED website and Twitter account for updates on next summer.
Thanks to everyone who submitted to the call for papers for the 2020 SED. The standard of submissions was exceptionally high. After a lot of hard work from the program committee, we are happy to be able to accept 540 great papers. Conference registration for these accepted participants will open in early 2021, and we will also make arrangements for participants to upload updated versions of their papers. We look forward to seeing these papers in Barcelona in June 2021.
Since we will be able to expand the conference in 2021 over what was planned in 2020, there will be a new call for papers. This will be open to anyone whose paper is not accepted from the 2020 call, including new PhDs who might not have been ready to submit this year. The deadline for this new call for papers will be 15th February 2021. Acceptances will be made on 15th March, and the registration period for this set of papers will follow the SED norm.