1 Introduction
The data of billions of individuals are currently being utilized for personalized advertising or
other online services.¹
The use and transaction of individual data are set to grow exponentially
in the coming years with more extensive data collection from new online apps and integrated
technologies such as the Internet of Things and with the more widespread applications of artificial
intelligence (AI) and machine learning techniques. Most economic analyses emphasize benefits
from the use and sharing of data because this permits better customization, better information,
and more input into AI applications. It is often claimed that because data enable a better allocation
of resources and more or higher quality innovation, the market mechanism generates too little data
sharing (e.g., Varian [2009], Jones and Tonetti [2018], Veldkamp et al. [2019], and Veldkamp [2019]).
Economists have recognized that consumers might have privacy concerns (e.g., Stigler [1980],
Posner [1981], and Varian [2009]), but have often argued that data markets could appropriately
balance privacy concerns and the social benefits of data (e.g., Laudon [1996] and Posner and Weyl
[2018]). In any case, the willingness of the majority of users to allow their data to be used for no
or very little direct benefits is argued to be evidence that most users place only a small value on
privacy.²
This paper, in contrast, argues that there are forces that make individual-level data underpriced and lead the market economy to generate too much data. The reason is simple: when an individual shares her data, she compromises not only her own privacy but also the privacy of other individuals whose information is correlated with hers. This negative externality tends to create excessive
data sharing. Moreover, when there is excessive data sharing, each individual will overlook her
privacy concerns and part with her own information because others’ sharing decisions will have
already revealed much about her.
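This correlation mechanism can be made concrete with a stylized numerical illustration (our own, not taken from any model in the paper): if two users' types are jointly Gaussian with correlation ρ, then one user's shared data shrinks the residual uncertainty about the other's type by a factor of 1 − ρ², even if the latter never shares anything. All parameter values below are hypothetical.

```python
import numpy as np

# Illustration (assumed setup): users A and B have jointly Gaussian
# "types" with correlation rho. B shares her data; how much does that
# reveal about A, who shares nothing?
rng = np.random.default_rng(0)
rho, sigma = 0.8, 1.0  # assumed correlation and standard deviation

# Draw correlated types for A and B.
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
a, b = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

# Prior uncertainty about A, versus residual uncertainty after
# observing B (best linear predictor of A given B is rho * b).
prior_var = a.var()
resid_var = (a - rho * b).var()

print(prior_var)  # close to sigma^2 = 1.0
print(resid_var)  # close to sigma^2 * (1 - rho^2) = 0.36
```

With ρ = 0.8, B's sharing alone eliminates nearly two thirds of the uncertainty about A, which is why A may then see little remaining value in protecting her own data.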
Some of the issues we emphasize are highlighted by the Cambridge Analytica scandal. The
company acquired the private information of millions of individuals from data shared by 270,000
Facebook users who voluntarily downloaded an app for mapping their personality traits, called
“This is your digital life”. The app accessed users’ news feed, timeline, posts and messages, and
revealed information about other Facebook users. Cambridge Analytica was ultimately able to infer
valuable information about more than 50 million Facebook users, which it deployed for designing
personalized political messages and advertising in the Brexit referendum and 2016 US presidential
election.³
Though some of the circumstances of this scandal are unique, the issues are general. For
example, when an individual shares information about his own behavior, habits, and preferences, this contains considerable information about the behavior, habits, and preferences of not only his friends but also other people with similar characteristics (e.g., the routines and choices of a highly-educated gay man from Central America in his early 20s in Somerville, Massachusetts are informative
¹ Facebook alone has almost 2.5 billion monthly active users.
² Consumers often report valuing privacy (e.g., Westin [1968] and Goldfarb and Tucker [2012]), but do not take much action to protect it (e.g., “Why your inbox is crammed full of privacy policies”, WIRED, May 24, 2018, and Athey et al. [2017]).
³ See New York Times, March 19, 2018, New York Times, March 13, 2018, and The Guardian, April 13, 2018.