1 Introduction
The data of billions of individuals are currently being utilized for personalized advertising or
other online services.¹
The use and transaction of individual data are set to grow exponentially
in the coming years with more extensive data collection from new online apps and integrated
technologies such as the Internet of Things and with the more widespread applications of artificial
intelligence (AI) and machine learning techniques. Most economic analyses emphasize benefits
from the use and sharing of data because this permits better customization, better information,
and more input into AI applications. It is often claimed that because data enable a better allocation
of resources and more or higher quality innovation, the market mechanism generates too little data
sharing (e.g., Varian [2009], Jones and Tonetti [2018], Veldkamp et al. [2019], and Veldkamp [2019]).
Economists have recognized that consumers might have privacy concerns (e.g., Stigler [1980],
Posner [1981], and Varian [2009]), but have often argued that data markets could appropriately
balance privacy concerns and the social benefits of data (e.g., Laudon [1996] and Posner and Weyl
[2018]). In any case, the willingness of the majority of users to allow their data to be used for no
or very little direct benefits is argued to be evidence that most users place only a small value on
privacy.²
This paper, in contrast, argues that there are forces that make individual-level data underpriced and lead the market economy to generate too much data. The reason is simple: when an individual shares her data, she compromises not only her own privacy but also the privacy of other individuals whose information is correlated with hers. This negative externality tends to create excessive
data sharing. Moreover, when there is excessive data sharing, each individual will overlook her
privacy concerns and part with her own information because others’ sharing decisions will have
already revealed much about her.
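This correlation mechanism can be made concrete with a stylized numerical illustration (our own, not taken from any model in the paper): if two users' types are jointly Gaussian with correlation ρ, then one user's shared data shrinks the residual uncertainty about the other's type by a factor of 1 − ρ², even if the latter never shares anything. All parameter values below are hypothetical.

```python
import numpy as np

# Illustration (assumed setup): users A and B have jointly Gaussian
# "types" with correlation rho. B shares her data; how much does that
# reveal about A, who shares nothing?
rng = np.random.default_rng(0)
rho, sigma = 0.8, 1.0  # assumed correlation and standard deviation

# Draw correlated types for A and B.
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
a, b = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

# Prior uncertainty about A, versus residual uncertainty after
# observing B (best linear predictor of A given B is rho * b).
prior_var = a.var()
resid_var = (a - rho * b).var()

print(prior_var)  # close to sigma^2 = 1.0
print(resid_var)  # close to sigma^2 * (1 - rho^2) = 0.36
```

With ρ = 0.8, B's sharing alone eliminates nearly two thirds of the uncertainty about A, which is why A may then see little remaining value in protecting her own data.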
Some of the issues we emphasize are highlighted by the Cambridge Analytica scandal. The
company acquired the private information of millions of individuals from data shared by 270,000
Facebook users who voluntarily downloaded an app for mapping their personality traits, called
“This is your digital life”. The app accessed users’ news feed, timeline, posts and messages, and
revealed information about other Facebook users. Cambridge Analytica was ultimately able to infer
valuable information about more than 50 million Facebook users, which it deployed for designing
personalized political messages and advertising in the Brexit referendum and 2016 US presidential
election.³
Though some of the circumstances of this scandal are unique, the issues are general. For
example, when an individual shares information about his own behavior, habits, and preferences, this contains considerable information about the behavior, habits, and preferences of not only his friends but also other people with similar characteristics (e.g., the routines and choices of a highly-educated gay man from Central America in his early 20s in Somerville, Massachusetts are informative
¹ Facebook alone has almost 2.5 billion monthly active users.
² Consumers often report valuing privacy (e.g., Westin [1968] and Goldfarb and Tucker [2012]), but do not take much action to protect it (e.g., “Why your inbox is crammed full of privacy policies”, WIRED, May 24, 2018, and Athey et al. [2017]).
³ See New York Times, March 19, 2018, New York Times, March 13, 2018, and The Guardian, April 13, 2018.