@online{Arkami2412.01968,
TITLE = {On the Theoretical Foundations of Data Exchange Economies},
AUTHOR = {Akrami, Hannaneh and Ray Chaudhury, Bhaskar and Garg, Jugal and Murhekar, Aniket},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2412.01968},
EPRINT = {2412.01968},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {The immense success of ML systems relies heavily on large-scale, high-quality
data. The high demand for data has led to many paradigms that involve selling,
exchanging, and sharing data, motivating the study of economic processes with
data as an asset. However, data differs from classical economic assets in terms
of free duplication: there is no concept of limited supply since it can be
replicated at zero marginal cost. This distinction introduces fundamental
differences between economic processes involving data and those concerning
other assets.
We study a parallel to exchange (Arrow-Debreu) markets where data is the
asset. Here, agents with datasets exchange data fairly and voluntarily, aiming
for mutual benefit without monetary compensation. This framework is
particularly relevant for non-profit organizations that seek to improve their
ML models through data exchange, yet are restricted from selling their data for
profit.
We propose a general framework for data exchange, built on two core
principles: (i) fairness, ensuring that each agent receives utility
proportional to their contribution to others; contributions are quantifiable
using standard credit-sharing functions like the Shapley value, and (ii)
stability, ensuring that no coalition of agents can identify an exchange among
themselves which they unanimously prefer to the current exchange. We show that
fair and stable exchanges exist for all monotone continuous utility functions.
Next, we investigate the computational complexity of finding approximate fair
and stable exchanges. We present a local search algorithm for instances with
monotone submodular utility functions, where each agent's contributions are
measured using the Shapley value. We prove that this problem lies in CLS under
mild assumptions. Our framework opens up several intriguing theoretical
directions for research in data economics.
},
}