Through online shopping, loyalty programs, smart devices, and many other aspects of our daily lives, the companies that make it all possible can collect vast amounts of our personal data. Sometimes that is just common sense: when we hail a taxi using a mobile app, we want the platform to know our location so it can match us with the nearest driver. With this and other data, companies can personalize their products and services to fit our preferences and needs.
At the same time, the widespread availability of such deeply personal data creates risks. If the company that collected it is less than scrupulous, we can find ourselves the targets of unwanted ads, or worse. A notable example is the consulting firm Cambridge Analytica, which used Facebook data on 50 million Americans in an attempt to influence the 2016 election. While this is an extreme case, similar incidents of data leakage and misuse occur on a smaller scale every day.
What measures can governments and regulators take to prevent such abuses? How should companies and digital businesses whose business models rely so heavily on our data change their practices and policies to keep our data protected?
Why current regulations are ineffective
To shed light on digital privacy and the measures that regulators and companies can take to protect consumers, a team of researchers from the US, UK, and Canada examined the interaction between three parties that have an interest in our data: us as individuals, the companies we interact with, and third parties. Our research question was: how does a company’s data strategy, essentially its decisions about how much data to collect and how well to protect it, affect the interaction between these three parties?
Overall, we found that when companies choose data policies based solely on their own self-interest, they collect more data than would be optimal for consumers. Our findings indicate that when industry leaders, Mark Zuckerberg for example, claim to collect exactly the amount of data their consumers want (or even less), they are not always forthright.
Our work highlights the need to regulate such markets. In the United States, the key regulator of data is the Federal Trade Commission (FTC). After the Cambridge Analytica scandal, the FTC fined Facebook $5 billion, even though the company’s business model was left intact. The FTC’s efforts are now directed largely at requiring companies to enforce their own privacy policies and provide at least a minimum level of data protection. Our research shows that this is simply not enough.
Two solutions to limit data collection
We propose two main types of instruments to discourage companies from collecting more data than absolutely necessary:
- A tax proportional to the amount of data a company collects. The more data a company collects about its customers, the higher the financial cost of holding that data.
- Liability penalties. The penalties that regulators impose on companies after a data breach should be proportionate to the harm that consumers suffer. In the case of Cambridge Analytica, the breach was massive, so the company should pay a significant penalty.
Both of these instruments can help restore efficiency to these types of markets and help a regulator like the FTC pressure companies to collect only as much data as customers are willing to share, as sketched below.
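One stylized way to see how the two instruments change a firm’s incentives (our notation here, not the model from the working papers): let n be the amount of data the firm collects, R(n) the revenue it earns from that data, τ a per-unit data tax, p the probability of a breach, and H(n) the harm consumers would suffer if the collected data leaked. The firm then chooses how much data to collect by solving

\[
\max_{n \ge 0} \; R(n) \;-\; \tau\, n \;-\; p\, H(n),
\]

so both the tax and the expected liability pull the firm’s privately optimal level of data collection down, toward the level consumers would prefer.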
A modern look at revenue management
Data-driven revenue management has emerged in recent years. Companies increasingly use our personal data to sell us products and services. Insurance companies make personalized offers based on intimate details of our lives, including medical histories. The financial industry designs loans that fit our spending patterns. Facebook and Google decide how to build our news feeds with their advertisers in mind. Amazon selects the product mix to offer each customer based on their past purchases.
The common denominator among all these seemingly disparate companies is how they decide what price to charge or what assortment to show each customer. A key ingredient is customer data: personalized revenue management companies apply advanced machine learning techniques and algorithms to historical data from their previous customers to build models of human behavior. In essence, a company can come up with the best possible price (or assortment, for example) for a new customer because that customer will resemble previous customers with similar characteristics.
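As a toy illustration of this pipeline (the feature, model, and numbers are our own assumptions, not taken from the research): fit a simple model of willingness to pay on historical customer data, then price a new customer using the model’s prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy historical data: one feature per customer (e.g., past annual spend)
# and the price each customer was observed to accept, used here as a proxy
# for willingness to pay. All numbers are made up for illustration.
rng = np.random.default_rng(0)
past_spend = rng.uniform(100, 1000, size=(500, 1))
willingness_to_pay = 20 + 0.05 * past_spend[:, 0] + rng.normal(0, 2, size=500)

# Learn a behavioral model from previous customers.
model = LinearRegression().fit(past_spend, willingness_to_pay)

# Price a new customer by analogy with similar past customers.
new_customer = np.array([[450.0]])
personalized_price = model.predict(new_customer)[0]
print(f"Offered price: ${personalized_price:.2f}")
```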
With this type of decision-making framework, typically used in data-driven revenue management applications that rely heavily on (potentially sensitive) historical data, there are pressing privacy concerns. A hacker can simply steal the historical data, but an adversary does not necessarily have to break into the database: recent research in computer science shows that adversaries can reconstruct sensitive information at the individual level merely by observing a company’s decisions, such as personalized prices or assortments.
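A deliberately simplified example of how decisions alone can leak data (our construction, not one from the cited research): suppose a naive firm posts a price equal to the average willingness to pay in its historical dataset. An adversary who observes that price before and after a target customer’s record is added can recover the target’s value exactly.

```python
import numpy as np

def posted_price(willingness_to_pay):
    # Naive pricing rule: charge the average willingness to pay.
    return np.mean(willingness_to_pay)

others = np.array([40.0, 55.0, 48.0, 62.0])   # other customers' records
target = 90.0                                  # the record we want to keep private

price_without_target = posted_price(others)
price_with_target = posted_price(np.append(others, target))

# Difference attack: two observed prices reveal the target's record.
n = len(others) + 1
inferred_target = n * price_with_target - (n - 1) * price_without_target
print(inferred_target)  # 90.0: the "confidential" value leaks from prices alone
```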
Revenue management with privacy
In our work, we design “privacy-preserving” algorithms for use by data-driven decision-making companies. These algorithms are designed to help such companies limit the harm to their customers from data leaks or misuse, while preserving profits. While data cannot be made 100% secure, the goal is to limit the potential harm as much as possible, striking the right balance between benefits and risks.
One way to design privacy-preserving algorithms for data-driven revenue management companies is to impose an additional constraint on the companies’ decision-making framework.
In particular, we can require that the firm’s decisions (e.g., an insurance offer or a product mix) not be too dependent on (or too informative about) any particular customer’s data in the historical dataset the firm used to derive that decision. An adversary should therefore not be able to observe the firm’s decisions and infer sensitive customer information from the historical dataset. Formally, such a requirement corresponds to the design of “differentially private” revenue management algorithms. Differential privacy has become the de facto privacy standard in the industry, used by companies like Apple, Microsoft, and Google, as well as public agencies like the US Census Bureau.
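For reference, the standard definition (our notation; ε is the usual “privacy budget,” not a parameter introduced in the article): a randomized algorithm A is ε-differentially private if, for any two historical datasets D and D′ that differ in a single customer’s record, and any set S of possible decisions,

\[
\Pr\bigl[A(D) \in S\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[A(D') \in S\bigr].
\]

A small ε means that an observer of the firm’s decisions learns almost nothing about whether any individual customer’s record was used to produce them.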
We found that such privacy-preserving (or differentially private) algorithms can be designed by adding carefully tailored “noise,” essentially meaningless random data, similar to a coin flip, to firms’ decisions or to the confidential data the firm uses. For example, an insurance firm designing an offer for a specific customer might first calculate the true optimal price (the price that maximizes the firm’s revenue from that specific customer), then flip a coin and add $1 if it lands on heads or subtract $1 if it lands on tails. By adding such “noise” to the true optimal price, the firm makes the carefully designed price “less optimal,” potentially reducing profits. However, adversaries will have less information (or less inferential power) with which to infer anything meaningful about the firm’s confidential customer data.
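The ±$1 coin flip above is a simplified illustration. One standard way to implement this kind of perturbation (a generic sketch, not the algorithm from the working paper) is the Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the price’s sensitivity (how much one customer’s record can shift the computed optimal price) divided by the privacy budget ε.

```python
import numpy as np

rng = np.random.default_rng(42)

def private_price(optimal_price, sensitivity, epsilon):
    """Release a differentially private version of an optimal price.

    sensitivity: the most that adding or removing one customer's record
                 can change the computed optimal price (assumed known here).
    epsilon:     the privacy budget; smaller values mean more noise and
                 stronger privacy, at the cost of a less "optimal" price.
    """
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return optimal_price + noise

# Hypothetical numbers: the true revenue-maximizing price is $120, and one
# customer's data can shift it by at most $2.
print(private_price(optimal_price=120.0, sensitivity=2.0, epsilon=0.5))
```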
Our study shows that a company does not need to add a lot of noise to provide strong consumer privacy guarantees. Moreover, the more historical data a company has, the cheaper such privacy protection becomes; in some cases, privacy can be achieved almost for free.
This article is based on the working paper “Personalized Revenue Management with Privacy,” co-authored with Yanzhe Lei of Queen’s University and Sentao Miao of McGill University, and the working paper “Digital Privacy,” co-written with Itay P. Fainmesser of Johns Hopkins University and Andrea Galeotti of London Business School.
An earlier version of this article was published on Knowledge@HEC.