The Value in Being Intentional and Customer Focused
Privacy in analytics is often treated as a constraint, a barrier that limits our ability to develop powerful technological solutions. However, embedding privacy by design principles, such as data minimization, actually enhances our systems and software. Why? Because it aligns with our field's primary value proposition: the main value of experimentation/AB Testing programs is that they provide a principled framework for organizations to act and learn intentionally so that, ultimately, we can be customer advocates.
It is our view that privacy by design thinking is inherently intentional thinking. It’s good engineering that puts the customer first and requires one to explicitly ask ‘why’ before collecting additional information. Privacy by design helps us be outcomes-based (focus on the customer) vs. compliance/procedure-based (focus on the organization).
We have found that by following privacy by design engineering principles, we have not only been able to engineer data minimization and privacy-by-default designs directly into Conductrics' software; doing so has also had the happy effect of yielding multiple benefits, including data storage, computational, and reporting efficiencies.
What is Privacy by Design?
Privacy by design (PBD) is a set of engineering principles that seek to embed privacy-preserving capabilities as the default behavior directly into systems and technologies.
The seven principles, as originally specified by Dr. Ann Cavoukian, are:
- Proactive, not reactive – That is, systems should be intentional and anticipate privacy requirements rather than responding to privacy issues after the fact
- Privacy as the default setting
- Privacy embedded into design
- Full functionality – That is, privacy should not unreasonably impair access or use.
- End-to-end security
- Visibility and transparency
- Respect for user privacy
At Conductrics, we came across Privacy by Design back in 2014/2015, after stumbling upon Article 22 of the GDPR (Automated individual decision-making, including profiling).
At the time, we were looking to refactor our original predictive targeting/contextual bandit features to make it easier for our customers to quickly understand which of their customers would get which experiences. For those not familiar, the contextual bandit is a machine learning problem in which one wants to jointly solve two problems (a minimal code sketch follows the list):
- Discovering and delivering the best treatment for each user in an ongoing, adaptive way (often called heterogeneous treatment effects); and
- The sub-problem of discovering which covariates (contextual information) are useful for finding those best treatments.
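To make this concrete, below is a minimal sketch of one simple contextual bandit strategy, epsilon-greedy over context segments, written in Python. It is purely illustrative and is not Conductrics' actual engine; the treatment names, contexts, and reward rates are hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyContextualBandit:
    """Minimal contextual bandit: keep a running mean reward for each
    (context, treatment) pair; usually exploit the current best
    treatment for a context, but explore at rate epsilon."""

    def __init__(self, treatments, epsilon=0.1):
        self.treatments = treatments
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, treatment) -> pulls
        self.means = defaultdict(float)   # (context, treatment) -> mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.choice(self.treatments)
        return max(self.treatments, key=lambda t: self.means[(context, t)])

    def update(self, context, treatment, reward):
        # Incremental update of the running mean for this pair.
        key = (context, treatment)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]

# Hypothetical usage: two headlines, where headline_b is better on mobile.
bandit = EpsilonGreedyContextualBandit(["headline_a", "headline_b"])
for _ in range(10000):
    ctx = random.choice(["mobile", "desktop"])
    arm = bandit.select(ctx)
    p = 0.12 if (ctx == "mobile" and arm == "headline_b") else 0.08
    bandit.update(ctx, arm, 1.0 if random.random() < p else 0.0)
print({k: round(v, 3) for k, v in bandit.means.items()})
```

Over many visits the bandit adaptively concentrates traffic on the best treatment per context, which is the first of the two problems above; a real system also has to learn which contextual features matter at all, which is the second.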
When we learned about the then-upcoming GDPR and Article 22, we decided that we needed to make interpretability a first-class feature of our predictive targeting/bandit capability. That meant scrapping the normalized radial basis function network we had built for something simpler and more transparent. To our surprise, with a few clever tricks, making the Conductrics predictive engine less complex not only enabled interpretability, but also made it much more computationally efficient and improved the overall efficacy of the system.
However, Article 22 led us to Article 25 and its mandate to embed data minimization into the design of software. That ultimately led us to refactor all of our data collection and statistical procedures so that we could both provide all of the analysis needed for AB Testing (t-tests, partial f-tests, factorial ANOVA, and ANCOVA) and incorporate K-anonymization into our data storage and reporting/auditing.
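As a rough illustration of what that refactor implies, here is a minimal sketch, with hypothetical event fields, of rolling raw events up into equivalence classes. Instead of a row per user, each (variant, device) cell stores only a count, a sum, and a sum of squares, which is enough to recover the means and variances that those tests need.

```python
from collections import defaultdict

def aggregate(events):
    """Roll individual events up into equivalence classes.

    Each event is a dict like {"variant": "A", "device": "mobile",
    "reward": 1.0} (hypothetical fields). We keep only count, sum,
    and sum of squares per (variant, device) cell, which is enough
    to recover means and variances for t-tests and ANOVA-style
    analyses without retaining individual-level microdata.
    """
    classes = defaultdict(lambda: {"n": 0, "sum": 0.0, "sum_sq": 0.0})
    for e in events:
        cell = classes[(e["variant"], e["device"])]
        cell["n"] += 1
        cell["sum"] += e["reward"]
        cell["sum_sq"] += e["reward"] ** 2
    return dict(classes)

events = [
    {"variant": "A", "device": "mobile", "reward": 1.0},
    {"variant": "A", "device": "mobile", "reward": 0.0},
    {"variant": "B", "device": "desktop", "reward": 1.0},
]
print(aggregate(events))
```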
Why Bother with Privacy by Design?
It’s a requirement.
Article 25 of the GDPR is titled "Data Protection by Design and by Default":
“The controller shall … implement appropriate technical and organizational measures … such as data minimisation … in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects.”
It puts the customer first.
Following privacy by design, we have found, aligns with making products more useful for customers: instead of maximizing a customer's value to the company, one builds and offers products and services that maximize the company's value to its customers.
Privacy engineering and data engineering are both about privacy, but they are also about why we're performing data analytics and experimentation in the first place. Again, our view is that the value of experimentation is that it provides a principled procedure for organizations to learn, make decisions, and act intentionally. The ultimate reason for doing this is to help teams, who are near the front line of the business where engagement happens, act as customer advocates.
According to Fred Reichheld, creator of NPS, “There is no way to sustainably deliver value to shareholders or be a great place to work unless you put customers first.” As the front line in customer interactions, analysts have the opportunity to accomplish what should be the main purpose of the business: to enrich customers’ lives by being mindful of what they want and anticipating their needs.
Privacy engineering is good engineering.
When designing a product, solving one problem or perfecting one feature can often negatively affect other aspects of the product. Much like a Rubik’s cube, there are various ways to solve just a single side of the puzzle, but these solutions are local: solving one side can leave the other sides in disarray.
It turns out, happily, that most of the statistical approaches used for inference in AB Testing (ANOVA, ANCOVA, t-tests, partial f-tests, etc.) can be performed extremely efficiently on aggregate data stored in equivalence classes (similar to a pivot table—see a more technical view of this here). Using equivalence-class data, rather than individual microdata, makes it straightforward to incorporate K-anonymization as a simple-to-read measure of data minimization. Roughly, a dataset is K-anonymous if each individual in it is indistinguishable from at least K-1 others; equivalently, K is the size of the smallest equivalence class. Larger values of K can, in some sense, be said to provide stronger privacy protection.
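As a sketch of both points, reusing the hypothetical (count, sum, sum of squares) cells from the earlier example: a two-sample Welch t-statistic can be computed from the aggregates alone, and the K-anonymity of the stored data falls out of the same structure as the size of the smallest equivalence class. The numbers below are made up for illustration.

```python
import math

def mean_var(cell):
    """Recover the sample mean and unbiased variance from an
    equivalence-class cell of the form {"n", "sum", "sum_sq"}."""
    n, s, ss = cell["n"], cell["sum"], cell["sum_sq"]
    mean = s / n
    var = (ss - n * mean ** 2) / (n - 1)
    return mean, var

def welch_t(cell_a, cell_b):
    """Welch's two-sample t-statistic computed from aggregates alone,
    with no individual-level microdata required."""
    m_a, v_a = mean_var(cell_a)
    m_b, v_b = mean_var(cell_b)
    se = math.sqrt(v_a / cell_a["n"] + v_b / cell_b["n"])
    return (m_a - m_b) / se

def k_anonymity(classes):
    """K is the size of the smallest equivalence class: every user is
    indistinguishable from at least K-1 others in the stored data."""
    return min(cell["n"] for cell in classes.values())

# Hypothetical aggregates for variants A and B (0/1 rewards, so sum_sq == sum).
a = {"n": 500, "sum": 60.0, "sum_sq": 60.0}
b = {"n": 480, "sum": 72.0, "sum_sq": 72.0}
print(welch_t(a, b), k_anonymity({"A": a, "B": b}))
```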
Intentional Data Collection and Use
There are, of course, many use cases where it is appropriate, and even necessary, to collect more specific and identifiable information. The point is not that one should never collect data; rather, one should not collect, link, and store extra information by default just in case it might be useful. Instead, we should be intentional, for each use case, about what is actually needed and proceed from there.
Lastly, we should note that data minimization and other privacy engineering approaches are not substitutes for proper privacy policies. Rather, privacy engineering is a tool that sits alongside required privacy and legal compliance policies and contracts.
If you would like to learn more about Conductrics, please reach out here.
