Contrary to what you have heard, the unfolding technological transformation we are witnessing isn’t really about data, not directly at any rate. It’s not that data isn’t important, but the focus on data is obscuring the real nature of change, which is the transition from a world driven by essentially static and reactive systems to one driven by hyper-localized, adaptive control systems.
These controllers are already in our cars, homes, and offices, and will be in our clothing and our parks, literally woven into the fabric of our physical environment. The future will not be defined by how much data is collected, but by the complexity and responsiveness of our localized environments.
Data sounds nicer than control
Unfortunately, control and control systems aren’t commonly used terms or ideas, even in many of the applied data fields (Marketing, that’s you I am talking about), but they really should be. So what is control and why is it important? Control is a process of making decisions, and accepting feedback, in order to achieve some objective. In other words, it is something that senses and acts; it isn’t inert like data.
Let’s use a simple example of a common controller: your basic thermostat. Your thermostat’s objective is to maintain a certain temperature in a room, or your house. It does this, in the simplest case, by checking the temperature of the room (this is data collection) and then, based on its reading, choosing to Heat, Cool, or do Nothing.
The rules that govern how the controller behaves are called the control logic. In simple cases, like our thermostat, the control logic can be easily written out by a human. However, more advanced applications, like self-driving cars, are so complex that we will often need to learn much of the control logic from data, rather than have it directly programmed by people.
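To make the thermostat concrete, here is a minimal sketch of its rules in Python. The function name, the default setpoint of 20 degrees, and the tolerance band are illustrative assumptions, not details of any real thermostat.

```python
def thermostat_action(temperature, setpoint=20.0, tolerance=0.5):
    """Pick an action from a single temperature reading.

    setpoint and tolerance are hypothetical defaults chosen for illustration.
    """
    if temperature < setpoint - tolerance:
        return "heat"      # too cold: warm the room up
    if temperature > setpoint + tolerance:
        return "cool"      # too warm: cool the room down
    return "nothing"       # close enough to the target: leave it alone


# Sense (the reading is the data), then act.
for reading in (17.2, 20.1, 23.8):
    print(reading, thermostat_action(reading))
```

The tolerance band is a design choice: without it, the controller would flip between heating and cooling every time the reading crossed the setpoint.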
Why write it when the machine can learn it?
This is where data plays one of its major roles: helping to learn the control logic. By employing machine learning (see our data science posts here and here), we can learn the basic logic required for a particular controller. We can then hone and optimize the efficacy of the controller by embedding additional systems for updating the controller’s logic after it has been deployed. These adaptive systems use the current data from the system’s environment in order to continuously update and improve upon the control logic.
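As a rough illustration of learned, continuously updated control logic, the sketch below has a controller that starts with no rules and works out which action to take in each state from the feedback it receives. Everything here is a toy assumption made for the example: the three states, the +1/-1 feedback signal, the `AdaptiveController` class, and the choice of an epsilon-greedy running-average scheme, which is just one simple way to do this.

```python
import random

ACTIONS = ("heat", "cool", "nothing")
STATES = ("cold", "ok", "hot")
# Hypothetical "right answers", used only by the toy environment to
# generate feedback. The controller never reads this table directly.
CORRECT = {"cold": "heat", "ok": "nothing", "hot": "cool"}


class AdaptiveController:
    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon          # how often to try a random action
        self.rng = random.Random(seed)
        self.value = {}                 # (state, action) -> mean feedback
        self.count = {}                 # (state, action) -> times tried

    def best(self, state):
        # Current learned control logic: the action with the best average.
        return max(ACTIONS, key=lambda a: self.value.get((state, a), 0.0))

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)   # occasional exploration
        return self.best(state)

    def update(self, state, action, reward):
        # Incremental running mean, so the logic keeps adapting after deployment.
        key = (state, action)
        n = self.count.get(key, 0) + 1
        old = self.value.get(key, 0.0)
        self.count[key] = n
        self.value[key] = old + (reward - old) / n


def train(steps=500, seed=0):
    env = random.Random(seed + 1)
    ctrl = AdaptiveController(seed=seed)
    for _ in range(steps):
        state = env.choice(STATES)
        action = ctrl.act(state)
        reward = 1.0 if action == CORRECT[state] else -1.0
        ctrl.update(state, action, reward)
    return ctrl


controller = train()
print({state: controller.best(state) for state in STATES})
```

The same `update` call serves both the initial learning and the after-deployment tuning, which is the point: the data feeds the logic rather than sitting in a warehouse.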
Big Data is afraid of its shadow prices
Folks who are excited about Big Data should start to think less about data per se, and more about how data will drive 1) the creation of more powerful control logic; and 2) improved precision, by giving control systems access to more precise and higher-dimensional data.
Framing data in terms of the control problem naturally leads to real data questions, like: what if I didn’t have this bit of data? How much less effective would the system be? In other words, you can start to think about the marginal value of each new bit of data, so that you can move toward having an optimal volume and precision of data with respect to your goals and objectives.
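One way to put a number on that question is to run the same controller with and without a given piece of data and compare how well it meets its objective. The sketch below does this for a made-up heating scenario; the occupancy signal, the probabilities, and the cost of one unit per heating step are all assumptions chosen for illustration.

```python
import random


def average_cost(use_occupancy, steps=1000, seed=0):
    """Average energy cost per step for a toy heating controller.

    Objective (assumed for this sketch): never leave an occupied room cold,
    while spending as little energy as possible.
    """
    rng = random.Random(seed)
    cost = 0.0
    for _ in range(steps):
        cold = rng.random() < 0.5       # is the room cold this step?
        occupied = rng.random() < 0.3   # is anyone in the room?
        if use_occupancy:
            heat = cold and occupied    # heat only when someone benefits
        else:
            heat = cold                 # no occupancy data: heat to be safe
        if heat:
            cost += 1.0                 # one unit of energy per heating step
    return cost / steps


without_data = average_cost(use_occupancy=False)
with_data = average_cost(use_occupancy=True)
# Marginal value of the occupancy data, in energy units saved per step.
print("marginal value:", without_data - with_data)
```

Both runs use the same seed, so they face the identical sequence of cold and occupied steps, and the difference in cost is attributable to the extra data alone.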
Pearls of Wisdom or ‘Correlation isn’t Causation’
While it is true, “Correlation isn’t Causation” is often proudly exclaimed without any real follow-up about what it actually means. By taking a control perspective, we can begin to get a little clarity on how to differentiate data that provides correlations from data that provides causal relationships.
Data that is passively gathered will tend to give you correlations. The data that you gather from your controller’s actions, however, will give you causal relationships, at least with respect to the actions that the controller takes. In fact, you can think of A/B testing as employing a type of dumb controller, one that takes random actions. If you want to learn a bit more about the topic from an actual expert, take a look at Judea Pearl’s work.
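The difference shows up clearly in a small simulation. In the sketch below, a hidden trait ("engaged") raises the outcome and, in the passively logged data, also makes the action more likely, so a naive comparison overstates the action's effect. When a dumb controller picks the action at random, the comparison recovers the true effect. The scenario and every number in it are invented for illustration.

```python
import random

TRUE_EFFECT = 0.1  # assumed causal lift from taking the action


def outcome(rng, action, engaged):
    # Engagement raises the outcome regardless of the action taken.
    p = 0.2 + TRUE_EFFECT * action + 0.4 * engaged
    return 1 if rng.random() < p else 0


def estimated_effect(randomized, n=100_000, seed=0):
    """Difference in mean outcome between action=1 and action=0."""
    rng = random.Random(seed)
    totals = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        engaged = 1 if rng.random() < 0.5 else 0
        if randomized:
            # The "dumb controller": a coin flip, independent of engagement.
            action = 1 if rng.random() < 0.5 else 0
        else:
            # Passive logs: engaged users mostly end up with the action.
            p_action = 0.9 if engaged else 0.1
            action = 1 if rng.random() < p_action else 0
        totals[action] += outcome(rng, action, engaged)
        counts[action] += 1
    return totals[1] / counts[1] - totals[0] / counts[0]


# In expectation the passive estimate is 0.1 + 0.4 * (0.9 - 0.1) = 0.42,
# while the randomized estimate is the true effect, 0.1.
print("passive:   ", estimated_effect(randomized=False))
print("randomized:", estimated_effect(randomized=True))
```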
Data is Lazy, and leads to lazy thinking.
Here is the thing: data is passive. That makes it easy to collect and talk about. Integrating it into a working system or process is the hard part. Control, by definition, is active, and that makes it hard, because you now have to think about how the entire system is going to respond to each control action. That is probably one of the main reasons there is so much attention on data: you get to dodge the hard, but ultimately most valuable, questions.
*Edited 6/2/2018
