The Massachusetts Institute of Technology just announced the launch of a new Big Data initiative. For those of you who have lived under an analytic rock for the last few years, Big Data is the name given to the movement that involves mining large volumes of consumer and social data in an attempt to identify behavioral patterns usable by corporations to market more stuff to you. Not surprisingly, the Big Data movement is largely financed by technology firms who see an opportunity to sell expensive equipment to CIOs and general managers, in the hope that algorithmic inspiration will magically arise from the fumes of the data landfill. If you’re old enough to remember the data warehouse debacle of the 70s and 80s, or the unfulfilled promises of Customer Relationship Management (CRM) software, welcome back to the future.
I actually like the Big Data movement, but I think it is largely misguided in its arrogant assumption that a few analytic experts can generate insights from large amounts of data through the sheer power of their brilliance, when all the evidence shows that these expert-driven approaches repeatedly fail. If one more person quotes Moneyball to me as evidence of the virtue of business intelligence, I think I will puke (or better yet, direct them to the latest American League standings, where the Boston Red Sox occupy last place, thanks to the aforementioned Moneyball approach).
To be clear, I very much believe in the power of analytics, as long as we understand who generates insights from data. And in most cases, it ain’t the experts, but the users of the data.
Insights come from the motivation of self-interested individuals confronted with the reality of their own data, measured against the backdrop of an entire population’s data, and hoping to discover new patterns of action for themselves. Data itself is inert, and rarely produces action, except for a few left-brained people who teach at MIT. Most of us need to convert left-brained data into a right-brained hypothesis that we can only appropriate if we have participated in its development in some fashion. Few of us believe in universal truths to the point where we can put them into action (if this were the case, we would all be eating the right food all the time and exercising several times a day). This conversion from left-brained understanding to right-brain-driven action requires the co-creation of a personal hypothesis based on some objective evidence from the known data, and a unique act of creativity about what will work for us. This co-created hypothesis will lead to a willingness to experiment on a small scale. The small-scale experimentation will then lead to a more ambitious exploration of new causes and effects in the hope of figuring out new things selfishly helpful to us. Over time, the sum of all those self-generated experiments will generate population-wide hypotheses which can then be tested analytically, using big data sets (and perhaps a handful of experts from MIT).
For example, let us say I want to reduce the glucose level in my blood because I have been diagnosed as pre-diabetic. Of course, I will be told from day one by my doctor that I should reduce the intake of certain foods and exercise more (medical research has shown that broccoli is generally better than a hot fudge sundae for keeping blood glucose down, so I might as well put that known fact to good use, but as already said, this will only carry me so far if I love hot fudge sundaes). What will motivate me is finding the ultimate combination of food and exercise that works for me. To get there, I will need to formulate hypotheses that apply uniquely to me (for example, keeping hot fudge sundaes in my diet, perhaps a bit less frequently), and to build my own set of relevant data by measuring the consequences of my personal food and exercise choices. In other words, I will want to generate my own set of data and devise my own algorithm as to what works for me.
The question I would ask is the following: given what is already known about diabetes from a clinical standpoint, is society more likely to make progress on the diabetes issue by:
a. Looking for a killer algorithm that predicts who will progress from the pre-diabetic stage to diabetes, using a Big Data approach (the classic medical research and development approach)?
b. Distributing a user-friendly test kit and data log to the pre-diabetic population that allows them to test their glucose level in real time, encourages them to figure out which specific foods raise their glucose level in their own body after each meal, and measures the impact of exercise on their individual glucose level after each workout (the co-created approach to research and development)?
I’ll leave it to the National Institutes of Health to spend my tax dollars on scenario a, and I’ll personally put my money on scenario b. Why? Because you will get a lot more engagement from linking personal data to individual courses of action for each patient. If we can get millions of pre-diabetic patients to self-create their own clinical experimentation – imperfect as this “clinical trial” would be from a statistical standpoint – we will learn a lot more than by having three scientists formulate an exquisitely narrow hypothesis and spend the next ten years collecting data on it, complete with double-blind set-up and t-statistics. Beyond the obvious advantage of collecting data on a large scale, the moment patients start tracking their own data, they will start experimenting with new approaches that interest them and them only. This will create a wide field of distributed experimentation that can then be aggregated into wider insights, turning the personally co-created data and insights into usable research data for the whole population.
The point here is that patients are contributing a lot more than data. They are also contributing insights, by formulating hypotheses about what could work for them, and by setting up personal experiments to test those hypotheses. Their motivation for doing so is not analytic (they’re not looking for a Nobel Prize in Medicine), but self-serving (they want to get healthy). Right-brained motivation for self-improvement is the currency of true research.
Of course, a lot of the hypotheses formulated by individuals will be dead ends, and they will naturally weed themselves out. Co-created research is a messy game where a few insights hide in a forest of marginal or even useless ideas. Experts could in many cases have elegantly dismissed these naïve or erroneous ideas out of hand through their a priori knowledge, but it does not matter in the end because the sheer volume of ideas engendered by self-interested people will always trump the expertise held by a few. There is a legitimate role for experts, but it involves coaching patients into investigating some areas rather than others and structuring the aggregation of both data and algorithms at the larger population level, not claiming a monopoly on generating those insights as the current Big Data approach suggests.
MIT, let the Big Data bird out of its expert cage.