I’ve been talking a fair amount recently about the dangers of “data adjacent decision making.” This happens when you have a lot of numbers, and probably a graph, and you put them next to a decision, and then believe that your decision is based on solid scientific reasoning. I’ve seen this danger crop up in the arts world, but it’s just as prevalent everywhere else, too. Science is just hard, and all sorts of businesses are drowning in new kinds of data.
Up until very recently, data was hard to come by. There weren’t huge databases full of data points about your business living in free web applications, waiting for analysis. You had to plan an experiment, design it, carry out the primary research, and incur all kinds of costs. Back when the costs of data were higher, the quality of your typical dataset was higher, too. If you’re going to pay all those costs to get it, you’re going to think a lot more carefully about how your data is supposed to work, and what it’s supposed to tell you about the world and your business.
But the costs of data have dropped precipitously. All sorts of databases are filling up with graphable data about your organization right now, and you didn’t even have to design an experiment, much less pay for it. And so the value of a typical dataset, and a typical graph, has gone down too.
However, the association between “graphs” and “scientific decision making” is still there. Having numbers, data, or a chart of any kind can convince people that the decision you’re making is a good one. It’s easy to convince yourself with these things, too. If you’ve got a chart, it’s easy to let yourself off the hook for thinking carefully.
That’s how you wind up doing “data adjacent decision making”, and it’s how data can actually do harm.
So what do you do about it? You’ve got to do the experimental design after the data has been collected. Google Analytics knows lots of stuff, but before you start creating reports, proving things, and asserting big changes you’ve got to make, take the time to formulate a hypothesis that the data might falsify.
Imagine someone you really admire. Imagine them walking in and asking you “How exactly is that supposed to work?” Then come up with an answer. The hard part is describing how you imagine the world to be, such that the data you’re playing with can track the thing you’re trying to study. This is called a mechanism. And that’s not so much a math problem, or a statistical analysis problem. It’s really a thinking problem.
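To make “a hypothesis the data might falsify” concrete: state the mechanism (“page B converts better because of X”), derive a measurable prediction, and test it against the data. Here’s a minimal sketch using a permutation test — the page names and conversion numbers are invented for illustration, not taken from any real dataset:

```python
import random

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Estimate a p-value for the hypothesis that groups a and b
    come from the same distribution, by repeatedly shuffling labels
    and seeing how often the shuffled difference in means is at
    least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical conversion flags (1 = visitor converted) for two pages.
page_a = [1] * 30 + [0] * 70  # 30% conversion rate
page_b = [1] * 45 + [0] * 55  # 45% conversion rate
p = permutation_test(page_a, page_b)
```

A small p-value here doesn’t prove your mechanism; it just means the difference is unlikely to be shuffle-noise, which is exactly the kind of check that separates “data driven” from “data adjacent.”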
My brother, Dennis Clark, is the CSO of a tech start-up, Luminoso, that’s just completed a series A fundraising round. Luminoso is a text analytics & natural language processing company. During the World Cup they ran a massive social media monitoring project called One Stadium. It read social media posts, tracked discussions, and made it easy for anyone to know what people were talking about and how much discussion about each topic there actually was. It worked in lots of different languages, filtered out spam and profanity, and could identify concepts automatically. It was pretty awesome.
But this also led to everyone asking my brother who would win the World Cup. “You have a lot of data about the World Cup. Who’s going to win?” To which his answer was, obviously, “How should I know?” There’s no reason to believe that the data he had was any good for making a prediction. He wrote a very good blog post about this problem on the day of the final. I recommend reading the whole thing.
If you aren’t going to click through, though, read this bit at least:
Data can do amazing things, but we’d all be better off thinking about it a little less as magic and a little more as a way to put some meat on the bones of our intuitions. Getting those intuitions to a point of testability is well within the reach of any modern manager, math genius or no, and a willingness to reason through the mechanisms behind relationships will serve any business person well whether or not they’re considering a data project.
So get critical. Be actually “data driven” and not just “data adjacent.” When you see data, ask, “How exactly is that supposed to work?” It’ll basically always make you look smart.