Going from 0-60 in Big Data
Why we need to bring back the everyday driver when it comes to analytics…
After spending the last decade or so figuring out how to track and store a ton of data, companies are now increasingly asking, “OK, so now what?”
What can we do with our data, now that we have it?
It’s hard to go to a conference these days without hearing the term “big data” bandied about, or browse a job site without seeing a post for a company desperately looking for a “data scientist”, the proverbial gatekeeper to this big data.
I love data, big and small. I love listening to it for clues about how the world works and uncovering patterns hidden in the noise to help me validate things I knew or discover things I didn’t know I didn’t know.
While I am excited whenever we data geeks get more street cred in the business world (or politics or sports if those are your things), I am also a little bit worried about the recent shift toward companies attempting to go straight from a beat-up ’89 Chevy Blazer approach to a 2013 Ferrari 458 with no in-between.
Why a Ferrari makes a poor daily driver
For one, it takes driving skills, training, and practice to go from driving a clunker SUV to a performance sports car, and this doesn’t happen overnight. But even once you’ve adjusted to the changes in style and performance, you are still faced with the reality that the Ferrari is designed for the race track and not your daily commute. It gets poor mileage. It is expensive to maintain. It performs poorly in adverse conditions. It is so powerful that it literally tempts you at every turn to cut corners (too close) and fly way too fast for most roads.
But in big data science, there are no police to keep you in check. And traffic congestion can easily be overcome just by throwing more cloud resources at the problem (instead of improving your traffic-skirting technique and finesse). So it’s a slippery slope, one that many companies fall prey to. Crashes do happen, and easily, though you often don’t notice them until long after the fact.
Lately, we’ve seen a new crop of analytics services, offering everything from straight analytics as a service and analytics API aggregation as a service to text analysis as a service and data science as a service. These services are really neat and solve some important use cases. But they are also a double-edged sword…
More often than not, the folks with the technical expertise to build and maintain these systems (the engineers) are very different from those doing the analyses (the statisticians or data scientists), and different again from those who should be shaping the analyses and distilling insights from them (the business leaders).
The question is, who is driving?
If it’s your expensive new data scientist, you may very well be answering specific questions really well, just as a muscle car runs a really great quarter mile. But you may also be ignoring the big business questions that really matter, which could be answered with a dramatically simpler approach, say in Excel.
When it’s snowing, a rear-wheel-drive Ferrari’s 10-second quarter mile doesn’t really matter. Give me something with 4 seats and all-wheel drive for my daily commute. It’s not as fast or flashy, but it works for 95% of my use cases. For the other 5%, there’s always the track…