Why long-tail services will usher in the next generation of big data

(Repost from an entry I wrote for tenXer here)

First there was data
Then there was big data
And now there is connected big data

aka Big Data Triangulation

In a recent Quantcast whitepaper on online advertising, the author discusses RTB (real-time bidding) and describes the advantages of an integrated, algorithmically optimized strategy over a mix-and-match approach, where “integrated data, algorithms and bidding produced a two to seven times lower cost per action (CPA) than the independent approach” (1).

Let me repeat: “integrated data, algorithms and bidding produced a two to seven times lower cost per action (CPA) than the independent approach”.

2-7x is a lot. This got me thinking about data: specifically, how “big data” might not always be enough on its own.

Working with dozens of Fortune 500 companies, I’d always wondered at the petabytes of accumulated data sitting untouched in various organizational silos. Sooner or later, some analytics guru or consultant would come in, whip up an analysis, and unlock value from the dataset. Having once been that person, though, I believe that was only 80% (at best) of the “true” value of the data. The problem these organizations face is that their data, though big, is still siloed: siloed across divisions and almost always siloed within the organization. Yet what would happen if you connected these silos and performed the same empirical analyses on them, but in concert? A 2-7x improvement, perhaps, as with Quantcast?

Target's 'Pregnancy test'

Of course, online marketers would be among the first to figure this out and embrace the principle. After all, improvements in their trade directly improve their companies’ (or clients’) bottom lines. Another industry that has already embraced connected big data is finance. We need look no further than the August 2 $440 million trading glitch to realize that high-frequency trading has quickly become almost entirely autonomous and algorithmic (save for the “off” button) and incorporates a mind-numbingly large number of both external (non-finance) and internal inputs.

So what? Isn’t that what big data is about anyway?

Well, yes and no. We are definitely in the era of big data. Thousands of companies now collect many terabytes' worth of customer, transaction, usage, research, product, and competitive data. But only a few have begun to connect that data and do things with it. The few that have (we’ve seen it first in finance, advertising, e-commerce, and, most recently, elections) are using connected big data (CBD) to sell better. Almost none are using CBD to operate better, mainly because a) it’s really hard to pull a bunch of huge data sources together gracefully, and b) the ROI for doing so is not as clear as it is for sales functions. This, however, is changing. With the rise of long-tail internet connecting services like Zapier, IFTTT, Electric Imp, Work.com, and Facebook Open Graph, or white-label service providers like Rapleaf, Factual, and Semantria, the barriers to pulling together disparate data sources are evaporating. Imagine being able to predict and prevent inventory shrinkage, anticipate positive or negative customer sentiment, identify your next blockbuster product before it’s built, or figure out who your most productive employees are. While perhaps a bit pie-in-the-sky for now, in the world of connected big data, these are just table stakes.

From Paul Graham on ideas he’d like to fund: “Now that so much happens on computers connected to networks, it’s possible to measure things we may not have realized we could. And there are some big problems that may be soluble if we can measure more.”

Remember: a dot in one dimension may appear identical to an oncoming locomotive. Embracing big data triangulation will help you stay on the train and not in front of it.

-Dan @ tenXer

About the author: Dan is currently Mad Scientist and Strategist at tenXer, and an expert in residence at General Assembly. Prior to tenXer, he led teams advising clients at Fortune 500 companies in the media and communications industries on how to leverage internal and external data sources to make informed, high-impact decisions for their businesses. He is well versed in a variety of advanced analytical techniques, including data mining, cluster analysis, bootstrap forests, network routing and optimization, machine learning and automation, sentiment analysis, marketing impact allocation and optimization, and data de-siloing.

(1): “PROMISE UNFULFILLED? LESSONS FROM THE REVOLUTION: Six Things You Need to Know About Real-time Display Advertising”. Quantcast, 2012.
