Mining the Tar Sands of Big Data

I liked the analogy Michael Driscoll and Roger Ehrenberg used in their article announcing GigaOM’s Structure: Big Data conference on March 23 in New York City:

In a similar vein, much of the world’s most valuable information is trapped in digital sand, siloed in servers scattered around the globe. These vast expanses of data — streaming from our smart phones, DVRs, and GPS-enabled cars — require mining and distillation before they can be useful.

Both oil and sand, information and data, share another parallel: In recent years, technology has catalyzed dramatic drops in the costs of extracting each.

Unlike oil reserves, data is an abundant resource on our wired planet. Though much of it is noise, at scale and with the right mining algorithms, this data can yield information that can predict traffic jams, entertainment trends, even flu outbreaks.

These are hints of the promise of big data, which will mature in the coming decade, driven by advances in three principal areas: sensor networks, cloud computing, and machine learning.

[…]

Together, these three technology advances lead us to make several predictions for the coming decade:

1. A spike in demand for “data scientists.” Fueled by the oversupply of data, more firms will need individuals who are facile with manipulating and extracting meaning from large data sets. Until universities adapt their curricula to match these market realities, the battle for these scarce human resources will be intense.

2. A reassertion of control by data producers. Firms such as retailers, banks, and online publishers are recognizing that they have been giving away their most precious asset, customer data, to transaction processors and other third parties. We expect firms to spend more effort protecting, structuring, and monetizing their data assets.

3. The end of privacy as we know it. With devices tracking our every point and click, acceptable practice for personal data will shift from preventing disclosures towards policing uses. It’s not what our databases know that matters — for soon they will know everything — it’s how this data is used in advertising, consumer finance, and health care.

4. The rise of data start-ups. A class of companies is emerging whose supply chains consist of nothing but data. Their inputs are collected through partnerships or from publicly available sources, processed, and transformed into traffic predictions, news aggregations, or real estate valuations. Data start-ups are the wildcatters of the information age, searching for opportunities across a vast and virgin data landscape.

Read the full article.