Interconnection fuels Innovation

I enjoyed reading this analysis of innovation by Matt Ridley; here are his conclusions:

Ideas Having Sex: How prosperity and innovation exceeded the expectations of John Stuart Mill and Adam Smith

Innovators are in the business of sharing. It is the most important thing they do, for unless they share their innovation it can have no benefit for them or for anybody else. And the one activity that got much easier to do after about 1800, and has gotten dramatically easier recently, is sharing. Travel and communication disseminated information much faster and further.

[…] When Hero of Alexandria invented a steam engine in the first century A.D. and employed it in opening temple doors, news of his invention spread so slowly and to so few people that it may never have reached the ears of cart designers. Ptolemaic astronomy was ingenious and precise, if not quite accurate, but it was never used for navigation because astronomers and sailors did not meet. The secret of the modern world is its gigantic interconnectedness. Ideas are having sex with other ideas from all over the planet with ever-increasing promiscuity. The telephone had sex with the computer and spawned the Internet.
Technologies emerge from the coming together of existing technologies into wholes that are greater than the sum of their parts. Henry Ford once candidly admitted that he had invented nothing new: He had “simply assembled into a car the discoveries of other men behind whom were centuries of work.” Inventors like to deny their ancestors, exaggerating the unfathered nature of their breakthroughs, the better to claim the full glory (and sometimes the patents) for themselves.

[…] We may soon be living in a post-capitalist, post-corporate world where individuals are free to come together in temporary aggregations to share, collaborate, and innovate, and where websites enable people to find employers, employees, customers, and clients anywhere in the world. This is also, as the evolutionary psychologist Geoffrey Miller reminds us, a world that will put “infinite production ability in the service of infinite human lust, gluttony, sloth, wrath, greed, envy, and pride.” But that is roughly what the elite said about cars, cotton factories, and (I’m guessing) wheat and hand axes too.

Read the full text; it’s worth it.

Color App creates elastic networks

Color is a new photo-sharing application for iPhone and Android phones.  It allows you to share the photos you have taken through it with any other Color user near you.  And you also get to see their pictures 🙂   They defined a ‘proximity’ criterion that creates local temporary networks.  As soon as you leave the place, if you are no longer close to the other user, you lose access to their photos (and they to yours).

But if you hang around the same Color user for some time, the shared photos remain accessible longer than with an occasional contact.  Color uses machine-learning algorithms to create this ‘elasticity’.  Great concept!
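Color has not published how its ‘elastic’ networks actually work, but the behavior described above can be sketched with a toy model (the class name and all constants below are invented for illustration): co-presence time accumulates while two users are near each other and drains slowly once they part, so a long-standing companion keeps access much longer than a one-off encounter.

```python
# Toy sketch of an "elastic" sharing link: the longer two users have been
# co-located, the longer their shared-photo access survives after they part.
# Growth/decay constants are made up; Color's real (ML-based) algorithm is
# not public.

class ElasticLink:
    def __init__(self, growth=1.0, decay=0.1):
        self.shared_seconds = 0.0   # accumulated co-presence "credit"
        self.growth = growth
        self.decay = decay

    def tick(self, nearby: bool, dt: float = 1.0):
        """Advance the link by dt seconds."""
        if nearby:
            self.shared_seconds += self.growth * dt
        else:
            self.shared_seconds = max(0.0, self.shared_seconds - self.decay * dt)

    @property
    def has_access(self) -> bool:
        return self.shared_seconds > 0.0

link = ElasticLink()
for _ in range(60):               # an hour together, in 1-minute ticks
    link.tick(nearby=True, dt=60)
for _ in range(10):               # then 10 minutes apart
    link.tick(nearby=False, dt=60)
print(link.has_access)            # long co-presence keeps access alive
```

An occasional contact, by contrast, accumulates so little credit that the same decay wipes out their access almost immediately after they separate.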

Much of last week’s buzz surrounding the launch of Color was justifiably skeptical. The startup, after all, raised $41 million to enter a crowded space without a business model or customers, and many wonder whether the world really needs another mobile photo-sharing app. But two components of Color’s vision — implicit networks (connections created without user effort) and place/time tagging — extend far beyond photo-sharing, and make the company worth watching as a potential indicator of social media and data-mining trends.

Read more of David Card’s article: Color: More Than Just Another Photo-Sharing App

Crowdsourcing in Search and Data Mining

Here are some notes from the “Crowdsourcing in Search and Data Mining” (CSDM) workshop, taken by Panos Ipeirotis.  The workshop took place in Hong Kong just a few days ago and tackled various crowdsourcing issues, such as:

– When should we let people talk to each other vs. let them work independently?

– How Crowdsourcable is Your Task?

– An Examination of Crowdsourcing Incentive Models in Human Resource Tasks

Clicks: where did you think the data was coming from?

All this buzz about tracking clicks began with Bing, but it goes well beyond that search engine.

Read this article; it talks about clicks and other telemetry (remote measurement and reporting of information).  It gives plenty of examples of applications in which you have been tracked, knowingly or not.
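To make the idea of click telemetry concrete, here is a toy sketch of the kind of payload such a beacon might send home; every field name below is invented for illustration, not any particular vendor’s schema.

```python
import json
import time

# Hypothetical click-telemetry event: what an application might record and
# POST to a collector each time you click something. Field names are invented.
def click_event(user_id, url, element):
    return {
        "user": user_id,            # often an anonymous or hashed identifier
        "url": url,                 # the page where the click happened
        "element": element,         # which link or button was clicked
        "timestamp": time.time(),   # when it happened
    }

event = click_event("u-12345", "https://example.com/results", "result-3")
payload = json.dumps(event)        # the string that would be sent home
print(payload)
```

Multiply this by every click of every user and you get the data streams the article describes.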

You could see it as a trade: you give something, and you get something in return. I’m not saying you will always be happy with what they do with your information … 🙂 but it works to the majority’s benefit.

What do you think: where do you draw the line on data privacy?  Would you willingly give up your habits in order to improve the usability and friendliness of applications?

Get visual to design your information extraction process

Here Pete Warden describes the release this month of A Free Visual Programming Language for Big Data:

Until the last few years, large scale data processing was something only big companies could afford to do. As Hadoop has emerged, it has put the power of Google’s MapReduce approach into the hands of mere mortals. The biggest challenge is that it still requires a fair amount of technical knowledge to set up and use. Initiatives like Hive and Pig aim at making Hadoop more accessible to traditional database users, but they’re still pretty daunting.

That’s what makes today’s release of a new free edition of EMC’s Greenplum big data processing system so interesting. It draws on ideas from the MapReduce revolution, but its ancestry is definitely in the traditional enterprise database world. This means it’s designed to be used by analysts and statisticians familiar with high-level approaches to data processing, rather than requiring in-depth programming knowledge. So what does that mean in practice?

Visual programming can be a very effective way of working with data flow pipelines, as Apple’s Quartz Composer demonstrates in the imaging world. EMC has an environment called Alpine Miner that lets you build up your processing as a graph of operations connected by data pipes. This offers statisticians a playground to rapidly experiment and prototype new approaches. Thanks to the underlying database technology they can then run the results on massive data sets. This approach will never replace scripting for hardcore programmers, but the discoverability and intuitive layout of the processing pipeline will make it popular amongst a wider audience.

Complementing Alpine Miner is the MADlib open-source framework. Describing itself as emerging from “discussions between database engine developers, data scientists, IT architects and academics who were interested in new approaches to scalable, sophisticated in-database analytics,” it’s essentially a library of SQL code to perform common statistical and machine-learning tasks.

The beauty of combining this with Alpine Miner is that it turns techniques like Bayes classification, k-means clustering and multilinear regression into tools you can drag and drop to build your processing pipeline.
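To give a feel for what a drag-and-drop “k-means clustering” node actually computes, here is a minimal, self-contained sketch of the algorithm; this is purely illustrative, since MADlib’s real implementation runs as SQL inside the database and handles far larger data.

```python
# Minimal k-means: assign points to their nearest center, move each center
# to its cluster's mean, repeat. Deterministic initialization (first k
# points) keeps the demo reproducible.

def kmeans(points, k, iters=20):
    centers = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the nearest center (squared Euclidean distance).
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

# Two well-separated groups of 2-D points:
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (9.0, 9.1), (9.2, 9.0), (9.1, 9.2)]
print(sorted(kmeans(data, k=2)))   # one center near (0.1, 0.1), one near (9.1, 9.1)
```

In a visual tool like Alpine Miner, this whole loop is one box in the pipeline, with the data flowing in through a pipe and the cluster assignments flowing out.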

Mining the Tar Sands of Big Data

I liked the analogy Michael Driscoll and Roger Ehrenberg used in their article announcing GigaOM’s Structure: Big Data conference on March 23 in New York City.

In a similar vein, much of the world’s most valuable information is trapped in digital sand, siloed in servers scattered around the globe. These vast expanses of data — streaming from our smart phones, DVRs, and GPS-enabled cars — require mining and distillation before they can be useful.

Both oil and sand, information and data, share another parallel: In recent years, technology has catalyzed dramatic drops in the costs of extracting each.

Unlike oil reserves, data is an abundant resource on our wired planet. Though much of it is noise, at scale and with the right mining algorithms, this data can yield information that can predict traffic jams, entertainment trends, even flu outbreaks.

These are hints of the promise of big data, which will mature in the coming decade, driven by advances in three principal areas: sensor networks, cloud computing, and machine learning.


Together, these three technology advances lead us to make several predictions for the coming decade:

1. A spike in demand for “data scientists.” Fueled by the oversupply of data, more firms will need individuals who are facile with manipulating and extracting meaning from large data sets. Until universities adapt their curricula to match these market realities, the battle for these scarce human resources will be intense.

2. A reassertion of control by data producers. Firms such as retailers, banks, and online publishers are recognizing that they have been giving away their most precious asset — customer data — to transaction processors and other third-parties. We expect firms to spend more effort protecting, structuring and monetizing their data assets.

3. The end of privacy as we know it. With devices tracking our every point and click, acceptable practice for personal data will shift from preventing disclosures towards policing uses. It’s not what our databases know that matters — for soon they will know everything — it’s how this data is used in advertising, consumer finance, and health care.

4. The rise of data start-ups. A class of companies is emerging whose supply chains consist of nothing but data. Their inputs are collected through partnerships or from publicly available sources, processed, and transformed into traffic predictions, news aggregations, or real estate valuations. Data start-ups are the wildcatters of the information age, searching for opportunities across a vast and virgin data landscape.

Read the full article.

Nowadays there is little interaction with the customer… so use data mining to help CRM

Here’s the explanation:

As described above, data mining is an efficient tool that helps marketing people analyze consumer behavior, classify data, and predict future trends through pattern analysis.  Businesses gain more profit if managers make good use of it.  In addition, data mining also contributes a lot to CRM (Customer Relationship Management).  CRM is a new concept in business administration.  Companies have changed the way they interact with customers.  Only when they understand what their consumers or potential consumers are thinking can they make more money!  Many online shopping companies focus their marketing strategy on advertisements to increase market share and build a brand image, but they ignore that customer loyalty is the foundation of market share.  Although e-commerce makes transactions easier for business and customer alike, it paradoxically reduces the interaction between them, so CRM becomes a critical factor in a successful operation: whoever can predict and understand customers’ preferences and behaviors will be the final winner.  Several information techniques assist CRM; their major source is data, including products, customers, and any sales-related data.  These data grow rapidly through daily transactions, and they can only be managed effectively by business intelligence systems such as data mining or a data warehouse.

Follow this link for the full article
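The quoted passage talks about classifying customers from transaction data. As a hedged illustration of the simplest version of that idea, here is a toy RFM-style (recency, frequency, monetary) segmentation; the thresholds and segment names are invented, and a real CRM system would learn them from the data rather than hard-code them.

```python
from datetime import date

# Toy customer segmentation from three transaction-derived features.
# Thresholds and labels are invented for illustration only.
def segment(last_purchase: date, n_orders: int, total_spent: float,
            today: date = date(2011, 4, 1)) -> str:
    days_since = (today - last_purchase).days
    if days_since > 180:
        return "at-risk"            # no purchase in six months
    if n_orders >= 10 and total_spent >= 500:
        return "loyal"              # frequent, high-value customer
    return "active"

print(segment(date(2011, 3, 20), n_orders=12, total_spent=800.0))  # loyal
print(segment(date(2010, 6, 1),  n_orders=3,  total_spent=40.0))   # at-risk
```

Once customers carry labels like these, marketing can target each segment differently, which is exactly the kind of use the article attributes to data mining in CRM.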



Who’s Watching What? — Data Mining Raises Privacy Issues

I found this article quite frightening. Although it’s more than a year old, I’m sure things are only worse now than when Annie Macios wrote it.
The main idea is that companies that collected health information for business purposes (like invoicing follow-up), with your approval, later realized they could sell all this data.  And they can do so legally, without your knowledge or consent.
This medical information circulates and is used for purposes completely different from those for which it was originally gathered, without any control or legal requirement for an audit.
A company in Minnesota, Ingenix, provides all your prescriptions and other medical information to insurance companies. They do seem to request your agreement before sharing the data, but are you really free to decide?  If you don’t agree, they won’t insure you!  And once your information has been handed over, nothing prevents the insurance company from reselling it… perhaps to your employer?
Here is another of her ideas:

She also points out that data mining creates the perfect scenario for identity theft, because health records include all the information necessary (Social Security number, birth date, address, etc.) to open a bank account.

With all these possibilities, not misusing the data is only a question of will… It’s time to regulate it; how long can we count only on the goodness of humankind?

ACM TIST Special Issue on Search and Mining User-generated Contents

Social media have shifted the way information is generated and consumed. At first, information was generated by one person and “consumed” by many people, but nowadays most of the information available on the Web is generated by users, which has changed the needs in information access and management. Social networks like Facebook or Twitter manage tens of PB of information, with flows of hundreds of TB per day, and hundreds of billions of relationships.

User generated content provides an excellent scenario to apply the metaphor of mining any kind of information. In a social media context, users create a huge amount of data where we can look for valuable nuggets of knowledge by applying diverse search (information retrieval) and mining techniques (data mining, text mining, web mining, opinion mining). In this kind of data, we can find both structured information (ratings, tags, links) and unstructured information (text, audio, video), and we have to learn how to combine existing techniques in order to take advantage of the existing information heterogeneity while extracting useful knowledge.
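As a tiny, hedged illustration of combining the structured and unstructured signals the paragraph mentions, here is a toy opinion-mining sketch: a minimal invented sentiment lexicon scores the text of a review, and that score is blended with a structured star rating. Real opinion-mining systems use far richer lexicons and learned models.

```python
# Toy lexicon-based opinion mining over user-generated text, blended with a
# structured signal (a star rating). The lexicon below is invented.

POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def text_sentiment(review: str) -> int:
    # Count positive words minus negative words.
    words = review.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def combined_score(review: str, stars: int) -> float:
    # Blend unstructured (text) and structured (rating) evidence equally,
    # centering the 1-5 star scale at 3.
    return 0.5 * text_sentiment(review) + 0.5 * (stars - 3)

print(combined_score("great phone, I love it", stars=5))        # positive
print(combined_score("terrible battery, poor screen", stars=1))  # negative
```

The point of the sketch is the combination: neither the rating nor the text alone captures an opinion as well as the two signals together, which is exactly the heterogeneity the call for papers highlights.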

More info at ACM TIST webpage