DIS2016 Restore the balance of data

The Data Innovation Summit 2016 took place two weeks ago.  I was due to speak in the ‘ignite’ presentation format.  For those who don’t know this format, it’s a nightmare! Joking aside, it means that the slides advance automatically at regular intervals (every 15 seconds in my case).  You cannot stop them and you don’t control the flow… so to stay synchronized, you really have to prepare your speech in advance: you must know exactly how long it takes to explain each of your points and which examples you’ll be presenting (try it, 15 seconds go by very quickly when you’re searching for your words :-)).

So here it is, my 5-minute presentation, if you only count the time on stage…

Big Data workshop at the First European Celebration of Women in Computing

This last Tuesday, I led the ‘Discover Big Data’ workshop at the First European Celebration of Women in Computing.  There were many parallel sessions that morning, and I received some questions about my presentation from participants who couldn’t split themselves in two to attend this workshop 😉

Welcome to the Big Data workshop, we need women in Big Data!

This workshop is called ‘Discover Big Data’ because Big Data is a hyped term. It is used for anything where data is involved, yet it remains unclear what it actually means.

  • You are in Big Data if you are dealing with data that has to be processed at great velocity, as is the case for GPS or for mobile phones.
  • You are in Big Data if you combine information that comes in a variety of formats, like your customers’ transactions and your customers’ emails, or if you go to social networks like Facebook or Twitter.  You can discover which topics are being discussed, what is being said about your company or who is talking about your product.
  • You are in Big Data if you’re exploiting one of the many big available datasets, like weather information, official administration records such as property records, financial information, economic indicators…

What can be done with Big Data?

It is mainly used for customer intimacy: discovering your customer profiles and targeting them on a one-to-one basis, finding their preferences and the hidden patterns that predict customer churn.

It can be used for optimisation: finding patterns of systematic problems hidden in your historical data. It can help you organise your maintenance, improve the supply chain, find better logistics solutions and optimise processes.

It is also used for innovation: it can help you create your new product by looking at your competitors and finding the white spaces, or by uncovering market trends.

And more generally, with all the available data you can create models that forecast future events and behaviors. By using what-if analysis to predict the outcomes of potential changes, you can direct your business strategy. It helps you anticipate previously unforeseen opportunities, avoid costly situations, find new revenue opportunities and identify more effective business models.

As you see, there are great business opportunities!

How can we do all that?

There are many techniques like statistical analysis, data mining, text analysis, sentiment analysis, graph analysis, machine learning, predictive analysis, neural networks, conceptual clustering…

You may already have heard some of those words, which sound promising but also very complicated. Even so, the Big Data field is growing exponentially, and men are rushing into it.  Women make up only 10% of it; don’t you want to be part of it? Companies that caught this wave are thriving, well ahead of classical businesses. They offer you the right product at the right time, with the features you are looking for, at the price you are willing to pay. They are increasing their profits while shaping our future with the products and business strategies they create.

I hear you saying: This is great but I don’t know a thing about this and it sounds so complicated. I’m here to tell you that not all of it is that difficult.

YOU could be in Big Data.

If you are in computing you have a leg up. And if you like mathematics you’ll enjoy being a data scientist. But you could be in Big Data even if you are not a techy person.  If you are in HR, in marketing, if you are a manager or a decision-maker with the right mindset open to data, you can exploit the Big Data wave.

Even when they see the potential, women tend to think ‘it’s not for me, I don’t have the competencies’.

Let me use some feminine stereotypes to illustrate that we have the basic skills:

  • We have a tradition of getting together and talking too much.  And we have a tendency to be matchmakers.  We can put those skills of information gathering and making connections to good use, finding relationships between data.
  • Who recognises herself in this? We are control freaks and plan everything, even our loved ones’ time.  Don’t you have a to-do list for your partner on Saturdays?  I do: ‘Love, since you are driving Alex to the scouts, could you please stop by the dry cleaner and drop off the trousers?’  What if you knew what your GPS already knows, that a road is blocked?  You could have asked him to bring some bread back, as he’s going to pass near the bakery.  Don’t you feel satisfaction when doing things efficiently, optimising the Saturday time? So imagine tapping into all the available information and using it to improve processes: it’s a rewarding job.
  • And if you have artistic skills, visualisation is your field. This is a new branch of data science, where very interesting new techniques are being created to show more than 3 dimensions of data, so that you can easily see relationships graphically.
  • Generally speaking, I think we women have a natural talent for data analysis: the ‘What if’ comes naturally to us, and we always investigate all the possibilities before deciding on one, don’t we?

Summarising, we saw that there is business here, and that we have the basic skills to be in the Data business.

Moreover, it is important that more women move into this field, not only because of the many business opportunities, but also because there are ethical issues involved in Big Data. We can mention data privacy and price gouging as some of these issues, but there are other business models that can be controversial.

The rules of what can be done with the data and what is off-limits are being defined right now.  Let’s not miss the opportunity to give our view on this.

As an example, there is a great initiative from the Data2X program of the UN, which is researching women’s freedom of movement through satellite images and their phone geolocation.  Are they limited in their movements in some countries? Do they have access to education, to health care? Great initiative, but what about the same at a private level: is following the movements of your partner through her/his phone geolocation ethical? What about tracking the movements of your children, as is already done in some countries?

It’s important to have our say in the ethical uses of all those lakes of data and to be represented in the decisions that will define our future society. We women have a natural tendency to look after our loved ones, taking their needs into consideration. That’s what Big Data needs: people who set the rules for using the incredible amounts of data, taking into account the different perspectives and with a long-term view in mind. It’s the moment to use our feminine voice to shape a better society for all of us, participating also in the creation of the new business models.

In this workshop you will hear success stories to show you the opportunities to be included in this field. I hope you’ll join the Big Data movement.

Pre-Crime unit for tracking Terrorists?

Given the recent events in Belgium, the terrorist bomb attacks in Zaventem and Brussels, I couldn’t help but remember the Bloomberg Businessweek article about pre-crime: ‘China Tries Its Hand at Pre-Crime’.  It refers to the film Minority Report, with Tom Cruise, which takes place in a future society where three mutants foresee all crime before it occurs. Plugged into a great machine, these “precogs” are at the base of a police unit (the Pre-Crime unit) that arrests murderers before they commit their crimes.

The company China Electronics Technology recently won the contract to build the ‘united information environment’, as they call it, an ‘antiterrorism’ platform according to the Chinese government:

The Communist Party has directed [them] to develop software to collate data on jobs, hobbies, consumption habits, and other behavior of ordinary citizens to predict terrorist acts before they occur.

This may seem a little too much to ask: if you think about it, you may need every daily detail to be able to predict terrorist behaviour. But in a country like China, where the state has had control over its citizens for many decades, where there are no privacy limits to respect and there is a good network of informants…

A draft cybersecurity law unveiled in July grants the government almost unbridled access to user data in the name of national security. “If neither legal restrictions nor unfettered political debate about Big Brother surveillance is a factor for a regime, then there are many different sorts of data that could be collated and cross-referenced to help identify possible terrorists or subversives,” says Paul Pillar, a nonresident fellow at the Brookings Institution.

Notice that there is now also a new target: subversives.  The article continues:

China was a surveillance state long before Edward Snowden clued Americans in to the extent of domestic spying. Since the Mao era, the government has kept a secret file, called a dang’an, on almost everyone. Dang’an contain school reports, health records, work permits, personality assessments, and other information that might be considered confidential and private in other countries. The contents of the dang’an can determine whether a citizen is eligible for a promotion or can secure a coveted urban residency permit. The government revealed last year that it was also building a nationwide database that would score citizens on their trustworthiness.

Wait a second, who defines what ‘trustworthiness’ is, and what happens if you’re not trustworthy?

New antiterror laws that went into effect on Jan. 1 allow authorities to gain access to bank accounts, telecommunications, and a national network of surveillance cameras called Skynet. Companies including Baidu, China’s leading search engine; Tencent, operator of the popular social messaging app WeChat; and Sina, which controls the Weibo microblogging site, already cooperate with official requests for information, according to a report from the U.S. Congressional Research Service. A Baidu spokesman says the company wasn’t involved in the new antiterror initiative.

So Skynet is here now (remember Terminator Genisys?). Even if, right after a horrendous crime, you may be tempted to welcome this ‘pre-crime’ initiative, there are still far too many negative aspects to consider before building such a tool: in whose hands will it be, who defines what a crime is, and what about your freedom to change your mind, to mention a few.  Let’s begin thinking about how to tackle them.

Alex Pentland’s article on Data-Driven Society

I recently got the new issue of Scientific American (October 2013), and the cover announced the article ‘The Data-Driven Society’ by Alex Pentland.  I just had to read it 🙂

He co-leads the World Economic Forum’s Big Data and Personal Data initiatives.  He talks about all the digital bread crumbs we leave behind in our daily lives (like GPS and GSM information, or electronic payments) and what can be done with them.

With his students at the MIT Human Dynamics Laboratory, he is using data analytics to discover mathematical patterns that can predict human behaviour. ‘Bread crumbs record our behaviors as it really happens’, he says; they are more accurate than the information from social media, where we choose what we want to disclose about ourselves.  Alex and his team are particularly interested in the patterns of idea flow.

Among the most surprising findings that my students and I have discovered is that patterns of idea flow (measured by purchasing behavior, physical mobility or communications) are directly related to productivity growth and creative output.

Analysing those flows, he uncovered two factors that characterise a healthy pattern of idea flow:

  • engagement: connecting to others, usually in the same team or organisation, and
  • exploration: going outside the group to exchange ideas.

Both are needed for creativity and innovation to flourish.  To find those factors, he based his research on graphs of different types of interactions, like person-to-person contacts, emails, SMS…
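To make the two factors concrete, here is a toy sketch of how one could split a list of interactions into engagement (inside the same team) and exploration (across teams). It is only an illustration under my own assumptions, not the method used in the article: the function name `idea_flow_shares`, the people and the team labels are all hypothetical.

```python
from collections import Counter


def idea_flow_shares(interactions, team_of):
    """Return the share of within-team and cross-team interactions.

    interactions: list of (person_a, person_b) pairs, one per contact
    team_of: dict mapping each person to a team label (hypothetical data)
    """
    counts = Counter()
    for a, b in interactions:
        # Same team counts as engagement, different teams as exploration.
        kind = "engagement" if team_of[a] == team_of[b] else "exploration"
        counts[kind] += 1
    total = sum(counts.values())
    if total == 0:
        return {"engagement": 0.0, "exploration": 0.0}
    return {kind: counts[kind] / total for kind in ("engagement", "exploration")}


# Made-up example: four people in two teams, four recorded contacts.
team_of = {"ana": "sales", "ben": "sales", "chloe": "research", "dev": "research"}
interactions = [("ana", "ben"), ("ana", "ben"), ("chloe", "dev"), ("ana", "chloe")]
shares = idea_flow_shares(interactions, team_of)
print(shares)
```

In this made-up data, three of the four contacts stay inside a team and one crosses teams. A real study would of course work with far richer data, such as badges, emails and phone records.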

We may not have the tools he used (like electronic badges for tracking person-to-person interactions), but intuitively this is something we know: good communication is essential for the success of a team, and talking to an external person may provide a new insight.  It’s always good to be proved right, isn’t it?

Check my next post, where I’ll continue with his article: he presents a lot of great concepts as the ‘new deal on data’ for personal data protection.


Small talk on Big Data

Last week I presented this topic to professional women at PWI here in Brussels. It’s called ‘small talk’ because it is not a technical presentation but one for a broader audience, to raise awareness of this Big Data trend.   The main concept I wanted them to take away is the change in the business arena and in our society due to Big Data. If you are interested in this subject, just drop me a line and let me know!

Prices of disks and storage devices have dropped a lot, so now basically any digital data is being stored.  The cost is so low that it is worth saving it ‘just in case, and we’ll see in the future what we can do with this data’.  Technology has also made huge advances with massively parallel processing, and we can manage to juggle thousands of servers to analyse a bunch of diverse data and extract information from it in a usable time-frame.

This allows business strategists to make smarter decisions based on facts, better than how it was done before, based on experience or intuition.  So the message for all decision-makers is: go and check your data, you’ll find valuable information there to decide any business matter.  Also, be aware that your competition is going into it too, and it can outsmart you!

At the society level, there are many ethical issues to deal with, like privacy or equality and fairness.  What do you think: is it fair to have a ‘personalised’ subsidy that may give more to one person than to others because of a particular factor, or to grant one person access to a health treatment and not another based on, for example, life expectancy?  What about basing the decision on the person’s ‘ROI’, such as the ability to pay back the cost of the treatment?  Or is it fairer to have equal subsidies, the same amount for everyone, even for those who could pay by themselves? Either we discuss these issues beforehand, or we will be at the mercy of any politician or entrepreneur taking a step further in an unethical direction.

And as a last twist, I would like to point out that the basic value of knowledge is being challenged.  We are already experiencing a change of values: knowledge itself is less and less valued as an asset, while value remains in knowing how to get to the knowledge, where to find it and what to extract from data.


Popularity ranking for five main crowdsourcing categories

Eric Blattberg, in his article ‘The five crowdsourcing categories ranked: Popularity in social media’, gives us an analysis of how the 5 crowdsourcing categories ranked last year: cloud labor, crowd creativity, crowdfunding, distributed knowledge and open innovation.  Here are the main results:


What do the world’s social media users think about crowdsourcing? Crowdsourcing.org partnered with KL Communications to find out.

Together we tried to get a sense of how the world is feeling about the different forms of crowdsourcing using something called sentiment analysis. You’ve probably seen stories in the news recently that try to gauge the mood of certain groups or entire social networks like Twitter using sentiment analysis.

[…]  Using a tool called Netbase, which indexes and analyzes millions of conversations across the web, Crowdsourcing.org and KLC analyzed data for the top 15 sites generating the most buzz for each of the five main crowdsourcing categories in Crowdsourcing.org’s Directory […] This report examines a year of data — timeframe: November 1, 2010 to October 31, 2011 — garnered from Facebook, Twitter, blogs, forums, news sites and consumer reviews.




Although the set of sites analyzed in the distributed knowledge category generated the most buzz, the posts that reference them are consistently the most pessimistic: over 27% of opinionated comments about distributed knowledge are negative in nature. Conversely, comments referencing open innovation and crowdfunding sites carry the highest positive sentiment of the bunch — 91.4% and 91.1% respectively — while posts about crowd creativity and cloud labor platforms fall somewhere in the middle.



As any half-decent media interpreter knows, “absolute buzz” is only one part of the equation. What’s trending, also referred to as “normalized buzz,” is equally significant. During the November 2010 to October 2011 timeframe, references to crowd creativity and cloud labor skyrocketed. Distributed knowledge, the most popular category in terms of absolute buzz, saw very slight gains over the course of the year. Crowdfunding references stayed relatively steady, while discussion of open innovation platforms plummeted throughout 2011.


Can a big leak on privacy stop Big Data?


Bill Franks is Chief Analytics Officer for Teradata’s global alliance programs, so he knows more than a little about what’s going on in the Advanced Analytics space. He predicted for the International Institute for Analytics in 2012 that the evolution of big data would depend on how the privacy issue is handled.  He said:  ‘I have wondered what the “big moment” will be that causes everyone to realize how much about them is exposed and leads to a major popular revolt. Honestly, I thought the big blow up in December around Carrier IQ would be that moment.’

The [Carrier IQ] software collects usage information aimed at helping telecommunications companies and mobile device manufacturers identify hardware or network issues. […] The phone was even capturing key presses such as when you entered a password on a secure website. Naturally, this caused a huge uproar. (You can view this series of articles from CNNMoney for more detail: Part 1, Part 2, Part 3, and Part 4.)

As you see, the intention of the software may be completely valid, but any electronic recording entails a risk for privacy.  It is critical to create awareness of this risk and to regulate the access to and usage of all recorded information.

[…] The extent to which the tracking of behavior on the internet occurs – with Facebook, Google, and other public sites capturing data about who you are, what you are doing, where you are going, and what you want – is not known to most people. Even though many privacy policies technically declare intentions to collect and use data, the dozens of pages of “legalese” terms used aren’t read or understood by most people.

[…] I believe that privacy concerns will be a major influence on how big data itself, and the use of it, evolves. There will need to be an extremely high level of trust between organizations who want our data and those of us who provide it. That trust must be earned and maintained. All it will take will be a few cases of violated trust, intentional or not, to derail the relationship and set us all back.


Though I don’t think a privacy leak will ‘set us back’ as Bill Franks says, I do think there is a big need to create a trusted organisation, institution or other group to regulate the privacy issue.  If there were such a ‘Trusted Privacy Organisation’, there would be a way to work only with the applications that adhere to its standards and/or allow audits from such an institution.


Data.gov.uk for Christmas!

Zoe Kleinman reported on BBC News that the UK Government is opening its data to the public.  Tim Berners-Lee, founder of the Web, is behind this project. A big change of mentality for the UK government 🙂 :

An ambitious website that will open up government data to the public will launch in beta, or pilot, form in December.

Reams of anonymous data about schools, crime and health could all be included.

Data.gov.uk has been developed by Sir Tim Berners-Lee, founder of the web, and Professor Nigel Shadbolt at the University of Southampton.

It is designed to be similar to the Obama administration’s data.gov project, run by Vivek Kundra.
Mr Kundra is Chief Information Officer in the US. The American site, while not yet comprehensive, is already up and running, with improvements fuelled by user feedback.

This is good for the public and also for the UK government, as there is a return on investment:

Data.gov.uk is built with semantic web technology, which will enable the data it offers to be drawn together into links and threads as the user searches.

Let’s enjoy our Christmas gift in December, give a lot of feedback to improve what the website offers, and encourage others to follow the movement.

Data Philanthropy is Good for Business

Give Data as you give Blood. Global Pulse, an innovation initiative in the Executive Office of the UN Secretary-General, wants to analyse private data for the public good.  The idea is to find patterns in data coming from private companies, and to share those findings.  For that, they have to find a way for enterprises to deliver their users’ trends while guaranteeing the anonymity of the information (to protect users’ privacy) and without losing corporate competitiveness.
Check the Forbes article ‘Data Philanthropy is Good for Business’:

Corporations today are mining this data to gain a real-time understanding of their customers, identify new markets, and make investment decisions. This is the data that powers business, which the World Economic Forum has described as a new asset class.[…]

Consider: MIT researchers have found evidence that changes in mobile phone calling patterns can be used to detect flu outbreaks; A Telefónica Research team has demonstrated that calling patterns can be used to identify the socioeconomic level of a population, which in turn may be used to infer its access to housing, education, healthcare, and basic services such as water and electricity; and researchers from Sweden’s Karolinska Institute and Columbia University have used data from Digicel, Haiti’s largest cell phone provider, to determine the movement of displaced populations after the earthquake, aiding the distribution of resources.

At Global Pulse, an innovation initiative of the UN Secretary-General, we believe that analysis of patterns within big data could revolutionize the way we respond to events such as global economic shocks, disease outbreaks, and natural disasters. Our team of data scientists, open source hackers, and international development experts functions the way an R&D lab does: asking questions, formulating and testing hypotheses, building prototypes and collaborating with partners within and outside the United Nations to develop methods for harnessing real-time data to gain a real-time understanding of human well being.

We’re in discussions with corporations about how their digital services could be used as human sensor networks to detect the early warning signs that communities are losing jobs, getting sick, not getting enough food, or struggling to make ends meet. Now we need to find a way for the private sector to share, safely and anonymously, some of what it knows about its customers to help give the public sector a badly needed edge in protecting citizens. It’s the concept that has been called “data philanthropy.”[…]

The companies that engage with us, however, don’t regard this work as an act of charity. They recognize that population well being is key to the growth and continuity of business. For example, what if you were a company that invested in a promising emerging market that is now being threatened by a food crisis that could leave your customers unable to afford your products and services? And what if it turned out that expert analysis of patterns in your own data could have revealed all along that people were in trouble, while there was still time to act?

Data philanthropy could make a real difference, and it makes good business sense as well.

Rendez-vous with the EC research programme on Big Data

If you are in the area of Intelligent Information Management, check Roberto Zicari’s blog: he explains this call for projects from the European Commission in a simpler way than the official site does!

Basically, they are funding projects in these areas:

a) Reactive algorithms, infrastructures and methodologies

b) Intelligent integrated systems

c) Framework and tools for benchmarking and exploring information management diversity

d) Targeted competition framework

e) Community building networks

I am going to the Information and Networking day on Intelligent Information Management, which will be held on 26 September in Luxembourg.

Let me know if you are joining too!