Correlation and Causation in Big Data

Big data began as a term for extremely large data sets. These data sets cannot be managed or analyzed with conventional database programs, not only because their size exceeds the capacity of standard data management tools, but also because of the variety and unstructured nature of the data (it comes from different sources such as the sales department, the customer contact center, social media, mobile devices and so on) and because of the velocity at which it moves (imagine what it takes for a GPS to continually recalculate the next move to stay on the best route and avoid traffic jams: looking at all the traffic information coming from official bodies as well as from other drivers in real time, and transmitting all the details before the car reaches a crossroads).

The term ‘Big Data’ is also used to identify the new technology needed to process the data and reveal patterns, trends, and associations. Furthermore, the term is now synonymous with big data’s analytical power and its business potential, which will help companies and organizations improve operations and make faster, more intelligent decisions.

What is big data used for?

The first and most evident use is statistics: how many chocolates have we sold? What are the global sales, split per country? Where do the customers come from?

Then correlation comes into play: things that have the same tendency, that go together or move together. For example, countries with strong chocolate sales also have a lot of PhDs.

Correlation is not causality. It’s not because you eat chocolate that you become a PhD (nor the other way around: having a PhD doesn’t make you more likely to love chocolate). Analyzing correlations is still a big deal. A correlation can be a mere conjunction, as with thunder and lightning, where both come from a common cause. It can be a causal relation, and even when there is causality, it is hard to tell the direction of the relationship: which is the cause and which is the effect. Nevertheless, predictive behaviour analysis on big data is doing a great job, even when the ‘why’ behind it, the underlying causes, remain hidden and unexplained.
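To make the chocolate and PhD example concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the hidden ‘wealth’ variable, the noise levels and the function names `pearson` and `simulate` are hypothetical, and the numbers are simulated, not real country data. It only shows how a common cause can make two variables move together without either one causing the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=0):
    """Made-up data: a hidden 'wealth' factor drives both other variables."""
    rng = random.Random(seed)
    wealth = [rng.gauss(0, 1) for _ in range(n)]
    # Neither variable depends on the other, only on wealth plus noise.
    chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
    phds = [w + rng.gauss(0, 0.5) for w in wealth]
    return wealth, chocolate, phds


wealth, chocolate, phds = simulate()

# Strong correlation, although chocolate never influences PhDs here.
r_raw = pearson(chocolate, phds)

# Remove the common cause: what is left is independent noise.
choc_rest = [c - w for c, w in zip(chocolate, wealth)]
phd_rest = [p - w for p, w in zip(phds, wealth)]
r_rest = pearson(choc_rest, phd_rest)

print(f"raw correlation: {r_raw:.2f}")
print(f"after removing wealth: {r_rest:.2f}")
```

Under these assumptions the first value should come out high (the theoretical value is 0.8) and the second close to zero: the correlation was real, but the explanation was the hidden common cause.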

The great potential of big data is that it helps us discover correlations, patterns and trends where we couldn’t see them before, but it’s up to us to create theories and models that can explain the relations behind the correlations.

Can An Algorithm be “Racist”?

Library of Congress Classification - Reading Room

David Auerbach has written this article pointing out that some classification algorithms may be racist:

Can a computer program be racist? Imagine this scenario: A program that screens rental applicants is primed with examples of personal history, debt, and the like. The program makes its decision based on lots of signals: rental history, credit record, job, salary. Engineers “train” the program on sample data. People use the program without incident until one day, someone thinks to put through two applicants of seemingly equal merit, the only difference being race. The program rejects the black applicant and accepts the white one. The engineers are horrified, yet say the program only reflected the data it was trained on. So is their algorithm racially biased?

Yes. And a classification algorithm can be not only racist: since humans write these algorithms or, more accurately in the case of learning algorithms, since they are built upon human examples and counter-examples, they may carry any bias that we humans have. With the abundance of data, we are training programs with examples from the real world; the resulting program will be an image of how we act, not a reflection of how we would like to be. Exactly like the saying about educating kids: they do as they see and not as they are told :-)
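The rental scenario can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the ‘learner’ simply memorises past approval rates, and the history, the group labels and the names `train` and `predict` are all invented for the example. It is not how any real screening product works; it only shows that a program trained on biased decisions gives those decisions back.

```python
from collections import defaultdict


def train(history):
    """Learn the past approval rate for each combination of features."""
    counts = defaultdict(lambda: [0, 0])  # features -> [approved, total]
    for features, approved in history:
        counts[features][1] += 1
        if approved:
            counts[features][0] += 1
    return {features: a / t for features, (a, t) in counts.items()}


def predict(model, features, threshold=0.5):
    """Accept when similar past applicants were mostly accepted."""
    return model.get(features, 0.0) >= threshold


# Invented history: same credit quality, but past human decisions
# favoured group_a (90% accepted) over group_b (30% accepted).
history = (
    [(("good_credit", "group_a"), True)] * 90
    + [(("good_credit", "group_a"), False)] * 10
    + [(("good_credit", "group_b"), True)] * 30
    + [(("good_credit", "group_b"), False)] * 70
)

model = train(history)

# Two applicants of equal merit, differing only in group.
print(predict(model, ("good_credit", "group_a")))
print(predict(model, ("good_credit", "group_b")))
```

Nobody wrote a rule saying ‘reject group_b’; the rule is implicit in the examples the program was given, which is exactly the point of the saying about kids above.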

To make things worse, with learning algorithms not even the programmer can predict the resulting classification. So, knowing that there may be errors, who is there to ensure their correctness?

What about the everyday profiling that goes on without anyone noticing? […]
Their goal is chiefly “microtargeting,” knowing enough about users so that ads can be customized for tiny segments like “soccer moms with two kids who like Kim Kardashian” or “aging, cynical ex-computer programmers.”

Some of these categories are dicey enough that you wouldn’t want to be a part of them. Pasquale writes that some third-party data-broker microtargeting lists include “probably bipolar,” “daughter killed in car crash,” “rape victim,” and “gullible elderly.” […]

There is no clear process for fixing these errors, making the process of “cyberhygiene” extraordinarily difficult.[…]

For example, just because someone has access to the source code of an algorithm does not always mean he or she can explain how a program works. It depends on the kind of algorithm. If you ask an engineer, “Why did your program classify Person X as a potential terrorist?” the answer could be as simple as “X had used ‘sarin’ in an email,” or it could be as complicated and nonexplanatory as, “The sum total of signals tilted X out of the ‘non-terrorist’ bucket into the ‘terrorist’ bucket, but no one signal was decisive.” It’s the latter case that is becoming more common, as machine learning and the “training” of data create classification algorithms that do not behave in wholly predictable manners.

Further on, the author mentions the dangers of this kind of programming that is not fully predictable.

Philosophy professor Samir Chopra has discussed the dangers of such opaque programs in his book A Legal Theory for Autonomous Artificial Agents, stressing that their autonomy from even their own programmers may require them to be regulated as autonomous entities.

Chopra sees these algorithms as autonomous entities. They may be unpredictable, but so far there is no will or conscious choice to take one path instead of another. Programs are told to maximize a particular benefit, and how to measure that benefit is calculated by a human-written function. As time goes by and technology advances, I can easily see the benefit function including feedback the program gets from the ‘real world’, which could make the behavior of the algorithm even more unpredictable than it is now. At that point we can think of algorithms that evaluate or ‘choose’ whether to be on the regulated side... or not. Will it reach the point of them having a kind of survival instinct? Where that may lead… we’ll know soon enough.

The value of Reflection in Learning


I just read Stephen M. Fleming’s article “The Power of Reflection” in Scientific American Mind. It talks about the importance of metacognition, that is, the ability to know our own thoughts and capacities.

This skill, which allows us to evaluate our level of competence in a particular domain, is totally independent of our actual competence in that domain. We can be bad at evaluating one particular skill and still be good at it. We can also know that we don’t know anything about a specific subject, but that doesn’t make us know more about it. Still, knowing our lack of knowledge is very important! It allows us to evaluate the situation correctly and act accordingly. In this last case the proper action would be to look for help in that domain :-) A very typical action we take based on our knowledge of ourselves is writing lists when we tend to forget things. I fully recognize myself here, do you?

Having good insight into our internal thoughts and processes is very important; it can be even more important than the knowledge itself, because it drives our actions. Not being aware of reality, as the article points out, can be very damaging not only for us but also for our social relationships and family. Not knowing that we have a particular medical condition, and thus not taking the medication, can make it impossible to live unattended, even if the condition itself is not so impairing.

Metacognition plays a particular role in learning, and the article mentions studies that tried to boost this ability among students:

[...] Thomas O. Nelson and his student John Dunlosky, then at the University of Washington, reported an intriguing effect. When volunteers were asked to reflect on how well they had learned a list of word pairs after a short delay, they were more self-aware than if asked immediately.  Many studies have since replicated this finding.  Encouraging a student to take a break before deciding how well he or she has studied for an upcoming test could aid learning in a simple but effective way.

Learners could also trigger better insight by coming up with their own subject keywords. Educational psychologist Keith Thiede of Boise State University and his colleagues found that asking students to generate a few words summarizing a particular topic led to greater metacognitive accuracy.  The students then allocated their study time better by focusing on material that was less well understood.

This method of studying should be taught at school, so that students acquire this meta-skill and learn more effectively.

Free Search Engines, says the EU!

The European Parliament is asking for “unbundling search engines from other commercial services”, issuing a message as in the ‘Free Willy’ movie, or any other cause you may be for :-)

The Economist has devoted its front-page article to it: ‘Should governments break up digital monopolies?’, Nov. 29th, 2014. Is this issue so important? Yes, I believe so. The Economist’s writer dismisses the issue, arguing that lately no dominant company has kept its position for long. On this particular issue he mentions that technology is shifting again and that browsing is not as relevant as it was, as everybody is going mobile and using apps more than browsers. He also says that, in his view, the EU’s main interest is to protect European companies rather than to benefit the consumer, because the consumer is offered a better service when additional functionalities are attached to search results.

Giving people flight details, dictionary definitions or a map right away saves them time. And while advertisers often pay hefty rates for clicks, users get Google’s service for nothing—rather as plumbers and florists fork out to be listed in Yellow Pages which are given to readers gratis, and nightclubs charge men steep entry prices but let women in free.

Even though as consumers we may be happy to have those additional features, I don’t fully agree: I still believe it is very important to ensure that a search returns a correct result, or as correct as it can be, and at least not too obviously biased. And for sure I don’t want to leave it in the hands of a few (the managers of Google, for instance) to decide what is shown to the majority of us as the result of a search, how to prioritize among the choices, and how to direct our attention only to their friends’ interests (in products or in views).

On the other hand, we may have a bigger impact by educating users: what they receive as a search result may be biased because of the business model or the intertwined interests of the search engine providing the answers. And because technology moves very fast, by the time a resolution of this type is issued, the manipulative aspect of marketing may have moved somewhere else.

As for the other aspect, the collection of users’ data and their privacy, the issue is becoming urgent; the whole world would benefit from a just and feasible way to deal with it:

The good reason for worrying about the internet giants is privacy. It is right to limit the ability of Google and Facebook to use personal data: their services should, for instance, come with default settings guarding privacy, so companies gathering personal information have to ask consumers to opt in. Europe’s politicians have shown more interest in this than American ones.

Changing schools with gaming techniques

Could you imagine a world where children ask you to take them to school? Well, that world doesn’t seem so far away… at least I know my son would be happy to go to the school Ian Livingstone is planning to open in 2016 in Hammersmith, London. Read what technology reporter Dave Lee wrote in his article for BBC News:

[...] Mr Livingstone is best known for being the man behind huge franchises such as Tomb Raider and tabletop game Warhammer.

In the 80s, his Fighting Fantasy books brought an interactive element to reading that proved extremely popular.

Speaking to the BBC about the plans, Mr Livingstone said he wanted to bring those interactive principles to schooling, but stressed the school would provide learning across all core subjects.

There is more behind his idea than just making children want to go to school. It fosters a ‘hands-on’ approach that lets students not only know, but also know how to use what they have learned. Plus there is the added benefit of allowing diverse paths to reach the goal:

By bringing gaming elements into the learning process, Mr Livingstone argued, students would learn how to problem-solve rather than just how to pass exams.

[...] “There needs to be a shift in the pedagogy of learning in classrooms because there’s still an awful lot of testing and conformity instead of diversity.

“I’m not saying knowledge is bad – I’m just trying to get a bit more know-how into the curriculum.”

He said he considers the trial-and-error nature of creating games as a key model for learning.

“For my mind, failure is just success work-in-progress. Look at any game studio and the way they iterate. Angry Birds was Rovio’s 51st game.

“You’re allowed to fail. Games-based learning allows you to fail in a safe environment.”

Let’s wish him great success!

About Internet of Things and Privacy


Innovation is creating new materials and new sensors that are ever smaller, cheaper, more flexible, more powerful and at the same time less power-consuming. This makes it possible to put them everywhere: we are surrounded by devices crowded with sensors, such as our phones with their cameras, gyroscopes and GPS. And all the measurements captured by those sensors are being used by applications, many of which are connected to the cloud and to the Internet.

The Internet of Things (as this technology is called) is becoming ubiquitous, leaving us ever more exposed in our daily lives. How many of us have our whereabouts known by the GPS company, the phone provider and even the car manufacturer? Our personal biometric information is also being left all over our running paths, not to mention the new gyms.

On the other hand, Nicole Dewandre reminds us in this recorded presentation of two basic human needs: our need for privacy and the fact that we construct ourselves through the public eye.

We need privacy to express our internal thoughts without public judgement; we need a safe place to test our lines of reasoning and confront them with others. In our hyper-connected world, the spaces where we can enjoy this privacy are vanishing.

As for our second need, the image others have of us is very important. The information we leave behind influences this public image, and it has a great effect not only on what others think of us, but also on our own perception of ourselves and on our self-esteem, and in the end it is reflected in our happiness.

Living in this hyper-connected world in which we are immersed is a real challenge!

Our 2 ways of thinking: Fast and Slow

From Jim Holt’s review in The New York Times. Illustration by David Plunkert.

I just came back from holidays, and I want to share with you my latest read: “Thinking, Fast and Slow” by Daniel Kahneman. He describes our mind as having 2 different ways of functioning: a fast one, based on our ‘intuition’, and a slower one, where we have to make the effort of reasoning.

  • The fast one is the intuitive way, used in everyday tasks, which psychologists also call our ‘unconscious mind’. It is based on the inputs of our senses (hearing, sight, smell...). They trigger a search in our memory and, through associations, bring up a representation of our situation and an immediate response to it.
  • The slower way is when we focus our attention on the inputs at hand and follow a line of reasoning based on our knowledge to come to a conclusion. This method requires more energy: we must direct our attention to each piece of information, and as we evaluate things sequentially (one after the other) it is slower.

As our body is lazy by nature, this second ‘slow’ way of reasoning is only used if needed, that is, if the situation requires our ‘special attention’. It is a great thing that our faster, energy-saving way of functioning is our ‘default’… except that Daniel Kahneman points out very interesting experiments that show the pitfalls of our intuition!

One great example he presents is the ambiguity resolution that goes on without our knowledge: when a sentence or image could be interpreted in different ways, our ‘fast mind’ resolves the ambiguity using the most recent context, which is good in many situations. The problem is that it doesn’t even let us know that there was another interpretation at all! We are not aware that our mind took only one of the possible alternatives. Moreover, it takes the most easily available memory to make sense of the world as we perceive it. So recent events that are more vivid in our memory have a greater impact on our interpretation of the world. This is called the ‘availability bias’.

Not only do our memories play tricks on us, but our whole body is linked to our intuitive way of functioning. He mentions an experiment performed in the United States in which participants were shown photos and words related to the elderly and were then asked to move to another room. That walk was the real aim of the experiment: the researchers measured the time it took participants to walk from one location to the other. They found that the participants who had been shown material related to the elderly walked more slowly than the others, as if the body were following what the mind had been thinking about. This is called the ‘priming’ effect.


And what may seem more surprising is that this body-mind link also works the other way around: people asked to hold a pencil in their mouth had their mood adapt to the expression they were forced into. Here are the details of the experiment: some participants were asked to hold the pencil crosswise by the middle, with the point on one side of the mouth and the eraser on the other; others were asked to hold the pencil by putting their lips around the eraser end. Then the 2 groups were shown the same cartoons, and the first group found them on average funnier than the second group. The first group seemed to be in a happier mood, as if they had been smiling. The second group were less positive, having been forced into a frown before looking at the cartoons.

The conclusion is that we have to be really careful with our mind’s evaluation of a situation if we have left it to our unconscious or intuitive mind. It is biased by design! The more aware we are of those biases, the better we can counter them.

Games for breakthrough thinking

Using games for brainstorming is really great. Instead of holding a standard meeting, the idea is to set a series of rules and then play that game. There is a clear beginning, once the rules have been explained and everybody agrees to play by them. Then, while the game is being played, the participants are free to explore the ‘game space’, that is, all the possible situations that can be reached by applying the predefined rules. And there is an end, when the declared goal is reached.

Image from the book Gamestorming by Dave Gray, Sunni Brown and James Macanufo

Some goals are clearly defined, like the ones limited by time: for example, to come up in 3 minutes with as many ideas or words around a subject as possible. Others have no time constraint; the game ends when a desired final situation is reached, as in four-in-a-row or chess.

But typically, in real situations that call for brainstorming, the goal is not so clear. For problems that need creativity, new ideas or innovation, the goal usually cannot be fully defined; it’s more like a general purpose. We may have a general direction in which we want to go, and we count on measures to see whether we have succeeded.

But why play a game for brainstorming? Because we just love playing games :-) but, more importantly, because when we are in a game we feel free to explore all the alternatives and go beyond conventions. That makes it easier for innovative ideas to come up. We free ourselves from the standard agreed conventions to cover all the possible alternatives that the rules of the game offer us.

As an example, I can mention the story of Timothy Ferriss, author of ‘The 4-Hour Workweek’, who won the gold medal at the Chinese Kickboxing National Championships. He was not a kickboxer, but he read the rules of the sport and explored the ‘game space’ of the championship. He then took advantage of 2 loopholes to compete with only 4 weeks of preparation! One of the rules said that if a combatant fell off the platform 3 times in a row, his opponent won by default. Another one allowed him to compete in a lower weight class than the one he should have been in. Those 2 rules combined made him a national champion in kickboxing. He was not really fighting; he was just pushing his opponents off the platform and won with that technique. For sure you can argue it is not a fair way of winning, but it’s an interesting way of thinking in order to reach the goal of the game.

Another example comes from my son, who had an assignment last year at university: to program a robot to follow a circuit, then throw a piece of wood as far as it could, and finish by going back to its parking place. There were points for each action: reaching the start line, following the path without leaving the route, throwing the piece of wood at a predetermined place, and going back to the garage. The path was unknown and only revealed at the time of the exam. When the fateful day came, the path presented to them was quite complicated and most of the robots failed. But one of the teams had a ‘plan B’, a different set of programming instructions: they only programmed the robot to do the tasks that gave points with minimum risk: go to the starting point, go to the predetermined place and throw the piece of wood, and then return to the garage. The robot didn’t even try to do the circuit, but with that strategy they were one of the 5 finalists! Again, this is the same situation as with Timothy Ferriss: it doesn’t feel fair even though it played by the rules, but it worked for the assignment.

Now if your survival is at stake, let’s imagine a planetary catastrophe, wouldn’t it be good to have a ‘B’ strategy up your sleeve?

Design Thinking at PWN Global

Last month was the annual off-site meeting of PWN Global (Professional Women International, where I’m a Board member, is the Brussels chapter of this federation of networks). Almost all the city networks were represented, plus the Board of the federation, and we even had corporate sponsors present.
The main objective was to shape the lines for the future: where do we want to go and what do we expect from the federation?

And in order to do that, Marijo Bos, our president, prepared a session of ‘design thinking’ for us, a game-based approach to brainstorming:

In the first step of the process we had to follow the rules to come up with as many ideas about our future as we could, to expand the universe of possibilities. In the second step we exchanged all our thoughts, and the third step was to reduce that universe in order to keep only the shared vision, the most mentioned action proposals.

After 2 days of intensive work, we ended up with agreed objectives and a subset of well-defined actionable points.
PWN Global- Nice 20140619
We did a good job while enjoying the time together!
