Sexism spotted with Maths!


I gave a talk in May this year called ‘Restore the balance of data’ at the Data Innovation Summit. It was about sexism and other biases that are implicit in our existing electronic traces (current and historical data), and about my concern that we are using that data as the baseline for creating new prediction algorithms.

I discussed this many times at home while preparing the talk. We had lively discussions with my husband and lovely sons over our family Sunday lunches. So it didn’t surprise me that my eldest son, Alex, thought of me when reading this article in the MIT Technology Review about sexism in our language.

The article is about a dataset of texts that researchers are using to “better understand everything from machine translation to intelligent Web searching.”  They are transforming words in the text into vectors, and then applying mathematical properties to derive meaning:

It turned out that words with similar meanings occupied similar parts of this vector space. And the relationships between words could be captured by simple vector algebra. For example, “man is to king as woman is to queen” or, using the common notation, “man : king :: woman : queen.” Other relationships quickly emerged too such as  “sister : woman :: brother : man,” and so on. These relationships are known as word embeddings.
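To make the vector algebra concrete, here is a minimal sketch of how an analogy such as “man : king :: woman : x” can be solved. The two-dimensional vectors, the `TOY_VECTORS` table and the `solve_analogy` helper are all invented for illustration; real word embeddings have hundreds of dimensions and are learned from text, so this only shows the arithmetic, not the researchers’ actual system.

```python
import math

# Toy 2-D "embeddings" invented for illustration:
# axis 0 loosely stands for gender, axis 1 for royalty.
TOY_VECTORS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "king": (1.0, 1.0),
    "queen": (-1.0, 1.0),
    "apple": (0.0, -1.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the word x that best completes 'a : b :: c : x'."""
    # The target point is b - a + c, computed coordinate by coordinate.
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # The three query words themselves are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


answer = solve_analogy("man", "king", "woman", TOY_VECTORS)
print(answer)
```

The same function, fed with embeddings learned from biased text, is what would return answers like the ones quoted further down.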

The article describes the problem that the researchers have identified in this dataset. In their words: “it is blatantly sexist.”  Here are some examples they provide:

But ask the database “father : doctor :: mother : x” and it will say x = nurse. And the query “man : computer programmer :: woman : x” gives x = homemaker.

Thinking about it, isn’t it obvious that if we have biases in our behaviour, the writings about our world will be biased too?  And anything derived from our biased written traces will reflect our views, with all our biases.

So we learned to extrapolate from our old behaviour to predict our future behaviour… just to discover that we don’t like what we are getting out of it!  Our old behaviour, amplified by the algorithm, doesn’t seem so good, does it? It’s clearer than ever that we don’t want to continue behaving like that in the future… Well, that’s a positive point: it’s good that this uncovers our blind spots, isn’t it?

Now the good news: it can be fixed!

The Boston team has a solution. Since a vector space is a mathematical object, it can be manipulated with standard mathematical tools.

The solution is obvious. Sexism can be thought of as a kind of warping of this vector space. Indeed, the gender bias itself is a property that the team can search for in the vector space. So fixing it is just a question of applying the opposite warp in a way that preserves the overall structure of the space.

Oh, seems so easy…for mathematicians anyway 😉  But no, even for mathematicians it is difficult to find and to measure the distortions:

That’s the theory. In practice, the tricky part is measuring the nature of this warping. The team does this by searching the vector space for word pairs that produce a similar vector to “she: he.” This reveals a huge list of gender analogies. For example, she:he::midwife:doctor; sewing:carpentry; registered_nurse:physician; whore:coward; hairdresser:barber; nude:shirtless; boobs:ass; giggling:grinning; nanny:chauffeur, and so on.

Having compiled a comprehensive list of gender biased pairs, the team used this data to work out how it is reflected in the shape of the vector space and how the space can be transformed to remove this warping. They call this process  “hard de-biasing.”

Finally, they use the transformed vector space to produce a new list of gender analogies[…]
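As a rough illustration of “measuring the warp and applying the opposite warp”, the sketch below estimates a gender direction from the single pair she/he and then removes that component from two occupation words. It is a deliberately simplified toy, not the team’s actual procedure: the vectors in `TOY_VECTORS` are invented, and the helper names `gender_direction` and `neutralize` are my own assumptions.

```python
import math

# Toy 2-D vectors invented for illustration; real embeddings have
# hundreds of dimensions and are learned from large text corpora.
TOY_VECTORS = {
    "he": (1.0, 0.5),
    "she": (-1.0, 0.5),
    "doctor": (0.4, 0.9),
    "nurse": (-0.6, 0.8),
}


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to length 1."""
    length = math.sqrt(dot(v, v))
    return tuple(a / length for a in v)


def gender_direction(vectors):
    """Estimate the bias direction from the single pair she - he (an assumption)."""
    return unit(tuple(s - h for s, h in zip(vectors["she"], vectors["he"])))


def neutralize(vector, direction):
    """Remove the component of `vector` that lies along `direction`."""
    scale = dot(vector, direction)
    return tuple(a - scale * d for a, d in zip(vector, direction))


direction = gender_direction(TOY_VECTORS)
occupations = ("doctor", "nurse")

# How far each occupation leans along the gender direction, before and after.
bias_before = {w: dot(TOY_VECTORS[w], direction) for w in occupations}
debiased = {w: neutralize(TOY_VECTORS[w], direction) for w in occupations}
bias_after = {w: dot(debiased[w], direction) for w in occupations}

print(bias_before)
print(bias_after)
```

In this toy space “nurse” leans towards “she” and “doctor” towards “he” before the projection, and neither leans either way afterwards, while the other coordinate of each word is left untouched.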

Read the full article if you are interested in their de-biasing process.  Their conclusion, with which I completely agree, is:

“One perspective on bias in word embeddings is that it merely reflects bias in society, and therefore one should attempt to debias society rather than word embeddings,” say Bolukbasi and co. “However, by reducing the bias in today’s computer systems (or at least not amplifying the bias), which is increasingly reliant on word embeddings, in a small way debiased word embeddings can hopefully contribute to reducing gender bias in society.”

That seems a worthy goal. As the Boston team concludes: “At the very least, machine learning should not be used to inadvertently amplify these biases.”



Learning how to learn

I’m an eternal learner.

There are so many interesting things, and time is precious, so when I came across this MOOC I couldn’t help but enrol and check it out.  Anything that helps me learn while reducing the study time needed really appeals to me!

I’m talking about the online Coursera course called Learning how to learn, by Barbara Oakley and Terrence Sejnowski, created by the University of California.  I can only recommend it to everyone: there are plenty of good tips to make the process of learning easier. Here are my take-aways:

  • Get into the habit of timeboxing your work, using for example the “pomodoro technique”(*), where you set intervals of 25 minutes of working time, followed by a 5-minute break (or by a longer break after 4 consecutive working slots).  Concentrating on the process (it’s time for my 25 minutes of work) will make it easier to avoid procrastination.
    And don’t forget to reward yourself after a focused interval of time spent working (a coffee, a piece of chocolate, or a wander in your garden to enjoy a sunny day like today 😉)
  • Schedule the toughest things first: we have more energy to overcome our resistance in the morning.
  • Add time for relaxation and physical exercise to let the studied material ‘sink in’ and get connected in your brain. It’s part of the learning process!
  • The best way to retain the studied material is not to read it over and over, but to recall the information, and to space the recalling over time. My son’s favourite method is using flash cards.
  • Test yourself and do exercises in different contexts, so that you make more connections for retrieving the chunks of material.
  • Prepare your to-do list for tomorrow today: it will have time to be absorbed, and tomorrow it will not occupy a slot of your working memory.
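For those who like to see the timeboxing from the first tip written down, here is a small sketch that lays out a pomodoro session. The 25-minute work slot, the 5-minute break and the longer break after 4 slots come from the tip above; the 15-minute length of the long break and the function name `pomodoro_schedule` are my own assumptions.

```python
def pomodoro_schedule(work_slots, work_min=25, short_break_min=5, long_break_min=15):
    """Return a list of (activity, minutes) pairs for the given number of work slots.

    long_break_min=15 is an assumed value; the tip only says "a longer break".
    """
    schedule = []
    for slot in range(1, work_slots + 1):
        schedule.append(("work", work_min))
        # Every fourth consecutive work slot earns the longer break.
        if slot % 4 == 0:
            schedule.append(("long break", long_break_min))
        else:
            schedule.append(("short break", short_break_min))
    return schedule


plan = pomodoro_schedule(4)
total_minutes = sum(minutes for _, minutes in plan)

for activity, minutes in plan:
    print(f"{activity}: {minutes} min")
print(f"total: {total_minutes} min")
```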

I’m sure there are many other tips I didn’t mention that may appeal to you. If you decide to follow the course, drop me a line to let me know your picks :-)

*: The pomodoro technique has 5 fundamental stages: planning, tracking, recording, processing and visualizing. In the planning phase, tasks are prioritized by recording them in a “To Do Today” list. This enables us to estimate the effort that is required for the tasks. As pomodoros are completed, they are recorded, adding a sense of accomplishment and providing raw data for subsequent self-observation and improvements. At the end of the day, you get concrete feedback on your estimates. If there are still tasks on the list… you are like me, too optimistic! 😉


DIS2016 Restore the balance of data

The Data Innovation Summit 2016 took place two weeks ago.  I was due to speak using the ‘ignite’ presentation format.  For those who don’t know this format, it’s a nightmare! Joking aside, it means that the slides advance automatically at regular intervals (15 seconds in my case).  You cannot stop them and you don’t control the flow… so to stay synchronized, you really have to prepare your speech in advance: you must know exactly how much time it takes to explain each of your points and what examples you’ll be presenting (try it: 15 seconds go very quickly when you’re looking for your words :-)).

So here it is, my 5-minute presentation, if you only count the time on stage…


Big Data workshop at the First European Celebration of Women in Computing

This last Tuesday, I led the ‘Discover Big Data’ workshop at the First European Celebration of Women in Computing.  There were many parallel sessions that morning, and I received some questions about my presentation from participants who couldn’t split themselves in two to attend this workshop 😉

Welcome to the Big Data workshop, we need women in Big Data!

This workshop is called ‘Discover Big Data’ because Big Data is a hyped term. It is being used for anything where data is involved, but what it actually means remains confusing.

  • You are in Big Data if you are dealing with data that has to be processed at great velocity, as is the case for GPS or for mobile phones.
  • You are in Big Data if you cross-reference information that comes in a variety of formats, like your customers’ transactions and your customers’ emails, or if you go to the social networks, like Facebook or Twitter.  You can discover which topics are being discussed, what is being said about your company or who is talking about your product.
  • You are in Big Data if you’re exploiting one of the many big available datasets, like weather information, official administration records such as property records, financial information, economic indicators…

What can be done with Big Data?

It is mainly used for customer intimacy: discovering your customer profiles and targeting them on a one-to-one basis, finding their preferences and the hidden patterns that predict customer churn.

It can be used for optimisation, finding patterns of systematic problems hidden in your historical data. It can help you organise your maintenance, improve the supply chain, find better logistic solutions and optimise processes.

It is also used for innovation: it can help you create your new product by looking at your competitors and finding the white spaces, or by uncovering market trends.

And more generally, with all the available data you can create models that forecast future events and behaviours. By using what-if analysis to predict the outcomes of potential changes, you can direct your business strategy. It helps you anticipate previously unforeseen opportunities, avoid costly situations, find new revenue opportunities or identify more effective business models.

As you see, there are great business opportunities!

How can we do all that?

There are many techniques like statistical analysis, data mining, text analysis, sentiment analysis, graph analysis, machine learning, predictive analysis, neural networks, conceptual clustering…

You may already have heard some of those words, which sound promising but also very complicated. Even so, the Big Data field is growing exponentially and men are rushing into it.  Only 10% of the people in it are women. Don’t you want to be part of it? Companies that caught this wave are thriving, well ahead of classical businesses. They are offering you the right product at the right time, with the features you are looking for, at the price you are willing to pay. They are increasing their profits while shaping our future with the products and business strategies they are creating.

I hear you saying: This is great but I don’t know a thing about this and it sounds so complicated. I’m here to tell you that not all of it is that difficult.

YOU could be in Big Data.

If you are in computing you have a leg up. And if you like mathematics you’ll enjoy being a data scientist. But you could be in Big Data even if you are not a techy person.  If you are in HR, in marketing, if you are a manager or a decision-maker with the right mindset open to data, you can exploit the Big Data wave.

Even when they see the potential, women tend to think ‘it’s not for me, I don’t have the competencies’.

Let me use some feminine stereotypes to illustrate that we have the basic skills:

  • We have a tradition of getting together and talking too much.  And we have a tendency to be matchmakers.  We can put those skills of information gathering and making connections to good use finding relationships between data.
  • Who recognises herself in this? We are control freaks and plan everything, even the time of our loved ones.  Don’t you have a TODO list for your partner on Saturdays?  I do: Love, since you are driving Alex to the scouts, could you please stop by and drop off the trousers at the dry cleaner’s?  What if you knew what your GPS knows already, that a road is blocked?  You could have asked him to bring some bread back, as he’s going to pass near the bakery.  Don’t you feel satisfaction when doing things efficiently, optimising the Saturday time? So imagine tapping into all the available information and using it to improve processes. It’s a rewarding job.
  • And if you have artistic skills, visualisation is your field. This is a new branch of data science, where very interesting new techniques are being created to show more than 3 dimensions of data, so that you can easily see relationships graphically.
  • Generally speaking, I think we women have a natural talent to be data analysts: the ‘What if’ comes naturally to us, and we always investigate all possibilities before deciding on one, don’t we?

Summarising, we saw there is business in here, and that we have the basic skills to be in the Data business.

Moreover, it is important that more women move into this field, not only because of the many business opportunities, but also because there are ethical issues involved in Big Data. We can mention data privacy and price gouging as some of these issues, but there are other business models that can be controversial.

The rules of what can be done with the data and what is off-limits are being defined right now.  Let’s not miss the opportunity to give our view on this.

As an example, there is a great initiative from the Data2X program of the UN, which is researching women’s freedom of movement through satellite images and their phone geolocation.  Are they limited in their movements in some countries? Do they have access to education, to health care? It is a great initiative, but what about the same thing at a private level: is following the movements of your partner through her/his phone geolocation ethical? What about tracking the movements of your children, as is already done in some countries?

It’s important to have our say in the ethical uses of all those lakes of data and to be represented in the decisions that will define our future society. We women have a natural tendency to look after our loved ones, taking their needs into consideration. That’s what Big Data needs: people who set the rules for using the incredible amounts of data, taking into account the different perspectives and with a long-term view in mind. It’s the moment to use our feminine voice to shape a better society for all of us, participating also in the creation of the new business models.

In this workshop you will hear success stories to show you the opportunities to be included in this field. I hope you’ll join the Big Data movement.


Pre-Crime unit for tracking Terrorists?

Following the recent events in Belgium, the terrorist bomb attacks in Zaventem and Brussels, I couldn’t help but remember the Bloomberg Businessweek article about pre-crime: ‘China Tries Its Hand at Pre-Crime’.  It refers to the film Minority Report, with Tom Cruise, which takes place in a future society where three mutants foresee all crime before it occurs. Plugged into a great machine, these “precogs” are the basis of a police unit (the Pre-Crime unit) that arrests murderers before they commit their crimes.

The China Electronics Technology company recently won the contract to build the ‘united information environment’, as they call it: an ‘antiterrorism’ platform, as declared by the Chinese government:

The Communist Party has directed [them] to develop software to collate data on jobs, hobbies, consumption habits, and other behavior of ordinary citizens to predict terrorist acts before they occur.

This may seem a little too much to ask: if you think about it, you may need every daily detail to be able to predict terrorist behaviour. But in a country like China, where the state has had control over its citizens for many decades, where there are no privacy limits to respect and there is a good network of informants…

A draft cybersecurity law unveiled in July grants the government almost unbridled access to user data in the name of national security. “If neither legal restrictions nor unfettered political debate about Big Brother surveillance is a factor for a regime, then there are many different sorts of data that could be collated and cross-referenced to help identify possible terrorists or subversives,” says Paul Pillar, a nonresident fellow at the Brookings Institution.

Notice that there is now a new target as well: subversives.  The article continues:

China was a surveillance state long before Edward Snowden clued Americans in to the extent of domestic spying. Since the Mao era, the government has kept a secret file, called a dang’an, on almost everyone. Dang’an contain school reports, health records, work permits, personality assessments, and other information that might be considered confidential and private in other countries. The contents of the dang’an can determine whether a citizen is eligible for a promotion or can secure a coveted urban residency permit. The government revealed last year that it was also building a nationwide database that would score citizens on their trustworthiness.

Wait a second, who is defining what ‘trustworthiness’ is, and what happens if you are not considered trustworthy?

New antiterror laws that went into effect on Jan. 1 allow authorities to gain access to bank accounts, telecommunications, and a national network of surveillance cameras called Skynet. Companies including Baidu, China’s leading search engine; Tencent, operator of the popular social messaging app WeChat; and Sina, which controls the Weibo microblogging site, already cooperate with official requests for information, according to a report from the U.S. Congressional Research Service. A Baidu spokesman says the company wasn’t involved in the new antiterror initiative.

So Skynet is here now (remember Terminator Genisys?). Even if, right after a horrendous crime, you may be tempted to be happy that this ‘pre-crime’ initiative is being built, there are still far too many negative aspects to consider before having such a tool: in whose hands will it be, who defines what a crime is, and what about your free will to change your mind, to mention just a few.  Let’s begin thinking about how to tackle them.


Great visualisation tips

I would like to share with you this article in the Harvard Business Review.  It gives excellent advice on how to ‘make extreme numbers resonate’, with 3 examples to illustrate the tips:

  1. Challenge: Green Mountain sold 18 billion coffee pods in two years. How can you give people a concrete sense of just how many objects that is?
  2. Challenge: Only three in 10,000 high school basketball players ever make it to the NBA. How can you give someone a deep understanding of the rarity of that feat?
  3. Challenge: Every year tens of thousands of people leave one U.S. city for another. How can you show changes on this scale when it’s so hard to keep track of complex movement? […]

In the first example, they give tips to visualise huge numbers, and the second one is for small numbers, but the third one is really interesting, as it shows an extremely clear way to picture complexity.




Good resolution for 2016: let’s improve our communications skills

Dr Travis Bradberry wrote this post on LinkedIn some days ago about “Why We Struggle to Communicate”.

“Communication is the real work of leadership; you simply can’t become a great leader until you are a great communicator.”

Yes, communication is critical in leadership, for inspiring people and taking every member of the team into account. For an entrepreneur, it allows you to convey your thoughts and ideas better, improving your chances of convincing investors and making ‘it’ happen.  For intrapreneurs, it helps align people towards the same goal. But in the end, it is an essential skill for everyone, because understanding each other is the basis for better collaboration in your professional and personal relationships.

So join me in this New Year’s resolution for 2016:  let’s improve our communication skills by following the strategies for taking action that the author sets out in his article:

Speak to groups as individuals.[…] You want to be emotionally genuine and exude the same feelings, energy, and attention you would one-on-one.[…]
Talk so people will listen. […] means you adjust your message on the fly to stay with your audience […].
Listen so people will talk. […] you must give people ample opportunity to speak their minds.[…]
Connect emotionally.[…] Show them what drives you, what you care about […].
Read body language. Your authority makes it hard for people to say what’s really on their minds.[…] Pay as much attention to what isn’t said as what is said […].
Prepare your intent.  Don’t prepare a speech; develop an understanding of what the focus of a conversation needs to be […].
Skip the jargon. […]

And the last piece of advice:

Practice active listening. Active listening is a simple technique that ensures people feel heard, an essential component of good communication. To practice active listening:

  • Spend more time listening than you do talking.
  • Do not answer questions with questions.
  • Avoid finishing other people’s sentences.
  • Focus more on the other person than you do on yourself.
  • Focus on what people are saying right now, not on what their interests are.
  • Reframe what the other person has said to make sure you understand him or her correctly (“So you’re telling me that this budget needs further consideration, right?”)
  • Think about what you’re going to say after someone has finished speaking, not while he or she is speaking.
  • Ask plenty of questions.
  • Never interrupt.
  • Don’t take notes.

Happy 2016!


The Value of Emotional Connection


Scott Magids, Alan Zorfas and Daniel Leemon tell us that research on motivational values is paying off:

Our research across hundreds of brands in dozens of categories shows that it’s possible to rigorously measure and strategically target the feelings that drive customers’ behavior. We call them “emotional motivators.” They provide a better gauge of customers’ future value to a firm than any other metric, including brand awareness and customer satisfaction, and can be an important new source of growth and profitability.

The article guides you through a detailed process for finding out your customers’ motivators, which begins with:

Online surveys can help you quantify the relevance of individual motivators. Are your customers more driven by life in the moment or by future goals? Do they place greater value on social acceptance or on individuality? Don’t assume you know what motivates customers just because you know who they are. Young parents may be motivated by a desire to provide security for their families—or by an urge to escape and have some fun (you will probably find both types in your customer base). And don’t undermine your understanding of customers’ emotions by focusing on how people feel about your brand or how they say it makes them feel. You need to understand their underlying motivations separate from your brand.

See the full Harvard Business Review article for the complete description. What is surprising is this finding:

To increase revenue and market share, many companies focus on turning dissatisfied customers into satisfied ones. However, our analysis shows that moving customers from highly satisfied to fully connected can have three times the return of moving them from unconnected to highly satisfied. And the highest returns we’ve seen have come from focusing on customers who are already fully connected to the category—from maximizing their value and attracting more of them to your brand.

It is analogous to the different strategies used in education:

  • In secondary school you have to acquire a minimum of knowledge in all the courses you take.  Students often have to focus on the ones for which they have no natural talent.
  • In higher studies, it pays to focus on your strengths, on your best skills, and to improve them until you are really good at them.

It is rare for youngsters to be very motivated by courses they don’t really like, even if they finish the year having mastered them well enough to pass. It is no surprise that the second group is easier to motivate and, as a result, it seems reasonable that the knowledge or skill acquired may be more impressive in the second group than in the first. What is surprising is that we did not have this intuition and needed research to show it with data.


The European Data Innovation Hub

What began as a community of like-minded people, with nice meetups and get-togethers around data science, is now taking the form of the European Data Innovation Hub.  Its mission is to be an active player in the data innovation ecosystem and to support data professionals throughout Belgium and Europe with networking activities, events, training and meeting facilities, learning platforms, co-working space and mentorship. It will foster grassroots community initiatives and take away the burden of realising and organising them. The idea is to set the conditions in which people with the right skills and organisations in the right positions have the option to move forward.

Here are some of the activities of the Hub:

  • To organise data innovation events
  • To provide co-working space for data professionals
  • To support the education and training of the data workforce, from academics to data scientists to managers to data end-users

I’m very happy to be part of this ecosystem, participating not only in the training in Big Data and Machine Learning, but hopefully also opening as many opportunities as I can to women in this domain.
