David Auerbach has written this article pointing out that some classification algorithms may be racist:
Can a computer program be racist? Imagine this scenario: A program that screens rental applicants is primed with examples of personal history, debt, and the like. The program makes its decision based on lots of signals: rental history, credit record, job, salary. Engineers “train” the program on sample data. People use the program without incident until one day, someone thinks to put through two applicants of seemingly equal merit, the only difference being race. The program rejects the black applicant and accepts the white one. The engineers are horrified, yet say the program only reflected the data it was trained on. So is their algorithm racially biased?
Yes, and a classification algorithm could be more than just racist. Because humans write these algorithms, or more accurately, because learning algorithms are built upon human examples and counter-examples, they may carry any bias that we have. With the abundance of data, we are training programs with examples from the real world; the resulting program will be an image of how we act, not a reflection of how we would like to be. It is exactly like the saying about educating kids: they do as they see, not as they are told :-)
To make things worse, when dealing with learning algorithms, not even the programmer can predict the resulting classification. So, knowing that there may be errors, who is there to ensure that the classifications are correct?
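To make the rental scenario concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the history table, the feature names, the groups "A" and "B", and the deliberately naive learner (it averages past approval rates per feature value). It is not the program from the article, only a toy showing how a classifier can end up mirroring the decisions it was trained on.

```python
from collections import defaultdict

# Hypothetical past decisions: (credit, income, group, approved).
# In this made-up history, group "B" was never approved, even with good credit.
HISTORY = [
    ("good", "high", "A", True),
    ("good", "low", "A", True),
    ("good", "high", "A", True),
    ("bad", "low", "A", False),
    ("good", "high", "B", False),
    ("good", "low", "B", False),
    ("good", "high", "B", False),
    ("bad", "low", "B", False),
]


def train(rows):
    """Learn the share of past approvals for every (feature index, value) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [approved, total]
    for *features, approved in rows:
        for i, value in enumerate(features):
            counts[(i, value)][0] += int(approved)
            counts[(i, value)][1] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}


def predict(model, applicant, threshold=0.5):
    """Approve when the mean learned approval rate of the applicant's values beats the threshold."""
    score = sum(model[(i, value)] for i, value in enumerate(applicant)) / len(applicant)
    return score > threshold


model = train(HISTORY)

# Two applicants of equal merit; the only difference is the group.
print(predict(model, ("good", "high", "A")))  # True
print(predict(model, ("good", "high", "B")))  # False
```

Nobody wrote a rule saying "reject group B". The rejection comes entirely from the examples the program was given.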
What about the everyday profiling that goes on without anyone noticing? […]
Their goal is chiefly “microtargeting,” knowing enough about users so that ads can be customized for tiny segments like “soccer moms with two kids who like Kim Kardashian” or “aging, cynical ex-computer programmers.”
Some of these categories are dicey enough that you wouldn’t want to be a part of them. Pasquale writes that some third-party data-broker microtargeting lists include “probably bipolar,” “daughter killed in car crash,” “rape victim,” and “gullible elderly.” […]
There is no clear process for fixing these errors, making the process of “cyberhygiene” extraordinarily difficult. […]
For example, just because someone has access to the source code of an algorithm does not always mean he or she can explain how a program works. It depends on the kind of algorithm. If you ask an engineer, “Why did your program classify Person X as a potential terrorist?” the answer could be as simple as “X had used ‘sarin’ in an email,” or it could be as complicated and nonexplanatory as, “The sum total of signals tilted X out of the ‘non-terrorist’ bucket into the ‘terrorist’ bucket, but no one signal was decisive.” It’s the latter case that is becoming more common, as machine learning and the “training” of data create classification algorithms that do not behave in wholly predictable manners.
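The two kinds of answers in that last quote can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the signal names, the weights and the threshold are all hypothetical, and in a learned model the weights would come out of training rather than being typed in by a programmer.

```python
# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"signal_1": 3, "signal_2": 3, "signal_3": 3, "signal_4": 3, "signal_5": 3}
THRESHOLD = 10


def rule_based(signals):
    """Flag on one explicit keyword; the explanation is the rule itself."""
    return "sarin" in signals


def score_based(signals):
    """Flag when the summed weight of all observed signals exceeds the threshold."""
    return sum(WEIGHTS.get(s, 0) for s in signals) > THRESHOLD


def decisive_signals(signals):
    """Return the signals whose removal alone would flip the score-based decision."""
    signals = set(signals)
    outcome = score_based(signals)
    return sorted(s for s in signals if score_based(signals - {s}) != outcome)


person_x = set(WEIGHTS)  # Person X shows all five hypothetical signals

print(rule_based({"sarin"}))       # True: "X had used 'sarin' in an email"
print(score_based(person_x))       # True: total weight 15 is above 10
print(decisive_signals(person_x))  # []: dropping any single signal still leaves 12
```

In the second case the program flags Person X, yet the honest answer to "which signal did it?" is "none of them on its own".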
Further on, the author mentions the dangers of this kind of programming, which is not fully predictable.
Philosophy professor Samir Chopra has discussed the dangers of such opaque programs in his book A Legal Theory for Autonomous Artificial Agents, stressing that their autonomy from even their own programmers may require them to be regulated as autonomous entities.
Chopra sees these algorithms as autonomous entities. They may be unpredictable, but so far there is no will or conscious choice behind taking one path instead of another. Programs are told to maximize a particular benefit, and that benefit is measured by a human-written function. As time goes by and technology advances, I can easily see the benefit function including feedback the program gets from the ‘real world’, which could make the behavior of the algorithm even more unpredictable than it is now. At that point we can think of algorithms that evaluate or ‘choose’ whether to be on the regulated side... or not? Will it reach the point where they have a kind of survival instinct? Where that may lead, we’ll know soon enough.