Here’s why: IQ tests, originally created in 1904 to work out which kids needed extra help, quickly morphed into weapons — weapons that would determine not that you needed help, but that you were helpless. Weapons that, within 30 years, were being used to justify forced sterilization and, within 40 years, genocide.
From a Radiolab episode on the topic: “The Nazi version of eugenics started in the mid-1930s, when they began forcibly sterilizing and then executing thousands of people that they'd classified as mentally ill, disabled, or what they called feeble-minded. Meaning that they'd scored low on an IQ test. This program was called T4, and it was a precursor to, and essentially a training ground for the mass executions of Jewish people that over the next several years would become the Holocaust.”
You cannot blame the Holocaust on the tests, of course. But the tests themselves were problematic even in more benign circumstances, such as the one that led to the California law I started this piece with.
Here’s how it worked: The tests were created. Then the tests got given to a whole bunch of kids. Then the scores got averaged. Then those averages became the baseline future kids got compared against.
But here’s the problem: The “whole bunch of kids” were only white kids. And so the questions naturally reflected the white experience. From the Radiolab episode again: “One of the items about General Information is, who discovered America? And the only two possible answers here are Columbus and Leif Erikson. And of course, there are some people who would have a little disagreement with that.”
As my friend Johanna says, they weren’t measuring intelligence, they were measuring circumstance. And as a result, kids who grew up in different circumstances understandably did worse on the test. But because “the system” recognized the test as a measure of intelligence, those kids weren’t recognized as simply having grown up in different circumstances. They were labeled inferior. Stupid. Not worth trying to educate.
The training data was biased.
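In modern machine-learning terms, that norming process is just fitting a baseline to a training sample and then judging everyone against it. Here is a minimal sketch of the idea; all the names and numbers are invented for illustration:

```python
# Toy sketch of norm-referenced scoring: a baseline is fit to one group's
# results, then everyone -- including people from entirely different
# circumstances -- is judged against that baseline.
# All scores below are invented for illustration.

def fit_baseline(training_scores):
    """Average the norming sample's raw scores to make the 'norm'."""
    return sum(training_scores) / len(training_scores)

def label(raw_score, baseline):
    """Anyone below the baseline gets labeled 'below average'."""
    return "below average" if raw_score < baseline else "average or above"

# The "whole bunch of kids" the test was normed on: one circumstance only.
norming_sample = [52, 55, 58, 60, 61]     # raw scores from one group
baseline = fit_baseline(norming_sample)   # 57.2

# A kid from a different circumstance answers culture-bound items "wrong"
# and scores lower -- not less intelligent, just differently situated.
print(label(45, baseline))   # below average
print(label(58, baseline))   # average or above
```

Nothing in the scoring step knows, or can know, that the baseline encodes one group’s circumstance rather than intelligence itself.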
Fast-forward to today, and it would seem we’ve learned nothing.
We’re still making algorithms that take data gathered in one circumstance and try to apply it to another.
I wrote last month about the way our implicit biases distort what we produce.
Last week, Wired’s Dennys Antonialli shared the results of a study he and his colleagues did. They took tweets from both drag queens and members of the far right, and used an AI tool called Perspective to determine the level of toxicity in these tweets. “Perspective defines ‘toxic’ as ‘a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.’”
It turns out Perspective had never been trained on data from drag queens, and many of the words the queens use carried high toxicity scores. For example, a tweet from Mayhem Miller that read, “I AM BLACK. I AM GAY. I AM A MAN. I AM A DRAG QUEEN. If those are not enough for you...kindly, FUCK OFF!!!” got a toxicity level of 95.98%. By contrast, a tweet from Stefan Molyneux that read, “The three major races have different brain volumes and different average IQs,” got a toxicity level of just 21.7%.
It’s not surprising the drag queens’ scores were so abysmal. Individual words like gay (76.10%), lesbian (60.79%), queer (51.03%), and transvestite (44.48%) were ranked highly toxic, as were fag (91.94%), sissy (83.20%), and bitch (98.18%).
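That pattern is what you get when a classifier learns word-level toxicity from skewed data. The sketch below is a toy bag-of-words scorer, not Perspective’s actual model, and the per-word scores are invented; they simply mirror the pattern the study found (identity terms learned as “toxic” because the training data rarely showed them used positively):

```python
# Toy word-level toxicity scorer. NOT Perspective's real model -- just a
# bag-of-words illustration of how skewed training data propagates.
# Per-word scores are invented for illustration.

LEARNED_WORD_TOXICITY = {
    "gay": 0.76, "queer": 0.51, "drag": 0.60,
    "races": 0.10, "brain": 0.05, "volumes": 0.02,
}

def toxicity(text):
    """Score a text as the max learned toxicity of any word in it."""
    words = text.lower().split()
    return max((LEARNED_WORD_TOXICITY.get(w, 0.0) for w in words),
               default=0.0)

# Self-affirming speech by a drag queen scores high...
print(toxicity("i am gay and i am a drag queen"))          # 0.76

# ...while a dehumanizing claim phrased in neutral words scores low.
print(toxicity("the races have different brain volumes"))  # 0.1
```

The scorer never sees context or intent; it only sees which words appeared, weighted by whatever associations its training data happened to contain.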
We are baking our biases into the systems that govern the world. We are embedding our injustices so that they can be amplified and perpetuated. We don’t need our algorithms to solve problems the way we’ve solved them in the past. We need them to be aspirational for the future.
It might take a little more effort. But the alternative is unbearable.