The era of blind faith in big data must end | Cathy O’Neil


Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don’t even get an interview or they pay more for insurance. We’re being scored with secret formulas that we don’t understand that often don’t have systems of appeal. That begs the question: What if the algorithms are wrong?

To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you’re looking for and often hoping for. You train an algorithm by looking, figuring out. The algorithm figures out what is associated with success. What situation leads to success?

Actually, everyone uses algorithms. They just don’t formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don’t count those little packages of ramen noodles as food. (Laughter) My definition of success is: a meal is successful if my kids eat vegetables. It’s very different from if my youngest son were in charge. He’d say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That’s the first rule of algorithms. Algorithms are opinions embedded in code.

It’s really different from what you think most people think of algorithms. They think algorithms are objective and true and scientific. That’s a marketing trick. It’s also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.

This is Kiri Soares.
She’s a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the “value-added model.” I told her, “Well, figure out what the formula is, show it to me. I’m going to explain it to you.” She said, “Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn’t understand it.”

It gets worse. The New York Post filed a Freedom of Information Act request, got all the teachers’ names and all their scores and they published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn’t. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it.

Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data that actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher. (Laughter) What is that? (Laughter) That should never have been used for individual assessment. It’s almost a random number generator. (Applause) But it was.

This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You’re thinking, “Well, I would never make an algorithm that inconsistent.” But algorithms can go wrong, even have deeply destructive effects with good intentions. And whereas an airplane that’s designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.

This is Roger Ailes. (Laughter) He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren’t allowed to succeed at Fox News. He was ousted last year, but we’ve seen recently that the problems have persisted. That begs the question: What should Fox News do to turn over another leaf?

Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data, what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? Reasonable choice would be, well, who is successful at Fox News? I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to look for people to learn what led to success, what kind of applications historically led to success by that definition. Now think about what would happen if we applied that to a current pool of applicants. It would filter out women because they do not look like people who were successful in the past.

Algorithms don’t make things fair if you just blithely, blindly apply algorithms. They don’t make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don’t. And I’ll add that most companies don’t have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society, racially segregated, all towns, all neighborhoods and where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found the data scientists and paid the data scientists to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model would be, and they’d be right.

Now, reality isn’t that drastic, but we do have severe segregations in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do predict, in fact, the individual criminality, the criminality of individuals. The news organization ProPublica recently looked into one of those “recidivism risk” algorithms, as they’re called, being used in Florida during sentencing by judges. Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk.
3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony but Bernard didn’t. This matters, because the higher score you are, the more likely you’re being given a longer sentence.

What’s going on? Data laundering. It’s a process by which technologists hide ugly truths inside black box algorithms and call them objective; call them meritocratic. When they’re secret, important and destructive, I’ve coined a term for these algorithms: “weapons of math destruction.” (Laughter) (Applause)

They’re everywhere, and it’s not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about for teachers and the public police, those were built by private companies and sold to the government institutions. They call it their “secret sauce”; that’s why they can’t tell us about it. It’s also private power. They are profiting for wielding the authority of the inscrutable.

Now you might think, since all this stuff is private and there’s competition, maybe the free market will solve this problem. It won’t. There’s a lot of money to be made in unfairness. Also, we’re not economic rational agents. We all are biased. We’re all racist and bigoted in ways that we wish we weren’t, in ways that we don’t even know. We know this, though, in aggregate, because sociologists have consistently demonstrated this with these experiments they build, where they send a bunch of applications to jobs out, equally qualified but some have white-sounding names and some have black-sounding names, and it’s always disappointing, the results, always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles; I decided it was irrelevant. But by trusting the data that’s actually picking up on past practices and by choosing the definition of success, how can we expect the algorithms to emerge unscathed? We can’t. We have to check them. We have to check them for fairness.

The good news is, we can check them for fairness. Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them.
We can make them better. I call this an algorithmic audit, and I’ll walk you through it.

First, data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we’d have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested, four or five times more likely, depending on the area. What is that bias looking like in other crime categories, and how do we account for it?

Second, we should think about the definition of success, audit that. Remember, with the hiring algorithm? We talked about it. Someone who stays for four years and is promoted once? Well, that is a successful employee, but it’s also an employee that is supported by their culture. That said, also it can be quite biased. We need to separate those two things. We should look to the blind orchestra audition as an example. That’s where the people auditioning are behind a sheet. What I want to think about there is the people who are listening have decided what’s important and they’ve decided what’s not important, and they’re not getting distracted by that. When the blind orchestra auditions started, the number of women in orchestras went up by a factor of five.

Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?

And finally, we have to consider the long-term effects of algorithms, the feedback loops that are engendering. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.

I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society. (Applause) And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability for our algorithmic overlords. (Applause) The era of blind faith in big data must end. Thank you very much. (Applause)
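
The accuracy step of the audit described in the talk asks how often a model errs and for whom it fails. One way that question might be made concrete is sketched below in Python. This is only an illustration under stated assumptions: the function name `error_rates_by_group`, the record layout, and the toy records are all invented here and do not come from the talk or from any real risk-scoring system.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    `records` is an iterable of (group, predicted, actual) tuples, where
    `predicted` and `actual` are booleans. This layout is a hypothetical
    choice made for the sketch.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # model said "low risk", outcome was positive
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # model said "high risk", outcome was negative
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None marks a rate that is undefined because the group has
            # no cases of that kind.
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates

# Made-up toy records: (group, predicted_high_risk, actually_reoffended).
toy_records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", True, True),
    ("B", False, False), ("B", False, False), ("B", True, False), ("B", False, True),
]

rates = error_rates_by_group(toy_records)
for group in sorted(rates):
    print(group, rates[group])
```

In this toy data the model wrongly flags group A as high risk twice as often as group B, even though both groups have the same number of records. Comparing error rates per group rather than a single overall accuracy figure is what lets that difference show up.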


100 Responses

  1. Xerxes 666

    September 8, 2017 8:50 pm

    ✨EXCELLENT.. .Video ! Thank you!!✨🌟💫✨🎉🎊🎉🎉🎊🎉🌹🌹🌹🌹🌹💗🌹🌹🌹🌹✨🌟💫✨🎊🎉🎊✨🌟💫✨

  2. future62

    September 8, 2017 10:11 pm

    I'm so burnt out on political commentary…. I agree that big data can be misapplied but spare me your politics. Trojan horse proselytizing should be a crime.

  3. Kody Buffett-Wilson

    September 9, 2017 12:11 am

    It's a shame that she's muddying this issue by every other topic switch threatening to go off on a feminist tangent when the concerns she raises about algorithms are extremely important and must be taken more seriously.

  4. JML 481

    September 9, 2017 7:03 am

    Blue hair, thick rimmed glasses.. obnoxious attitude, hair brained theories contending that women are oppressed by algorithms..

    Yeah, dumbass fucking feminist..

  5. NavyHuskie

    September 9, 2017 1:11 pm

    Yes, I'm sure this overweight woman with oddly coloured hair has a valid, rational point. Let's hear her out.

  6. Rob Thornton

    September 9, 2017 7:32 pm

    The underlying premise seems worthy of study, but your liberal bias in your presentation makes your audience skeptical of your analysis. Your data must have been liberal audiences and your success algorithm was based on success being the audience applauded.

  7. Martin Møller Jensen

    September 9, 2017 9:57 pm

    As a fellow data scientist, I find this whole talk stupid. It's the basics of stochastic dependencies that she pretends is a thing data scientists "forgot" to account for. It's their main job to handle these problems ffs.

  8. admagnificat

    September 10, 2017 1:33 am

    Thank you. As someone who works in a very human field where so-called "Value Added Measures" (VAM) are used to rate the vast majority of employees, I can corroborate that this practice can lead to some very, very unexpected and very, very unjust outcomes.

    I think that people are starting to realize this now, but I'm not sure how ratings will be handled as we move forward — especially when the rating systems are often encoded into state law (which means that they can be very hard to change, and can stick around long after their fairness has been called into question).

  9. Chris Luiten

    September 10, 2017 11:42 am

    One of her first claims, "blind trust in big data", should be "blind trust in conclusions drawn by algorithms from big data". It's less catchy, but more workable. This statement is not proven by the rest of her story. She uses two stories: first the teachers, where the problem was that it was not clear what data was selected or what algorithm was used to grade the teachers. The second story was about the court (Fox News was a "what if"). There the consequences were not fully clear.

    It was very clear she was not a data scientist. So her conclusion was more: "HOLD PEOPLE RESPONSIBLE". If she were more the scientist type it would be more like: "Self-fulfilling prophecy: feedback loops and how to avoid them". Yes, you could make a full segment on wrong use of data, but keep focusing on that. If you want a lecture on reading data (problems and issues), do that. But this intertwining will not give either subject the full conclusion it needs.

  10. Civil Savant

    September 10, 2017 4:10 pm

    … Before… facebook designers decided… to show us only what our friends post… whua… Cathy, pretty much everything you said in this talk is completely wrong, and you clearly have no comprehension of data or the creation and use of algorithms, but that one thing just blew my mind. Good job on spectacularly making a fool of yourself.

  11. skullbait

    September 10, 2017 8:04 pm

    I wanted subtlety with regards to the nature of artificially generated content and decision making, but all I got was this lousy t-shirt.

  12. Arthipex

    September 10, 2017 9:54 pm

    To be honest, the point she's making is correct. Bias in input caused by humans will cause bias in output. However, doesn't an algorithm that was biased in such a way correspond more to our human nature? The solutions it might come up with might not always be the best, but they are for sure more "human" in their nature.

  13. From my point of view, you are upside down.

    September 11, 2017 1:12 am

    Funny how the most biased people always accuse everyone of being biased. Do they include themselves?

  14. TheSlimeyLimey

    September 11, 2017 4:33 am

    Data scientists should be "translators of ethical discussions that happen in larger society"
    Not sure exactly what she means by "translate" but no they should not. That sounds suspiciously like let's go find data that supports our ethics (ethics meaning ideology or agenda).

    Data scientists should be objective and impartial data analysts with the goal of being as accurate and reflective of reality as possible. If they are not objective or impartial, then yes, that is a problem that needs to be addressed. Ethics is what the non-scientists discuss when deciding what to do with the information that scientists give them. Science's "job" is to provide "larger society" with facts and understanding of reality so ethical decisions can be made based on reality not feelings or guesswork.

    A bigger problem than bad data or algorithms today is simply ignoring facts and reality because they contradict an ideology or agenda . SJWs, feminists, globalists and champions of diversity for example are allergic to facts.

  15. viper4991

    September 11, 2017 6:26 am

    I feel like she focused more on feminism and politics. And made it seem like the white male is the only biased group of people. She focused on female and minority…. she focused too much on that with her point and I felt like she actually got a little off topic and she herself was BIASED.

  16. D F

    September 11, 2017 12:32 pm

    Stuff like this is the reason that I left the left. Yes, there are some things in here that have some factual basis but to insinuate that Math and Statistics have an evil innate bias is a little disingenuous. Also, trying to suggest that data analysis be "audited" by people like this presenter who are obviously intentionally biased is very dangerous. Even if there is bias currently (which I'm sure there is), that doesn't mean that the people on the left should hijack the results of data analysis and intentionally publish results that suit their arguments. My only argument is this. Math is perfect, data analysis is good and can always get better, and people need to keep political bias (on both sides) out of data analysis. Let the numbers speak for themselves.

  17. DapperAvocado

    September 12, 2017 12:23 am

    Man, this talk is more or less clickbait. She really starts off dramatically but I feel like this is just an 'ethics' discussion at the beginning of any intro to stat class.

  18. Tal K

    September 13, 2017 12:49 pm

    dream on, here's some simple formula that will never change:
    white + good background + education = job

    To change a working algo that already shows results is expensive because of its complexity, and money is usually used for increasing profit, not the other way around. So, you guessed it: it all sounds good and just in her words, but no action will be taken on this matter. Think of politics, what kind of people are there, and think whether that can change for the better.

  19. Ceut

    September 13, 2017 9:20 pm

    its the world that made me a loser, not because I am a fucking loser or anything, its the system, brb dying my hair again.

  20. No One

    September 17, 2017 5:43 am

    Data is a tool. Use it responsibly. I wish she had been less political because it was a good reminder to look into methods of data analysis. "We're all biased, bigoted and racist", this is true to a degree but not really productive. The ability to make a snap judgment and operate in accordance with distribution curves was paramount to survival statistically for humans. We can argue that we have progressed as a species beyond the necessity for a healthy fear of the unknown amongst ourselves due to modern information dissemination, but genes and instincts evolve more slowly than society. Her implications are too grandiose and politically obvious. Yeah capitalism has problems when producers take the path of least resistance and that doesn't serve the interests of the consumers, but that means the government regulates the monopolies and externalities. I feel her idea of government action goes beyond that. Other than that, good speech.

  21. Max Gorden

    September 19, 2017 1:48 pm

    I love how a smart A.I can decide and or learn how to 'discriminate', computers think logically, that means 'discrimination' is logical.

  22. John Trauger

    September 27, 2017 11:15 pm

    This sounds like a job for a corpus callosum. No single algorithm is perfect. But three different thinking methods might work together to create better results than any one individually.

  23. Andrew Koay

    October 24, 2017 6:44 pm

    Good work on Data Ethics! Plants thinking seeds for those who may not know but are holding on to false assumptions, used to sustain confidence in the system. Like Oneself-Check-Ownself.

  24. Mihail Tsankov

    October 28, 2017 11:59 am

    The idea that an algorithm is a black box and no one can look at it is nonsense. The same way a judge can be biased and wreak havoc on a community is the way a biased algorithm can. But an algorithm can be proven defective and changed or replaced. Much harder to do with biased people. A biased person can always find a way to do the bad thing through a loophole. An algorithm will not search for a similar way until a developer tells it to. And when it does the problem can be found and fixed.

  25. Andrea Riecken

    October 30, 2017 5:54 pm

    Ms. O'Neil's work opened my mind to a deeper understanding of what's happening nowadays. Not the laughable game of getting ads on Facebook, but the serious unethical world that we are feeding while using all the tools the new era gave us. It is sooooo scary! I hope smart (and honest) people will soon find a better way of keeping human beings natural beings.

  26. Kashatnic K

    November 22, 2017 10:15 am

    And the problem is that she complains about over simplified metrics being an incomplete model of the world, yet her presuppositions are based on the very same over simplification, things like the wage gap are based entirely on crude algorithms comparing apples to oranges. On policing, the crime stats are simply damning once you look at them, simply the prevalence of suspects willing to shoot back at the police at rates several times more than other groups will skew the data in a way I'm sure she wouldn't accept.
    She's right in a way, but I'm sure her ilk simply wish to skew the data to paint an incomplete picture in the way they prefer.

  27. Eric Roper

    December 21, 2017 1:45 am

    We are selecting for biological robots. Social Darwinism provides a habitat that we are expected to conform to. Often this abstract habitat is itself fragile in nature.

  28. Dytallix X

    January 18, 2018 9:59 pm

    If she cares so much about her kids eating vegetables then why is she so overweight?. Bit of a hypocrite.

  29. Ashish Vinayak

    January 21, 2018 7:59 pm

    She makes some points that are already known, however she singles out specifics with a sample size of fucking 2 and then complains that data is the problem. Honestly, this video is clickbait and full of bullshit.

  30. tomas manuel flores leiva

    January 22, 2018 3:37 pm

    That's right … Trying to manipulate the data is changing God's designs … It has a very high future cost that will have to be answered for … Because science without conscience is the ruin of the soul; human prejudice is weighing in and leading humanity to accept great injustices; the power of the elite has taken over the decisions, which lead nowhere, only to more injustice, failure and perdition … It is time to put in order the pseudoscience disguised as immaculate data, because it is not always right … One thing is LSD .05 (least significant difference) and another is LSD-25 (lysergic acid diethylamide), so don't go mixing both criteria in the paroxysm of total science … Good video, thanks for sharing, TED!

  31. Hugo de GARIS

    February 4, 2018 12:10 pm

    THIS FAT FOUR FEMINAZI WITH GREEN HAIR IS UNFUCKABLE, SO PROBABLY uses feminism as revenge against men for rejecting her. She is a typical "triple f" i.e. fat, four (out of ten in looks), and a feminazi, the most repulsive category of women for men, and hence the first to be rejected by men. ======To young women reading this comment, don't end up like this insect, or men with crush your ego like a bug, by totally ignoring you, forcing you to rot on the shelf, manless, loveless, sexless, babyless, and spat at.======P.S. It's interesting that most of the videos featuring this triple-f have the comments switched off. I wonder if JewTube did that, because too many men like me lashed out at her, this feminazi witch, the most hated category of female, the most rejectable by men.

  32. alan dayangco

    February 27, 2018 3:53 am

    Keeps panting. Fragment sentences. Didn't prepare the speech. Rambling. "Big" data.
    Talk could've been good but was delivered badly. Feminist.

  34. PA N

    April 30, 2018 4:44 pm

    Big data is just a tool, like a knife, and indeed it can be dangerous or amazing depending on our level of consciousness and wisdom.

  35. Marko Max

    May 25, 2018 10:48 am

    Can somebody pls remove these SJW land whales from stage for educated people.
    'TEDxyz' must be founded asap, for them to slander whom/whatever they like that particular day.

  36. Shub99

    June 12, 2018 8:54 am

    Her book, Weapons of Math Destruction, came out in 2016… strange that, for an understanding of this importance, her talk came one year later…

  37. michael thompson

    August 11, 2018 5:37 pm

    In other words, algorithms are, can be weaponized. How do Liberals insert skin color in their algorithms?

  38. Dragon Curve Enthusiast

    September 2, 2018 9:34 am

    5:51 "Algorithms don't make things fair… They repeat our past practices, our patterns. They automate the status quo.
    That would be great if we had a perfect world, but we don't."
    The perfect summary of the talk

  39. sTL45oUw

    October 11, 2018 9:04 pm

    It figures somebody who gets the short straw would complain about getting the short straw.
    Not very objective though.

  40. V andromache

    October 12, 2018 7:59 pm

    I think data analytics is a bunch of crap!! The algorithms let these companies justify their ignorance by being lazy and not figuring it out on a case by case situation. People think if you were charged with murder that you're a murderer even if you didn't do it, you weren't there, don't know this person, were on the other side of the planet. Judgement is devastating to a person because everyone should have a chance to be heard, not swept under the rug because some report told them to.

  41. solanahWP

    December 11, 2018 12:25 pm

    Well, she spoke about obscure algorithms targeting voters in 2015! (That vid is still on YouTube). Long before you-know-who was elected. So, basically, she called out Cambridge Analytica even before it was (fake or not) news.

  42. nonseans

    January 8, 2019 10:47 pm

    Her talk reminded me why I stopped listening to TED… Sorry, but that's pure BS based on overgeneralizations.

  43. Engineer 314

    February 3, 2019 6:25 am

    Great speech, except it's very easy to think O'Neil doesn't want anyone to believe in Big Data. I'd say it differently, collecting all the data is powerful, but it's also WAY too easy to tamper with the wrong way. Her claim that "algorithms are opinionated" is right and wrong in my view. The "real" algorithm is not opinionated, but those handling the data have either mismanaged confounding variables or have abused the transparency of papers in order to create an artificial algorithm. The data does lead to new revelations; the humans who handle the data can easily hide them.

  44. Elmore Gliding Club

    February 8, 2019 1:44 pm

    As a data scientist I completely agree. A credit card company (Discover) CSR just told me that they “cannot reduce my interest rate.” That is, their “system” won’t allow it. Obviously hoping, trusting, I’d acquiesce to the algorithmic overlord and let it go (I closed the account; I won’t do business with liars).

    Yes, MathBabe is so right: Harvard admissions, CNN, NBC, the Wash. Post, all promulgate these built-in biases. And we have evidence of each. Heck, CNN and NBC don’t even need algorithms; they can’t help themselves.

    Her message is spot-on: Do not let others’ algorithms rule your life.

    Thanks for a great TedTalk!

  45. Vinicius Claro

    March 4, 2019 11:01 pm

    It's important to distinguish algorithms from models.
    The models have the concept and the algorithms are part of models.
    Models include entities and rules, but algorithms follow these rules.

  46. Vinicius Claro

    March 4, 2019 11:04 pm

    Another point is the difference between weak AI and strong AI. This is important to evaluate the dimension and influence of the kind of algorithms addressed by O'Neil.

  47. Vinicius Claro

    March 4, 2019 11:06 pm

    The most important thing I heard from O'Neil is that algorithms "include" opinions.
    The algorithms may be analyzed by psychoanalysts… [¿]

  48. Mikael

    April 28, 2019 6:05 am

    I have a feeling most of the dislikers didn't even watch the video. I would have done the same if my algorithm didn't sweep over the comments before disliking and moving on. Wait.. I swear I've already written this exact comment on this video.. now I have to look

  49. Butch Randolph

    June 27, 2019 4:22 am

    Drug Possession Bernard had blank Dillion had Blank. Violence toward Police a factor? Arrest Records Add and history even though no convictions. Politics is the Problem. js

  51. Paul BNW

    December 22, 2019 11:14 am

    Yea no, there is no reason to suggest an algorithm would filter out women. That's an assumption. In fact women have made up most of the teachers in history. Absurd to say otherwise.

