Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics. We ended the last episode by talking about conditional probabilities, which helped us find the probability of one event, given that a second event had already happened. But now I want to give you a better idea of why this is true, and how this formula, with a few small tweaks, has revolutionized the field of statistics.

[INTRO]

In general terms, conditional probability says that the probability of an event, B, given that event A has already happened, is the probability of A and B happening together, divided by the probability of A happening. That’s the general formula, but let’s give you a concrete example so we can visualize it.

Here’s a Venn diagram of two events: an email containing the words “Nigerian Prince,” and an email being spam. So I get an email that has the words “Nigerian Prince” in it, and I want to know the probability that this email is spam, given that I already know the email contains the words “Nigerian Prince.” This is the equation:

P(Spam | Nigerian Prince) = P(Spam and Nigerian Prince) / P(Nigerian Prince)

Alright, let’s take this apart a little.
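As a quick numeric sketch of that formula (the probabilities here are made up purely for illustration, they’re not from the episode):

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_prince = 0.04            # P(email contains "Nigerian Prince")
p_spam_and_prince = 0.038  # P(email is spam AND contains "Nigerian Prince")

# Conditional probability: P(Spam | Nigerian Prince) = P(Spam and NP) / P(NP)
p_spam_given_prince = p_spam_and_prince / p_prince
print(round(p_spam_given_prince, 2))  # 0.95
```

So under these made-up numbers, 95% of emails mentioning a “Nigerian Prince” would be spam.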

On the Venn diagram, I can represent the fact that I know “Nigerian Prince” already happened by only looking at the events where “Nigerian Prince” occurs, so just this circle. Now inside this circle I have two areas: areas where the email is spam, and areas where it’s not. According to our formula, the probability of spam given “Nigerian Prince” is the probability of spam AND “Nigerian Prince”, which is this region where they overlap, divided by the probability of “Nigerian Prince”, which is the whole circle that we’re looking at.

Now, if we want to know the proportion of times when an email is spam, given that we already know it has the words “Nigerian Prince,” we need to look at how much of the whole “Nigerian Prince” circle the region with both spam and “Nigerian Prince” covers.

And actually, some email servers use a slightly more complex version of this example to filter spam. These filters are called Naive Bayes filters, and thanks to them, you don’t have to worry about seeing the desperate pleas of a surprisingly large number of Nigerian Princes.

The Bayes in Naive Bayes comes from the Reverend Thomas Bayes, a Presbyterian minister who broke up his days of prayer with math. His largest contribution to the field of math and statistics is a slightly expanded version of our conditional probability formula. Bayes’ Theorem states that the probability of B given A is equal to the probability of A given B, times the probability of B, all divided by the probability of A:

P(B|A) = P(A|B) * P(B) / P(A)

You can see that this is just one step away from our conditional probability formula. The only change is in the numerator, where P(A and B) is replaced with P(A|B)P(B). While the math of this equality is more than we’ll go into here, you can see with some Venn-diagram algebra why this is the case. In this form, the equation is known as Bayes’ Theorem, and it has inspired a strong movement in both the statistics and science worlds.

Just like with your emails, Bayes’ Theorem allows us to figure out the probability that you have a piece of spam on your hands using information that we already have: the presence of the words “Nigerian Prince.” We can also compare that probability to the probability that you just got a perfectly valid email about Nigerian Princes. If you just tried to guess your odds of an email being spam based on the rate of spam to non-spam email, you’d be missing some pretty useful information: the actual words in the email!

Bayesian statistics is all about UPDATING your beliefs based on new information. When you receive an email, you don’t necessarily think it’s spam, but once you see the word “Nigerian” you’re suspicious. It may just be your Aunt Judy telling you what she saw on the news, but as soon as you see “Nigerian” and “Prince” together, you’re pretty convinced that this is junk mail.

Remember our Lady Tasting Tea example, where a woman claimed to have superior taste buds that allowed her to know, with one sip, whether tea or milk was poured into a cup first? When you’re watching this lady predict whether the tea or milk was poured first, each correct guess makes you believe her just a little bit more. A few correct guesses may not convince you, but each correct prediction is a little more evidence that she has some weird super-tasting tea powers.

Reverend Bayes described this idea of “updating” in a thought experiment. Say that you’re standing next to a pool table, but you’re facing away from it, so you can’t see anything on it. You then have your friend randomly drop a ball onto the table, and this is a special, very even table, so the ball has an equal chance of landing anywhere on it. Your mission is to guess how far to the right or left this ball is. You have your friend drop another ball onto the table and report whether it’s to the left or to the right of the original ball.
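One way to see this updating at work is a quick Monte Carlo sketch (my own illustration, not from the episode): simulate many tables, keep only the ones consistent with the report “the new ball landed to the right,” and look at where the original ball tends to be on those tables.

```python
import random

random.seed(42)

# Position along the table's width, scaled to [0, 1]: 0 = left edge, 1 = right edge.
kept_positions = []
for _ in range(100_000):
    original = random.random()   # first ball drops uniformly at random
    new_ball = random.random()   # second ball drops uniformly at random
    if new_ball > original:      # keep only tables where the report is "right"
        kept_positions.append(original)

# Among tables consistent with the report, the original ball tends to sit
# toward the left: its average position is about 1/3, not 1/2.
print(sum(kept_positions) / len(kept_positions))
```

The exact posterior mean after a single “right” report is 1/3, which the simulation approximates; each additional report would pull the estimate further.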

The new ball is to the right of the original, so we can update our belief about where the ball is. If the original is more towards the left, then most of the new balls will fall to the right of our original, just because there’s more area there. And the further to the left it is, the higher the ratio of new rights to lefts. Since this new ball is to the right, that means there’s a better chance that our original is more toward the left side of the table than the right, since there would be more “room” for the new ball to land.

Each ball that lands to the right of the original is more evidence that our original is towards the left of the table. But if we get a ball landing to the left of our original, then we know the original is not at the very left edge. Again, each new piece of information allows us to change our beliefs about the location of the ball, and changing beliefs is what Bayesian statistics is all about.

Outside thought experiments, Bayesian statistics is being used in many different ways, from comparing treatments in medical trials, to helping robots learn language. It’s being used by cancer researchers, ecologists, and physicists. And this method of thinking about statistics, updating existing beliefs with new information, may be different from the logic of some of the statistical tests that you’ve heard of, like the t-test. Those Frequentist statistics can sometimes be more like probability done in a vacuum: less reliant on prior knowledge.

When the math of probability gets hard to wrap your head around, we can use simulations to help see these rules in action. Simulations take rules and create a pretend universe that follows those rules.

Let’s say you’re the boss of a company, and you receive news that one of your employees, Joe, has failed a drug test. It’s hard to believe. You remember seeing this thing on YouTube that told you how to figure out the probability that Joe really is on drugs, given that he got a positive test. You can’t remember exactly what the formula is, but you could always run a simulation. Simulations are nice, because we can just tell our computer some rules, and it will randomly generate data based on those rules.

For example, we can tell it the base rate of people in our state who are on drugs, the sensitivity (how many true positives we get) of the drug test, and the specificity (how many true negatives we get). Then we ask our computer to generate 10,000 simulated people and tell us what percent of the time people with positive drug tests were actually on drugs.

If the drug Joe tested positive for, in this case Glitterstim, is only used by about 5% of the population, and the test for Glitterstim has 90% sensitivity and 95% specificity, I can plug that in and ask the computer to simulate 10,000 people according to these rules. And when we ran this simulation, only 49.2% of the people who tested positive were actually using Glitterstim. So I should probably give Joe another chance, or another test. And if I did the math, I’d see that 49.2% is pretty close, since the theoretical answer is around 48.6%.

Simulations can help reveal truths about probability, even without formulas. They’re a great way to demonstrate probability and create intuition that can stand alone or build on top of more mathematical approaches to probability.

Let’s use one to demonstrate an important concept in probability that makes it possible to use samples of data to make inferences about a population: the Law of Large Numbers. In fact, we were secretly relying on it when we used empirical probabilities, like how many times I got tails when flipping a coin 10 times, to estimate theoretical probabilities, like the true probability of getting tails.

In its weak form, the Law of Large Numbers tells us that as our samples of data get bigger and bigger, our sample mean will get arbitrarily close to the true population mean. Before we go into more detail, let’s see a simulation, and if you want to follow along or run it on your own, instructions are in the description below.

In this simulation we’re picking values from a new intelligence test, from a normal distribution that has a mean of 50 and a standard deviation of 20. When you have a very small sample size, say 2, your sample means are all over the place. You can see that pretty much anything goes; we see means between 5 and 95. And this makes sense: when we only have two data points in our sample, it’s not that unlikely that we get two really small numbers, or two pretty big numbers, which is why we see both low and high sample means.

Though we can tell that a lot of the means are around the true mean of 50, because the histogram is tallest at values around 50. But once we increase the sample size, even to just 100 values, you can see that the sample means are mostly around the real mean of 50.
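The episode’s own simulation code is linked in the description; here’s a rough Python sketch of the same idea (using the mean of 50 and standard deviation of 20 from the example above):

```python
import random
import statistics

random.seed(1)

# One sample of n scores from the "intelligence test": normal(mean=50, sd=20).
def sample_mean(n):
    return statistics.fmean(random.gauss(50, 20) for _ in range(n))

for n in (2, 100, 1000):
    means = [sample_mean(n) for _ in range(2000)]
    spread = statistics.stdev(means)  # how far sample means wander from 50
    print(f"n={n:4d}  typical distance of sample mean from 50: {spread:.2f}")
```

As the Law of Large Numbers promises, the spread of the sample means shrinks as n grows (roughly like 20 divided by the square root of n), so bigger samples give means closer to 50.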

In fact, all of the sample means are within 10 units of the true population mean. And when we go up to 1,000, just about every sample mean is very, very close to the true mean. And when you run this simulation over and over, you’ll see pretty similar results.

The neat thing is that the Law of Large Numbers applies to almost any distribution, as long as the distribution doesn’t have an infinite variance. Take the uniform distribution, which looks like a rectangle. Imagine a 100-sided die: every single value is equally probable. Even sample means that are selected from a uniform distribution get closer and closer to the true mean.

The Law of Large Numbers is the evidence we need to feel confident that the mean of the samples we analyze is a pretty good guess for the true population mean. And the bigger our samples are, the better we think the guess is! This property allows us to make guesses about populations, based on samples.

It also explains why casinos make money in the long run over hundreds of thousands of payouts and losses, even if the experience of each person varies a lot. The casino looks at a huge sample, every single bet and payout, whereas your sample as an individual is smaller, and therefore less likely to be representative.

Each of these concepts gives us another way to look at the data around us. The Bayesian framework shows us that every event or data point can and should “update” your beliefs, but it doesn’t mean you need to completely change your mind. And simulations allow us to build upon these observations when the underlying mechanics aren’t so clear. We are continuously accumulating evidence and modifying our beliefs every day, adding today’s events to our conception of how the world works. And hey, maybe one day we’ll all start sincerely emailing each other about Nigerian Princes. Then we’re gonna have to do some belief-updating.

Thanks for watching. I’ll see you next time.
