While reading the "Causes Trump Statistics" chapter (Chapter 16), I came across a description of a study about cabs and hit-and-run accidents. Participants are told that 85% of cabs are Green, 15% are Blue, and a witness has an 80% chance of correctly identifying the color of a given cab. Given this data, a bystander identifies a cab involved in an accident as Blue, and Kahneman goes on to explain how we fail to take the data into consideration. I really enjoyed this chapter, but I won't spoil the book for you.
Instead, I want to do some math (big surprise, I know). However, I want to make it accessible to non-mathematicians (atypical for my posts).
Given the data, Kahneman tells us that the true probability that the cab was Blue is 41%, though our thinking is likely biased towards the 80% probability of the identification being correct. I was on the bus and it kept bothering me, to the point that I couldn't continue reading. Eventually I figured it out (when I got to the train), and I wanted to explain how this is computed using Bayes' Law. As a primer, I wrote a post explaining in layman's terms how we use Bayes' Law. (There is some notation introduced, but I hope it isn't too confusing.)
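Before doing any algebra, one way to convince yourself that 41% (and not 80%) is right is to simulate lots of accidents and count. The sketch below is a minimal Monte Carlo check, assuming (as the problem does) that 15% of cabs are Blue and that the witness names the correct color 80% of the time regardless of the cab's true color. The function name `simulate_blue_given_id_blue` and the seed are hypothetical choices for illustration.

```python
import random


def simulate_blue_given_id_blue(trials: int = 200_000, seed: int = 16) -> float:
    """Estimate Pr(cab is Blue | witness says Blue) by simulation.

    Hypothetical helper. Assumes 15% of cabs are Blue and the witness is
    correct 80% of the time, whichever color the cab really is.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    said_blue = 0              # accidents where the witness says "Blue"
    said_blue_and_was_blue = 0 # ...and the cab really was Blue
    for _ in range(trials):
        is_blue = rng.random() < 0.15   # pick the cab's true color
        correct = rng.random() < 0.80   # does the witness get it right?
        says_blue = is_blue if correct else not is_blue
        if says_blue:
            said_blue += 1
            if is_blue:
                said_blue_and_was_blue += 1
    return said_blue_and_was_blue / said_blue


estimate = simulate_blue_given_id_blue()
print(f"{estimate:.3f}")  # close to 0.41, nowhere near 0.80
```

Only the accidents where the witness says "Blue" enter the ratio, which is exactly the conditioning that the 80% intuition skips.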
Putting Bayes' Law to Use
We need to understand what 41% even corresponds to before we can compute it. What has actually happened is that we know the event $\text{ID}_B$ has occurred: the cab has been identified (ID) as Blue (B). What we want is the probability that the cab is Blue given that it has been identified as Blue; that is, we want $\Pr(B \mid \text{ID}_B)$.
Using Bayes' Law, we can write
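$$\Pr(B \mid \text{ID}_B) = \frac{\Pr(B \text{ and } \text{ID}_B \text{ both occur})}{\Pr(\text{ID}_B)} \quad \text{and} \quad \Pr(\text{ID}_B \mid B) = \frac{\Pr(B \text{ and } \text{ID}_B \text{ both occur})}{\Pr(B)}.$$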
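Both equations share the same numerator and differ only in what we divide by. A minimal sketch that makes this concrete: build a hypothetical table of the four (actual color, identified color) outcomes from the problem's numbers, then read both conditional probabilities off that single table. It assumes the 80% accuracy applies equally to Blue and Green cabs, and uses Python's `fractions.Fraction` so the results are exact.

```python
from fractions import Fraction

# Problem data: 15% Blue, 85% Green, witness correct 80% of the time.
prior = {"B": Fraction(15, 100), "G": Fraction(85, 100)}
accuracy = Fraction(80, 100)

# Hypothetical joint table: Pr(actual color AND identified color).
joint = {}
for actual, p_actual in prior.items():
    for said in ("B", "G"):
        p_said = accuracy if said == actual else 1 - accuracy
        joint[(actual, said)] = p_actual * p_said

# Pr(B and ID_B both occur): the single cell where actual = said = Blue.
both = joint[("B", "B")]
# Denominators: add up the row or column that the condition selects.
pr_id_b = sum(p for (actual, said), p in joint.items() if said == "B")
pr_b = sum(p for (actual, said), p in joint.items() if actual == "B")

pr_b_given_id_b = both / pr_id_b  # first equation
pr_id_b_given_b = both / pr_b     # second equation
print(pr_id_b_given_b, pr_b_given_id_b)  # 4/5 12/29
```

Dividing the same cell by the "identified as Blue" total versus the "actually Blue" total is the whole difference between the 41% and the 80%.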
We're told that a cab is correctly identified 80% of the time, hence
$$\Pr(\text{ID}_B \mid B) = 0.8$$
(i.e. the probability of correctly identifying the cab as Blue given that it is actually Blue). We're also told that 15% of the cabs are Blue, hence
$$\Pr(B) = 0.15.$$
We can combine these with the second application of Bayes' Law above to show that
$$\Pr(B \text{ and } \text{ID}_B \text{ both occur}) = \Pr(\text{ID}_B \mid B) \cdot \Pr(B) = 0.8 \cdot 0.15 = 0.12.$$
The only piece of data still missing to finish our computation is $\Pr(\text{ID}_B)$.
Using the extended form of Bayes' Law (the law of total probability), since we know that the events $B$ and $G$ (the cab is Blue or Green) are mutually exclusive and cover all possibilities for the cab, we can say that
$$\Pr(\text{ID}_B) = \Pr(\text{ID}_B \mid B) \cdot \Pr(B) + \Pr(\text{ID}_B \mid G) \cdot \Pr(G).$$
Since there is only an 80% chance of correct identification, we know that $\Pr(\text{ID}_B \mid G) = 0.2$ (the probability of misidentifying a Green cab as Blue). We also know that 85% of the cabs are Green, so $\Pr(G) = 0.85$. Plugging these in (along with the numbers already computed), we get
$$\Pr(\text{ID}_B) = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.12 + 0.17 = 0.29.$$
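If the percentages feel abstract, the same sum can be done with counts. A minimal sketch, assuming a hypothetical fleet of 1,000 cabs (chosen only because it makes every count a whole number): there are two ways for the witness to say "Blue", and we add them up.

```python
# Hypothetical fleet of 1,000 cabs split by the problem's percentages.
cabs = 1_000
blue = cabs * 15 // 100   # 150 Blue cabs
green = cabs - blue       # 850 Green cabs

# Two ways to hear "Blue": a Blue cab named correctly (80%)
# or a Green cab named incorrectly (20%).
blue_called_blue = blue * 80 // 100    # 120
green_called_blue = green * 20 // 100  # 170
called_blue = blue_called_blue + green_called_blue

print(called_blue, called_blue / cabs)  # 290 0.29
```

Of those 290 "Blue" reports, only 120 involve a Blue cab, and $120/290 = 12/29$: the misidentified Green cabs outnumber the correctly identified Blue ones because there are so many more Green cabs.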
Putting it all together we get our answer
$$\Pr(B \mid \text{ID}_B) = \frac{\Pr(B \text{ and } \text{ID}_B \text{ both occur})}{\Pr(\text{ID}_B)} = \frac{0.12}{0.29} = \frac{12}{29} \approx 0.413793103.$$
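The whole calculation fits in a few lines. The sketch below packages the steps above into a hypothetical helper, `posterior_blue`, assuming only two cab colors and a witness whose accuracy is the same for both; `fractions.Fraction` keeps the arithmetic exact so the answer comes out as $12/29$ rather than a rounded decimal.

```python
from fractions import Fraction


def posterior_blue(pr_blue: Fraction, accuracy: Fraction) -> Fraction:
    """Pr(cab is Blue | witness says Blue) via Bayes' Law.

    Hypothetical helper. Assumes two colors (Blue/Green) and a witness
    whose accuracy does not depend on the cab's true color.
    """
    pr_green = 1 - pr_blue
    both = accuracy * pr_blue                      # Pr(B and ID_B both occur)
    pr_id_b = both + (1 - accuracy) * pr_green     # Pr(ID_B), total probability
    return both / pr_id_b                          # Pr(B | ID_B)


answer = posterior_blue(Fraction(15, 100), Fraction(80, 100))
print(answer, f"{float(answer):.4f}")  # 12/29 0.4138
```

For comparison, `posterior_blue(Fraction(1, 2), Fraction(4, 5))` returns `Fraction(4, 5)`: the 80% intuition is only correct when Blue and Green cabs are equally common.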
Fantastic! Now we can get back to reading...