How Do You Compute Conditional Probability From Data?

It is easy to confuse conditional probability with the probability of an intersection of two events. They are related! The probability of an event A given that event B has occurred is

P(A | B) = P(A and B) / P(B)

The vertical bar | means “given” and the event after it is the event that has already occurred.

Let’s look at some data to determine how to find several different probabilities including conditional probability.

Problem Mammograms are typically used to screen women for breast cancer. Like most medical tests, they are not perfect. Some women who do not have breast cancer have a positive mammogram. This means that they do not have cancer, but the test indicates that they do. Other women test negative on the mammogram, but do have breast cancer. A test of 10,000 women who had a mammogram gave the following results.

Assume that these data apply to all women. Now let’s define some events:

+: a woman has a positive mammogram

–: a woman has a negative mammogram

C: a woman has breast cancer

C’: a woman does not have breast cancer.

We will use these events to answer the questions below.

a. What is the probability a woman has breast cancer?

Solution In terms of our events, we are looking to calculate P(C). To do this, we need to find the number of women with breast cancer and divide it by the number of women in the survey:

P(C) = (number of women with breast cancer) / 10,000

b. What is the probability that a woman has a positive mammogram?

Solution In terms of our events, we are looking to calculate P(+). To do this, we need to find the number of women with a positive mammogram and divide it by the number of women in the survey:

P(+) = (number of women with a positive mammogram) / 10,000

c. What is the probability that a woman has a negative mammogram and does not have breast cancer?

Solution Now things get a little more complicated. We are now interested in women who have a negative mammogram and who do not have breast cancer. From the table, these are the 9208 women in the negative mammogram row and in the "do not have cancer" column. In terms of events, these are the women in the event – and C’. Counting those women compared to the total number of women gives

P(– and C’) = 9208 / 10,000 = 0.9208

d. If a woman has a negative mammogram, what is the probability that she does not have breast cancer?

Solution In this part, we know a woman has had a negative mammogram. Of those women, we want to know what proportion does not have breast cancer. Since we know something in advance, this is a conditional probability problem. We need to calculate the probability that a woman does not have cancer, given that the woman had a negative mammogram, or P(C’ | –).

To calculate this probability, we need to take into account the fact that we know the woman had a negative mammogram. Based on the table, we know that 9217 women had a negative mammogram. Of these women, 9208 did not have cancer. This means that

P(C’ | –) = 9208 / 9217 ≈ 0.9990

Notice that we can also think of this symbolically as

P(C’ | –) = P(C’ and –) / P(–) = (9208 / 10,000) / (9217 / 10,000)

This is the same formula as

P(A | B) = P(A and B) / P(B)

but with C’ instead of A and – instead of B.
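As a quick check of parts c and d, the same arithmetic can be sketched in Python. This is only an illustration: the variable names are made up for this example, and the three counts are the ones quoted in the solutions above (10,000 women in total, 9217 negative mammograms, 9208 negative mammograms without cancer).

```python
# Counts quoted in the worked solutions (variable names are illustrative).
total_women = 10_000
negative_total = 9_217          # women with a negative mammogram
negative_and_no_cancer = 9_208  # negative mammogram and no breast cancer

# Part c: joint probability of a negative mammogram and no cancer.
p_neg_and_no_cancer = negative_and_no_cancer / total_women

# Marginal probability of a negative mammogram.
p_neg = negative_total / total_women

# Part d, computed directly from the counts...
p_no_cancer_given_neg_counts = negative_and_no_cancer / negative_total

# ...and computed from the formula P(A | B) = P(A and B) / P(B).
p_no_cancer_given_neg_formula = p_neg_and_no_cancer / p_neg

print(f"{p_neg_and_no_cancer:.4f}")            # 0.9208
print(f"{p_no_cancer_given_neg_counts:.4f}")   # 0.9990
print(f"{p_no_cancer_given_neg_formula:.4f}")  # 0.9990
```

Both routes give the same answer because the total of 10,000 cancels when the two probabilities are divided.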

How do you apply Bayes’ Rule to medical testing?

The probability of colorectal cancer can be given as 0.3%. If a person has colorectal cancer, the probability that the hemoccult test is positive is 50%. If a person does not have colorectal cancer, the probability that the person still tests positive is 3%.

What is the probability that a person who tests negative does not have colorectal cancer?

To solve this problem, we’ll draw and label an appropriate tree diagram. Then we’ll apply Bayes’ Rule to the problem. Look at the information given in the problem. If

C is the event “person has colorectal cancer”

+ is the event “the hemoccult test is positive”

– is the event “the hemoccult test is negative”

we know that

P(C) = 0.003

P(+ | C) = 0.5

P(+ | C′) = 0.03

where C′ is the event “person does not have colorectal cancer.”

This suggests the following tree diagram:

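[Tree diagram: the first branch is labeled P(C) = 0.003; the second-level branches are labeled P(+ | C) = 0.5 and P(+ | C′) = 0.03.]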

Knowing that the sum of the probabilities from one point on the tree should add to 1, we can finish the tree diagram as follows:

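[Completed tree diagram: P(C) = 0.003 and P(C′) = 0.997; from C, P(+ | C) = 0.5 and P(– | C) = 0.5; from C′, P(+ | C′) = 0.03 and P(– | C′) = 0.97.]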

The probability we are looking for is P(C′ | –). Notice that the tree diagram has P(– | C′), but not the reverse conditional probability that we are looking for. This is a sign we need to use Bayes’ Rule. Let’s find the appropriate form of Bayes’ Rule. The relationship between the conditional probabilities is

P(C′ | –) P(–) = P(– | C′) P(C′)

Solving for P(C′ | –) gives

P(C′ | –) = P(– | C′) P(C′) / P(–)

This is Bayes’ Rule for this problem. Now we are ready to use the tree diagram. P(– | C′) and P(C′) are both labeled on the tree diagram. We can calculate P(–) by multiplying along each path of the tree diagram that leads to a negative result, and then summing the products from these paths.

P(–) = P(C) P(– | C) + P(C′) P(– | C′) = (0.003)(0.5) + (0.997)(0.97) = 0.96859

Putting these values into Bayes’ Rule gives

P(C′ | –) = (0.97)(0.997) / 0.96859 ≈ 0.9985

This means that if you test negative, the probability that you do not have colorectal cancer is about 99.85%. A negative result is strong evidence that you do not have the disease.
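The calculation can also be sketched in Python. This is a minimal illustration, not part of the original solution: the function and its parameter names (prevalence, sensitivity, false_positive_rate) are labels chosen here for P(C), P(+ | C), and P(+ | C′).

```python
def prob_no_disease_given_negative(prevalence, sensitivity, false_positive_rate):
    """Return P(C' | -) using Bayes' Rule.

    prevalence          = P(C)
    sensitivity         = P(+ | C)
    false_positive_rate = P(+ | C')
    (The parameter names are illustrative labels, not from the problem.)
    """
    p_c = prevalence
    p_not_c = 1 - p_c                            # P(C')
    p_neg_given_c = 1 - sensitivity              # P(- | C)
    p_neg_given_not_c = 1 - false_positive_rate  # P(- | C')

    # P(-): multiply along each path that ends in a negative result,
    # then add the products.
    p_neg = p_c * p_neg_given_c + p_not_c * p_neg_given_not_c

    # Bayes' Rule: P(C' | -) = P(- | C') P(C') / P(-)
    return p_neg_given_not_c * p_not_c / p_neg


result = prob_no_disease_given_negative(0.003, 0.5, 0.03)
print(f"{result:.4f}")  # 0.9985
```

Writing the rule as a function makes it easy to see how the answer changes if, for example, the test detected a larger share of cancers.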