How Do You Fight Spam With Bayes’ Rule?

It might surprise you to know that in 2013, 70.7% of all emails worldwide were spam. Spam emails are unsolicited emails sent out in bulk. To combat them, companies use spam filters, supplied by software companies, that block spam before it reaches the intended recipient.

One provider, SpamTitan, advertises the following data:

  • It blocks 99.9% of all spam email.
  • It blocks 0.03% of all emails that are not spam.

Based on the information above, what is the probability that a delivered email is spam?

To start a problem like this, let’s identify the relevant events.

  • S is the event that an email is spam
  • S′ is the event that an email is not spam
  • B is the event that an email is blocked
  • B′ is the event an email is not blocked

Based on these events, we want to compute the probability that an email is spam given that it is not blocked, P(S|B′).
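Before turning to the algebra, one way to sanity-check the answer is to simulate the filter. The sketch below is a minimal Monte Carlo estimate of P(S|B′). It assumes that each email is independently spam with probability 0.707 (the 2013 figure) and that the filter blocks spam and non-spam at exactly the advertised rates. The function name `simulate_spam_given_delivered` is hypothetical, chosen for illustration.

```python
import random

# Figures taken from the text: 70.7% of emails are spam,
# the filter blocks 99.9% of spam and 0.03% of non-spam.
P_SPAM = 0.707
P_BLOCK_GIVEN_SPAM = 0.999
P_BLOCK_GIVEN_NOT_SPAM = 0.0003


def simulate_spam_given_delivered(n_emails: int, seed: int = 0) -> float:
    """Estimate P(S|B') by simulating n_emails emails (hypothetical helper).

    Assumption: each email is spam independently with probability P_SPAM,
    and the filter blocks at exactly the advertised rates.
    """
    rng = random.Random(seed)
    delivered = 0        # emails that are not blocked (event B')
    delivered_spam = 0   # emails that are spam and not blocked (S and B')
    for _ in range(n_emails):
        is_spam = rng.random() < P_SPAM
        p_block = P_BLOCK_GIVEN_SPAM if is_spam else P_BLOCK_GIVEN_NOT_SPAM
        blocked = rng.random() < p_block
        if not blocked:
            delivered += 1
            if is_spam:
                delivered_spam += 1
    # Fraction of delivered emails that turned out to be spam.
    return delivered_spam / delivered


if __name__ == "__main__":
    estimate = simulate_spam_given_delivered(1_000_000)
    print(f"Estimated P(S|B') = {estimate:.5f}")
```

Because only about 0.1% of spam gets through, a large number of simulated emails is needed before the estimate settles down; the exact value comes from Bayes' Rule.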

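Let’s look at a tree diagram of the situation. The first set of branches splits on whether an email is spam (S) or not spam (S′). From each of those, a second set of branches splits on whether the email is blocked (B) or not blocked (B′).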

Next, we’ll label the given information on the diagram.

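The key here is to recognize that the data provided by the software company are conditional probabilities: blocking 99.9% of spam means P(B|S) = 0.999, and blocking 0.03% of non-spam means P(B|S′) = 0.0003. The 2013 figure supplies the first set of branches, P(S) = 0.707. Since the probabilities on the branches leaving a single point must add to 1, we can finish labeling the diagram:

  • P(S) = 0.707 and P(S′) = 1 − 0.707 = 0.293
  • P(B|S) = 0.999 and P(B′|S) = 1 − 0.999 = 0.001
  • P(B|S′) = 0.0003 and P(B′|S′) = 1 − 0.0003 = 0.9997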
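The labeled tree can be written out in a few lines of Python. This minimal sketch, with hypothetical variable names, fills in the complementary branches and then multiplies along each path of the tree to get the probability of each combination of events, checking that the four paths together account for every email.

```python
# Branch probabilities from the text; the complements use the rule that
# branches leaving a single point add to 1.
p_s = 0.707                                 # P(S)
p_not_s = 1 - p_s                           # P(S')
p_b_given_s = 0.999                         # P(B|S)
p_not_b_given_s = 1 - p_b_given_s           # P(B'|S)
p_b_given_not_s = 0.0003                    # P(B|S')
p_not_b_given_not_s = 1 - p_b_given_not_s   # P(B'|S')

# Multiply along each path of the tree to get the joint probabilities.
paths = {
    "S and B": p_s * p_b_given_s,
    "S and B'": p_s * p_not_b_given_s,
    "S' and B": p_not_s * p_b_given_not_s,
    "S' and B'": p_not_s * p_not_b_given_not_s,
}

for name, prob in paths.items():
    print(f"P({name}) = {prob:.7f}")

# The four paths are disjoint and cover every email, so they sum to 1.
print("total =", round(sum(paths.values()), 12))
```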

The diagram is labeled nicely, but none of the probabilities match P(S|B′). The conditional probabilities on the second set of branches are all given the event S or the event S′. To find P(S|B′), we’ll use Bayes’ Rule. Start with the relationship between conditional probabilities,

  P(S|B′) · P(B′) = P(S ∩ B′) = P(B′|S) · P(S)

and solve for P(S|B′). This gives

  P(S|B′) = P(S) · P(B′|S) / P(B′)

All of the probabilities on the right side may be found from the tree diagram.

The probabilities in the numerator are located along the branch through S and then B′: P(S) = 0.707 and P(B′|S) = 0.001.

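The probability in the denominator corresponds to all branches that lead to B′: one passes through S and the other through S′. Since these two branches represent disjoint outcomes, their probabilities add. This gives us

  P(S|B′) = P(S) · P(B′|S) / [P(S) · P(B′|S) + P(S′) · P(B′|S′)]
          = (0.707)(0.001) / [(0.707)(0.001) + (0.293)(0.9997)]
          = 0.000707 / 0.2936191
          ≈ 0.0024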

So the probability that an unblocked email is spam is about 0.24%.
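The same calculation can be packaged as a small function for any two-branch tree. In this sketch, `posterior` and its parameter names are hypothetical; H stands for the first-level event (here S) and E for the observed second-level event (here B′).

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' Rule for a two-branch tree: return P(H|E) (hypothetical helper).

    prior            -- P(H)
    p_e_given_h      -- P(E|H)
    p_e_given_not_h  -- P(E|H')
    """
    numerator = prior * p_e_given_h                            # path through H, then E
    denominator = numerator + (1 - prior) * p_e_given_not_h    # all paths ending in E
    return numerator / denominator


# H = S (spam), E = B' (not blocked), so pass P(B'|S) and P(B'|S').
p_spam_given_delivered = posterior(0.707, 1 - 0.999, 1 - 0.0003)
print(f"P(S|B') = {p_spam_given_delivered:.6f}")  # about 0.24%
```

Passing the complements 1 − 0.999 and 1 − 0.0003 mirrors the tree: the function only needs the branches that end in the observed event.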

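Users are typically quite tolerant of the occasional spam email that makes it through a spam filter. However, they are not very tolerant of blocked emails that turn out not to be spam. This probability is P(S′|B), and we can compute it in a similar manner:

  P(S′|B) = P(S′) · P(B|S′) / [P(S) · P(B|S) + P(S′) · P(B|S′)]
          = (0.293)(0.0003) / [(0.707)(0.999) + (0.293)(0.0003)]
          = 0.0000879 / 0.7063809
          ≈ 0.00012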

This probability is about 0.012%. This should make customers very happy, since it means that their important emails will rarely be blocked by the spam filter.
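The computation of P(S′|B) can also be checked with exact rational arithmetic. This sketch uses Python's standard `fractions.Fraction` so that no rounding occurs until the final conversion to a decimal; the variable names are illustrative.

```python
from fractions import Fraction

# Exact rational versions of the figures in the text.
p_s = Fraction(707, 1000)             # P(S)
p_b_given_s = Fraction(999, 1000)     # P(B|S)
p_b_given_not_s = Fraction(3, 10000)  # P(B|S')
p_not_s = 1 - p_s                     # P(S')

# Bayes' Rule: P(S'|B) = P(S') P(B|S') / [P(S) P(B|S) + P(S') P(B|S')]
numerator = p_not_s * p_b_given_not_s
denominator = p_s * p_b_given_s + numerator
p_not_spam_given_blocked = numerator / denominator

print(p_not_spam_given_blocked)          # exact fraction in lowest terms
print(float(p_not_spam_given_blocked))   # decimal approximation
```

Because every input is an exact fraction, the numerator 0.0000879 and denominator 0.7063809 are represented exactly, and rounding happens only in the final `float(...)` call.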