Naive Bayes explained naively

To me, reading about the concept of naive Bayes is like following a very logical train of thought. Without really knowing (or caring) whether this is an accurate description, I call something like this a logic chain… and it goes something like this…


Probabilities are given as a number between 0 and 1, where 1 is absolutely certain and 0 is an impossibility.

What is the chance (or probability) that I will watch Netflix on a given weekday? About 0.75.

p(Netflix) = 0.75

What is the chance that I will eat a yoghurt for breakfast on a given weekday? About 0.15.

p(yoghurt) = 0.15

Independent Probabilities

What is the chance that I will eat a yoghurt on a given weekday and watch some Netflix on the same day?

The two things have nothing to do with each other (i.e. they are independent), so we can just multiply them together to get the joint probability.

p(Netflix, yoghurt) = 0.75 * 0.15 = 0.1125
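The multiplication is trivial, but here is a quick sanity check in Python (the variable names are my own shorthand, the numbers are the ones from above):

```python
# Two independent weekday events from above.
p_netflix = 0.75   # p(Netflix): chance of watching Netflix on a weekday
p_yoghurt = 0.15   # p(yoghurt): chance of a yoghurt breakfast

# Independent events: the joint probability is just the product.
p_both = p_netflix * p_yoghurt
print(round(p_both, 4))  # 0.1125
```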

Dependent Probabilities

How about two events that are related (i.e. dependent)? The chance that I will watch Marvel’s Jessica Jones is a solid 0.2.

p(JJ) = 0.2

So what is the chance that I will watch Jessica Jones on any given weekday if I am already watching Netflix?

p(JJ | Netflix) i.e. read it like the phrase above it. The “|” is read as “given that”: the event after the bar is already taking place.

Bayes’ Rule

To answer the above question, we make use of Bayes’ Rule. An exceptionally clever chap named Thomas Bayes came up with the following rule:

p(Netflix) * p(JJ | Netflix) = p(JJ) * p(Netflix | JJ)

If we re-arrange the above to have the question of interest on one side, we get the following:

p(JJ | Netflix) = [ p(JJ) * p(Netflix | JJ) ] / p(Netflix)

You might be surprised to find out that we can assign numbers to all p‘s on the right-hand side. Even the p(Netflix | JJ) part. Think about it for a little bit… What is the probability of watching Netflix, when we are already watching Marvel’s Jessica Jones?

Any idea how high the chance is to watch Netflix when watching a Netflix Original Series? It is in fact 1, because you cannot watch Jessica Jones without watching Netflix.

p(Netflix | JJ) = 1


p(JJ | Netflix) = [ 0.2 * 1 ] / 0.75 ≈ 0.2667
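The same calculation as a few lines of Python (again, the variable names are just my shorthand for the formulas above):

```python
# Bayes' Rule rearranged: p(JJ | Netflix) = p(JJ) * p(Netflix | JJ) / p(Netflix)
p_netflix = 0.75          # p(Netflix)
p_jj = 0.2                # p(JJ)
p_netflix_given_jj = 1.0  # p(Netflix | JJ): watching Jessica Jones implies watching Netflix

p_jj_given_netflix = p_jj * p_netflix_given_jj / p_netflix
print(round(p_jj_given_netflix, 4))  # 0.2667
```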

So, unsurprisingly, it is more likely that I will watch Jessica Jones if I am already watching Netflix.

Why is Naive Bayes interesting?

This is interesting because it makes for a simple yet surprisingly effective classifier. The “naive” part is the assumption that all the features you look at are independent of each other, just like the Netflix-and-yoghurt example above. That assumption oversimplifies most real situations, but in practice it works surprisingly well.

A classic application is the identification of spam e-mails based on mail content, but it does not stop there. I just typed “naive bayes applied” into Google, and one of the first hits was a research review paper looking at disease prediction with the help of Naive Bayes.
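To show what the “naive” multiplication looks like in the spam case, here is a minimal sketch. Everything in it is made up for illustration: the priors, the words, and the per-word probabilities are invented numbers, not real data; an actual filter would estimate them by counting words in a pile of labelled training mail.

```python
import math

# Invented priors: what fraction of all mail is spam vs. legitimate ("ham").
p_spam = 0.4
p_ham = 0.6

# Invented per-word likelihoods. The "naive" step: treat the words as
# independent, so the likelihood of a whole mail is just the product
# of its word likelihoods, exactly like Netflix * yoghurt above.
p_word_given_spam = {"free": 0.30, "winner": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.02, "winner": 0.01, "meeting": 0.25}

def looks_like_spam(words):
    # Sum logs instead of multiplying raw probabilities, so long mails
    # don't underflow to zero.
    log_spam = math.log(p_spam) + sum(math.log(p_word_given_spam[w]) for w in words)
    log_ham = math.log(p_ham) + sum(math.log(p_word_given_ham[w]) for w in words)
    # Bayes' Rule would divide both sides by the same p(words),
    # so for a comparison we can skip that denominator entirely.
    return log_spam > log_ham

print(looks_like_spam(["free", "winner"]))  # True
print(looks_like_spam(["meeting"]))         # False
```

Note that we never computed the denominator p(words): since it is identical for both classes, comparing the two numerators is enough to pick a winner.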

I hope this little text of mine made it clearer. Please leave your comments below.

Take care 🙂

This post is inspired by the section of Naive Bayes in John Foreman’s book “Data Smart – Using Data Science to Transform Information into Insight”.
