Introduction

This article defines deviation mathematically and answers a fundamental question: how do we find the outliers? Suppose we have the test results of all five students in a class.

Mean
Mathematically, the Mean is the simple average of the numbers: the sum of all the numbers divided by their count.
If we pick a random student from the class and ask for their score, what would we expect it to be?
If a new student joins the class, how are they likely to perform?
In other words, we need to find the average score.

Standard Deviation
As an overview, the Standard Deviation is a measure of how spread out the numbers are. It is denoted by sigma (\(\sigma\)) and is calculated as the square root of the Variance. We look at it in detail in the next section.

Variance
The average of the squared differences from the Mean. The differences are squared so that positive and negative differences do not cancel each other out.

Understanding the concept

STANDARD DEVIATION Scenario

Let us say the sample scores of five kids are 510, 380, 280, 340 and 210.
Find out the Mean, the Variance, and the Standard Deviation.

\begin{align} Mean \space \mu & = \frac{510 + 380 + 280 + 340 + 210} {5} \newline & = \frac{1720}{5} \newline & = 344 \newline \newline Variance \space \sigma^2 & =\frac{(510-344)^2 + (380-344)^2 + (280-344)^2 + (340-344)^2 + (210-344)^2}{5} \newline & = \frac{(166)^2 + (36)^2 + (-64)^2 + (-4)^2 + (-134)^2}{5} \newline & = \frac{27556 + 1296 + 4096 + 16 + 17956}{5} \newline & = \frac{50920}{5} \newline & = 10184 \newline \newline Standard \space Deviation \space \sigma & = \sqrt{10184} \newline & \approx 101 \newline \end{align}
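The same arithmetic can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the variance divides by the count (the population form), matching the calculation above.

```python
import math

# The five sample scores from the scenario.
scores = [510, 380, 280, 340, 210]

# Mean: sum of all numbers divided by the count.
mean = sum(scores) / len(scores)

# Variance: average of the squared differences from the mean.
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev))  # 344.0 10184.0 101
```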

Please see the chart below. It indicates Andy and Don are outliers.

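The Standard Deviation gives us a "standard" way of knowing what is normal and what counts as an achiever or an underachiever. A common convention is to treat scores within one Standard Deviation of the Mean (here, roughly 243 to 445) as normal and scores outside that range as unusually high or low.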
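As a rough illustration, the scores can be split into those inside and outside a band around the Mean. The one-standard-deviation threshold used here is an assumption (a common rule of thumb, not the only possible cutoff), and the variable names are illustrative.

```python
import math

scores = [510, 380, 280, 340, 210]

mean = sum(scores) / len(scores)
std_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

# Assumed cutoff: one standard deviation on either side of the mean.
low, high = mean - std_dev, mean + std_dev

normal = [x for x in scores if low <= x <= high]
outside = [x for x in scores if x < low or x > high]

print(normal)   # [380, 280, 340]
print(outside)  # [510, 210]
```

With this cutoff, two of the five scores fall outside the band: one well above the Mean and one well below it.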

WEIGHTED STANDARD DEVIATION Scenario

Let's say we need to find the mean of 1, 2, 3 and 4.

\begin{align} Mean \space \mu & = \frac{1 + 2 + 3 + 4} {4} \newline & = \frac{10}{4} \newline & = 2.5 \newline \end{align}

Now let's think in terms of weights. There are four numbers, so each number has a weight of \(\frac{1}{4}\).

\begin{align} Mean \space \mu & = \frac{1}{4} \times 1 + \frac{1}{4} \times 2 + \frac{1}{4} \times 3 +\frac{1}{4} \times 4 \newline & = \frac{10}{4} \newline & = 2.5 \newline \end{align}

Let's make the weighted mean a little higher ("pulled" there by the weight of 3) by changing the weight of 3 to 0.7 and the weight of each of the other numbers to 0.1.

\begin{align} Mean \space \mu & = 0.1 \times 1 + 0.1 \times 2 + 0.7 \times 3 + 0.1 \times 4 \newline & = 0.1 + 0.2 + 2.1 + 0.4 \newline & = 2.8 \newline \end{align}
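Both versions of the weighted mean can be reproduced with one small helper. This is a sketch: `weighted_mean` is a hypothetical name, and dividing by the total weight is an added safeguard for weights that do not sum exactly to 1.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the total weight."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

values = [1, 2, 3, 4]

# Equal weights of 1/4 give the ordinary mean.
equal = weighted_mean(values, [0.25, 0.25, 0.25, 0.25])

# A weight of 0.7 on the value 3 pulls the mean towards 3.
pulled = weighted_mean(values, [0.1, 0.1, 0.7, 0.1])

print(round(equal, 2), round(pulled, 2))  # 2.5 2.8
```

The results are rounded before printing because decimal weights such as 0.1 are not represented exactly in floating point.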

Let's say we have a broad survey with two outcomes, favourable and unfavourable. We define success as getting a favourable outcome.


If we pick a random member and ask for their rating, what would we expect it to be?
In other words, we need to find the average.
We need the mean of both outcomes, and for a distribution like this we use the probability-weighted sum.

Why a probability-weighted sum?
Consider the Weighted Mean above, and also check out my post Understanding Mathematical Expectations: Expected Values.

Say 40% of the responses are Unfavourable (U) and 60% are Favourable (F). These outcomes are not numbers, so we assign the value 0 to Unfavourable and 1 to Favourable.

\begin{align} \mu & = p_{favourable} \times x_{favourable} + p_{unfavourable} \times x_{unfavourable} \newline \mu & = 0.6 \times 1 + 0.4 \times 0 = 0.6 \end{align}

We cannot expect this value to describe a single person. No one says they are 40% unfavourable and 60% favourable, since there are only two choices: Favourable or Unfavourable. It works for a survey of, say, 100 people, where we might say that about 40 tend to be unfavourable and 60 favourable.
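To see why the number only makes sense for a group, here is a small sketch of a hypothetical survey of exactly 100 people, 60 favourable and 40 unfavourable. The plain average of the individual 0/1 answers matches the probability-weighted sum, even though no single answer is 0.6.

```python
# Hypothetical survey: 60 favourable answers (1) and 40 unfavourable answers (0).
responses = [1] * 60 + [0] * 40

# Plain average over all 100 individual answers.
plain_average = sum(responses) / len(responses)

# Probability-weighted sum using the observed proportions.
p_favourable = responses.count(1) / len(responses)
p_unfavourable = responses.count(0) / len(responses)
weighted_sum = p_favourable * 1 + p_unfavourable * 0

print(plain_average, weighted_sum)  # 0.6 0.6
print(0.6 in responses)             # False: every individual answer is 0 or 1
```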

Think of the Variance and the Standard Deviation as measuring the distance of the values from the mean. The Variance is calculated as the probability-weighted sum of the squared distances.

\begin{align} \sigma^2 & = p_{favourable} \times (x_{favourable} - \mu)^2 + p_{unfavourable} \times (x_{unfavourable} - \mu)^2 \\ & = 0.6 \times (1 - 0.6)^2 + 0.4 \times (0 - 0.6)^2 \\ & = 0.6 \times 0.16 + 0.4 \times 0.36 \\ & = 0.096 + 0.144 \\ & = 0.24 \\ \sigma & = \sqrt{0.24} \\ & \approx 0.49 \end{align}
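The weighted variance can be computed the same way in code. This is a minimal sketch with illustrative names; it assumes the probabilities sum to 1, as they do in the survey example.

```python
import math

# Outcome values and their probabilities: Favourable = 1, Unfavourable = 0.
values = [1, 0]
probabilities = [0.6, 0.4]

# Probability-weighted mean.
mu = sum(p * x for p, x in zip(probabilities, values))

# Probability-weighted sum of squared distances from the mean.
variance = sum(p * (x - mu) ** 2 for p, x in zip(probabilities, values))
sigma = math.sqrt(variance)

print(round(mu, 2), round(variance, 2), round(sigma, 2))  # 0.6 0.24 0.49
```

For a two-outcome variable coded as 0 and 1, the variance also equals \(p \times (1 - p)\), which here gives \(0.6 \times 0.4 = 0.24\), in agreement with the calculation above.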