Mean, Median, & Mode

In [11]:
import numpy as np
from scipy import stats
%matplotlib inline
import matplotlib.pyplot as plt

Mean & Median

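The mean is the arithmetic average: the sum of all values divided by how many there are. The median is the middle value once the data is sorted (the average of the two middle values when the count is even).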
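As a small illustration on a hypothetical five-value list (not part of the original notebook), the mean sums and divides, while the median sorts and takes the middle. The hand-computed values are compared against `np.mean` and `np.median`:

```python
import numpy as np

values = [3, 1, 4, 1, 50]  # hypothetical sample with one large value

# Mean: sum of all values divided by how many there are
mean_by_hand = sum(values) / len(values)

# Median: sort, then take the middle element (or average the two middle ones)
ordered = sorted(values)
mid = len(ordered) // 2
if len(ordered) % 2 == 1:
    median_by_hand = ordered[mid]
else:
    median_by_hand = (ordered[mid - 1] + ordered[mid]) / 2

print(mean_by_hand, np.mean(values))      # 11.8 11.8
print(median_by_hand, np.median(values))  # 3 3.0
```

Even in this tiny sample, the single large value (50) pulls the mean well above most of the data, while the median stays at 3.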

Generate Fake Data

Simulated income data:

  • centered around 27,000
  • normally distributed with a standard deviation of 15,000
  • 10,000 data points
In [12]:
dataCenter = 27000         # mean (center) of the distribution
standardDeviation = 15000  # spread of the distribution
numberOfPoints = 10000

incomeData = np.random.normal(dataCenter, standardDeviation, numberOfPoints)
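Random samples land near, but not exactly on, the requested parameters, which is why the mean computed below comes out as 26,825.8 rather than 27,000. A minimal sketch that checks this, assuming NumPy's `default_rng` with an arbitrary seed (42) so reruns are reproducible:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility
dataCenter = 27000
standardDeviation = 15000
numberOfPoints = 10000

sample = rng.normal(dataCenter, standardDeviation, numberOfPoints)

# The standard error of the mean is about 15000 / sqrt(10000) = 150,
# so the sample mean typically falls within a few hundred of 27,000.
print(sample.mean(), sample.std())
```

The exact printed values depend on the seed; the point is that both sit close to 27,000 and 15,000.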
In [13]:
np.mean(incomeData)
Out [13]:
26825.809929744333

Visualize

Segment the income data into 50 buckets and plot it as a histogram:

In [14]:
bucketCount = 50
plt.hist(incomeData, bucketCount)
plt.show()
[Output: histogram of incomeData in 50 buckets]
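`plt.hist` draws the bars; `np.histogram` returns the same bucket counts and edges as arrays, which is handy for inspecting the buckets numerically. A sketch on seeded stand-in data (the seed and the `incomeSample` name are assumptions, not from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
incomeSample = rng.normal(27000, 15000, 10000)  # stand-in for incomeData

bucketCount = 50
counts, edges = np.histogram(incomeSample, bins=bucketCount)

# 50 buckets need 51 edges; every point lands in exactly one bucket
print(len(counts), len(edges), counts.sum())  # 50 51 10000

# The tallest bucket sits near the center of the distribution
tallest = counts.argmax()
print(edges[tallest], edges[tallest + 1])
```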
In [15]:
np.median(incomeData)
Out [15]:
26784.49799065586
Now add a single extreme outlier, one income of 1 billion:

In [16]:
massiveIncome = 1_000_000_000  # 1 billion
incomeData = np.append(incomeData, [massiveIncome])  # np.append returns a new array

The median barely moves, but the mean jumps by roughly 100,000:

In [17]:
np.median(incomeData)
Out [17]:
26784.538863280966
In [18]:
np.mean(incomeData)
Out [18]:
126813.12861688265

Impacts

  • the median doesn't meaningfully change (from 26,784.50 to 26,784.54)
  • the mean changes significantly (from about 26,826 to 126,813)
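The size of the jump follows from the definition of the mean: appending one value x to n values with mean m gives a new mean of (n·m + x) / (n + 1). With n = 10,000, m ≈ 26,826 and x = 1,000,000,000, that is about 126,813, matching Out [18]. The median, by contrast, only shifts from the average of the two middle values to the upper of those two. A sketch that checks both claims on seeded stand-in data (the seed and variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
incomes = rng.normal(27000, 15000, 10000)  # stand-in for incomeData
n = incomes.size
oldMean = incomes.mean()
oldMedian = np.median(incomes)

outlier = 1_000_000_000
withOutlier = np.append(incomes, outlier)

# New mean predicted from the old one: (n * oldMean + outlier) / (n + 1)
predictedMean = (n * oldMean + outlier) / (n + 1)
print(withOutlier.mean(), predictedMean)  # equal up to floating-point rounding

# With n + 1 = 10,001 values, the median is the single middle value,
# which is the upper of the two old middle values (the outlier sorts last)
ordered = np.sort(incomes)
print(oldMedian, np.median(withOutlier), ordered[n // 2])
```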
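Mode

The mode is the most frequently occurring value. Generate 500 random integer ages:

In [19]:
minAge = 18
maxAge = 90  # exclusive upper bound: randint draws ages 18 through 89
howManyDataPoints = 500
ageData = np.random.randint(minAge, high=maxAge, size=howManyDataPoints)
ageData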
Out [19]:
array([82, 71, 18, 37, 48, 80, 32, 54, 20, 28, 38, 74, 30, 87, 82, 63, 53,
       28, 66, 75, 25, 48, 68, 48, 32, 79, 56, 57, 47, 50, 89, 47, 45, 71,
       47, 38, 84, 44, 61, 78, 35, 42, 18, 78, 85, 82, 64, 43, 57, 31, 53,
       37, 26, 89, 60, 75, 29, 81, 63, 37, 71, 89, 71, 20, 83, 20, 88, 70,
       28, 49, 51, 80, 89, 56, 84, 63, 87, 84, 78, 89, 77, 82, 61, 30, 69,
       78, 74, 21, 55, 66, 70, 43, 88, 64, 27, 68, 68, 87, 20, 20, 52, 60,
       58, 22, 24, 26, 27, 36, 25, 88, 78, 84, 80, 73, 70, 31, 20, 85, 39,
       23, 36, 33, 50, 34, 27, 69, 19, 24, 69, 44, 59, 40, 25, 88, 21, 86,
       44, 59, 63, 78, 74, 38, 76, 41, 57, 86, 33, 57, 26, 79, 68, 46, 71,
       87, 40, 81, 79, 67, 66, 74, 76, 61, 39, 54, 27, 58, 41, 76, 53, 33,
       50, 71, 49, 39, 87, 85, 84, 79, 40, 45, 72, 89, 37, 70, 61, 30, 60,
       63, 38, 69, 52, 40, 81, 22, 59, 66, 74, 47, 68, 85, 41, 28, 49, 44,
       60, 62, 78, 53, 88, 89, 74, 80, 22, 79, 33, 18, 69, 69, 64, 48, 74,
       77, 52, 70, 80, 29, 77, 74, 37, 40, 71, 89, 58, 44, 40, 73, 30, 65,
       37, 55, 82, 44, 28, 62, 71, 36, 83, 49, 69, 37, 47, 61, 60, 80, 23,
       67, 83, 80, 63, 65, 45, 25, 37, 76, 34, 79, 21, 48, 86, 32, 63, 27,
       86, 67, 57, 42, 68, 57, 39, 49, 82, 47, 40, 47, 32, 28, 68, 89, 48,
       71, 47, 46, 19, 24, 75, 78, 65, 25, 82, 30, 67, 71, 30, 54, 66, 60,
       77, 35, 72, 69, 37, 58, 79, 33, 53, 70, 60, 23, 35, 61, 71, 81, 53,
       48, 52, 37, 86, 59, 35, 33, 58, 35, 84, 49, 87, 66, 76, 88, 89, 34,
       68, 25, 47, 42, 79, 18, 65, 88, 20, 32, 58, 21, 20, 85, 36, 71, 47,
       69, 27, 86, 60, 74, 21, 38, 78, 47, 19, 79, 71, 50, 35, 36, 79, 86,
       58, 51, 56, 49, 30, 23, 81, 59, 49, 41, 22, 87, 64, 62, 59, 34, 35,
       40, 35, 47, 32, 73, 63, 26, 53, 25, 34, 40, 65, 50, 35, 37, 62, 33,
       54, 71, 47, 85, 75, 28, 75, 63, 20, 30, 41, 23, 85, 61, 54, 83, 43,
       83, 33, 76, 58, 41, 64, 60, 27, 64, 61, 47, 19, 53, 24, 48, 88, 49,
       37, 75, 24, 76, 60, 25, 78, 85, 28, 37, 87, 50, 45, 41, 26, 58, 22,
       60, 38, 25, 67, 41, 74, 46, 76, 65, 73, 30, 88, 26, 46, 31, 87, 62,
       37, 66, 59, 55, 21, 46, 56, 60, 45, 78, 22, 72, 52, 52, 83, 74, 67,
       77, 60, 72, 85, 21, 79, 67])
In [20]:
stats.mode(ageData, keepdims=True)
Out [20]:
ModeResult(mode=array([37]), count=array([14]))
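The mode is simply the value with the highest count. As a cross-check of `stats.mode`, `np.unique(..., return_counts=True)` tallies every distinct value in sorted order, and `argmax` picks the first maximum, so ties resolve to the smallest value, which matches SciPy's documented behavior of returning the smallest mode. A sketch on a small hypothetical age list:

```python
import numpy as np
from collections import Counter

ages = np.array([25, 37, 41, 37, 25, 37, 60])  # hypothetical ages

values, counts = np.unique(ages, return_counts=True)  # values come back sorted
modeValue = values[counts.argmax()]  # first max, i.e. smallest value on a tie
modeCount = counts.max()
print(modeValue, modeCount)  # 37 3

# Same answer with the standard library
print(Counter(ages.tolist()).most_common(1))  # [(37, 3)]
```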