Mean, Median, & Mode
- Mean & Median
- Generate Fake Data
- Calculate Mean
- Visualize
- Calculate Median
- See Impact of an outlier
- Impacts
- Calculate Mode
# Mean, Median, and Mode with NumPy
In [11]:
import numpy as np
from scipy import stats
%matplotlib inline
import matplotlib.pyplot as plt
Mean & Median
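Mean is the average: the sum of the values divided by how many there are. Median is the middle value once the data is sorted.
Generate Fake Data
Generate some income data:
- centered around 27,000
- with a normal distribution and standard deviation of 15,000
- with 10,000 data points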
In [12]:
dataCenter = 27000          # mean of the distribution
standardDeviation = 15000   # spread of the distribution
numberOfPoints = 10000
incomeData = np.random.normal(dataCenter, standardDeviation, numberOfPoints)
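Because `np.random.normal` draws fresh values on every run, the numbers in the cells that follow will differ slightly each time. A minimal sketch of a reproducible variant, assuming NumPy's `np.random.default_rng` generator and an arbitrary seed of 0 (neither appears in the original notebook, and `seededIncomeData` is a hypothetical name):

```python
import numpy as np

# Assumption: seed 0 is arbitrary; any fixed seed makes the draw repeatable.
rng = np.random.default_rng(0)

dataCenter = 27000          # mean of the distribution
standardDeviation = 15000   # spread of the distribution
numberOfPoints = 10000

seededIncomeData = rng.normal(dataCenter, standardDeviation, numberOfPoints)

# The sample statistics should land close to the parameters we asked for.
print(seededIncomeData.mean())
print(seededIncomeData.std())
```

With 10,000 points the sample mean and standard deviation will not match 27,000 and 15,000 exactly, but they should be within a percent or two.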
Calculate Mean
In [13]:
np.mean(incomeData)
Out [13]:
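To make the definition concrete, `np.mean` is the sum of the values divided by the count. A small sketch with made-up toy values (`smallSample` is hypothetical, chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

# Hypothetical toy values, not part of the income data.
smallSample = np.array([10, 20, 30, 40, 100])

manualMean = smallSample.sum() / len(smallSample)  # (10+20+30+40+100) / 5

print(manualMean)             # 40.0
print(np.mean(smallSample))   # 40.0
```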
Visualize
We can segment the income data into 50 buckets and plot it as a histogram.
In [14]:
bucketCount = 50
plt.hist(incomeData, bucketCount)
plt.show()
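`plt.hist` does the bucketing and the drawing in one call. To inspect the bucket counts themselves, one option is `np.histogram`, which returns the counts and the bucket edges without plotting anything. A sketch using a seeded stand-in for `incomeData` (the seed and the `sampleIncome` name are assumptions, not part of the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed for repeatability
sampleIncome = rng.normal(27000, 15000, 10000)

bucketCount = 50
counts, edges = np.histogram(sampleIncome, bins=bucketCount)

# 50 buckets are delimited by 51 edges, and every point falls in one bucket.
print(len(counts), len(edges))
print(counts.sum())

# The tallest bucket should sit near the center of the distribution.
tallest = np.argmax(counts)
print(edges[tallest], edges[tallest + 1])
```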
Calculate Median
In [15]:
np.median(incomeData)
Out [15]:
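The median is the middle value of the sorted data. When the number of points is even, as with 10,000 incomes, NumPy averages the two middle values. A sketch of both cases with made-up toy values:

```python
import numpy as np

oddSample = np.array([40, 10, 100, 20, 30])   # hypothetical values
evenSample = np.array([40, 10, 20, 30])       # hypothetical values

sortedOdd = np.sort(oddSample)                # [10, 20, 30, 40, 100]
middleOfOdd = sortedOdd[len(sortedOdd) // 2]  # the single middle element

sortedEven = np.sort(evenSample)              # [10, 20, 30, 40]
half = len(sortedEven) // 2
middleOfEven = (sortedEven[half - 1] + sortedEven[half]) / 2

print(middleOfOdd, np.median(oddSample))      # 30 30.0
print(middleOfEven, np.median(evenSample))    # 25.0 25.0
```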
See Impact of an outlier
Append one extremely large income to the data:
In [16]:
massiveIncome = 1_000_000_000 # 1 billion
incomeData = np.append(incomeData, [massiveIncome])
Impacts
The median won't change much, but the mean will:
In [17]:
np.median(incomeData)
Out [17]:
In [18]:
np.mean(incomeData)
Out [18]:
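The size of the jump in the mean follows from the definition: adding one value x to n existing points moves the mean by (x - oldMean) / (n + 1). With n = 10,000 and x = 1,000,000,000, that is roughly 100,000. A sketch checking this on seeded stand-in data (the seed and the `sampleIncome` name are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed
sampleIncome = rng.normal(27000, 15000, 10000)
massiveIncome = 1_000_000_000

oldMean = np.mean(sampleIncome)
oldMedian = np.median(sampleIncome)

withOutlier = np.append(sampleIncome, [massiveIncome])
newMean = np.mean(withOutlier)
newMedian = np.median(withOutlier)

# Predicted shift in the mean from adding one value to n points.
predictedShift = (massiveIncome - oldMean) / (len(sampleIncome) + 1)

print(newMean - oldMean, predictedShift)  # both close to 100,000
print(newMedian - oldMedian)              # tiny by comparison
```

The median only moves from the average of the two middle values to the next sorted value, which is why it is the more robust summary when outliers are present.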
Calculate Mode
Mode is the most common value. Generate some fake age data for 500 people:
In [19]:
minAge = 18
maxAge = 90
howManyDataPoints = 500
# high is exclusive, so ages range from minAge to maxAge - 1
ageData = np.random.randint(minAge, high=maxAge, size=howManyDataPoints)
ageData
Out [19]:
In [20]:
stats.mode(ageData, keepdims=True)
Out [20]:
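`stats.mode` reports the most frequent value and how often it occurs. The same counting can be sketched with `np.unique(..., return_counts=True)`, shown here on made-up ages so the answer can be checked by eye (`toyAges` is a hypothetical name):

```python
import numpy as np

toyAges = np.array([18, 21, 21, 35, 35, 35, 60])  # hypothetical values

values, counts = np.unique(toyAges, return_counts=True)
# values -> [18, 21, 35, 60], counts -> [1, 2, 3, 1]

mostCommon = values[np.argmax(counts)]
timesSeen = counts.max()

print(mostCommon, timesSeen)  # 35 3
```

If several values tie for the highest count, `np.argmax` picks the first one, which is the smallest value because `np.unique` returns its values sorted.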