Multiple Regression
- Imports
- Load Some Data
- Wrangle & Preview
- Using The "Model" For new input

In multiple regression, more than one independent variable (feature) influences the dependent variable.
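Here is a minimal sketch of multiple regression as ordinary least squares, solved directly with NumPy on synthetic data. The two features and their "true" coefficients are invented for illustration and do not come from the cars dataset. statsmodels' `OLS`, used later, fits the same kind of model.

```python
import numpy as np

# Synthetic data (hypothetical): the target depends on two features at once.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 50_000, n)   # stand-in for mileage
x2 = rng.integers(4, 9, n)       # stand-in for cylinder count (4..8)
noise = rng.normal(0, 500, n)
y = 20_000 - 0.15 * x1 + 1_500 * x2 + noise

# Design matrix: a column of ones (intercept B0) plus one column per feature.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: find b that minimizes ||y - X @ b||^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates should land close to the true values 20000, -0.15, 1500
```

Each estimated coefficient measures one feature's effect while the other feature is held fixed, which is what distinguishes this from fitting two separate single-variable lines.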
In [6]:
import pandas as pd
%matplotlib inline
import numpy as np
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
In [7]:
# Reading a legacy .xls file requires the xlrd package to be installed
df = pd.read_excel('http://cdn.sundog-soft.com/Udemy/DataScience/cars.xls')
df.head()
In [15]:
df1 = df[['Mileage', 'Price']]
# 10K-mile chunks, up to 50K miles (np.arange excludes its stop value, hence 50001)
bins = np.arange(0, 50001, 10000)
avgPricePerGroup = df1.groupby(pd.cut(df1['Mileage'], bins), observed=False).mean()
print(avgPricePerGroup.head())
avgPricePerGroup['Price'].plot.line()
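The binning step relies on two details worth checking: `np.arange` stops before its `stop` argument, and `pd.cut` builds right-closed intervals and leaves values outside every bin as NaN. A small sketch on made-up numbers (not the cars data):

```python
import numpy as np
import pandas as pd

# np.arange excludes its stop value, so this edge list only reaches 40,000.
print(np.arange(0, 50000, 10000).tolist())  # [0, 10000, 20000, 30000, 40000]
# Pushing the stop just past 50,000 includes the final edge.
bins = np.arange(0, 50001, 10000)
print(bins.tolist())  # [0, 10000, 20000, 30000, 40000, 50000]

# Hypothetical sample to show how pd.cut assigns rows to (low, high] intervals.
df1 = pd.DataFrame({
    "Mileage": [5_000, 12_000, 18_000, 33_000, 47_000, 52_000],
    "Price":   [30_000, 27_000, 26_000, 22_000, 20_000, 19_000],
})
groups = pd.cut(df1["Mileage"], bins)
print(groups.isna().sum())  # 1 (52,000 falls outside every bin and is dropped)

# observed=False keeps empty bins, whose mean is NaN.
avg = df1.groupby(groups, observed=False)["Price"].mean()
print(avg)
```

Any real mileage above the last edge (or exactly 0) is silently excluded from the averages, so it is worth checking `isna().sum()` on the binned column.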
In [18]:
scale = StandardScaler()
# extract 3 features to compare
# (.copy() avoids pandas' SettingWithCopyWarning when the scaled values are assigned below)
X = df[['Mileage', 'Cylinder', 'Doors']].copy()
# set the dependent variable
y = df['Price']
#
# SCALE the features' values
#
X[['Mileage', 'Cylinder', 'Doors']] = scale.fit_transform(X[['Mileage', 'Cylinder', 'Doors']].values)
# X.head()
# Add a constant column to our model so we can have a Y-intercept
X = sm.add_constant(X)
print(X)
est = sm.OLS(y, X).fit()
print(est.summary())
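Because every feature is standardized before fitting, it helps to know exactly what `StandardScaler` computes. A minimal check on a made-up 4-row matrix (not the cars data): each column becomes (x - mean) / std using the population standard deviation, and the fitted `mean_` and `scale_` attributes keep those statistics so `transform()` can apply them to new input later.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: columns stand in for Mileage, Cylinder, Doors.
X = np.array([
    [8_000.0, 4, 2],
    [15_000.0, 6, 4],
    [22_000.0, 8, 4],
    [31_000.0, 4, 2],
])

scaler = StandardScaler()
Z = scaler.fit_transform(X)

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0, NumPy's default for .std()).
Z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(Z, Z_manual))                   # True
print(np.allclose(scaler.mean_, X.mean(axis=0)))  # True
print(np.allclose(scaler.scale_, X.std(axis=0)))  # True

# Every scaled column now has mean 0 and standard deviation 1.
print(np.allclose(Z.mean(axis=0), 0), np.allclose(Z.std(axis=0), 1))  # True True
```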
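The table of coefficients in the summary above gives the values to plug into an equation of the form: Price = B0 + B1 * Mileage + B2 * Cylinder + B3 * Doors, where B0 is the const coefficient and Mileage, Cylinder, and Doors are the standardized (scaled) feature values. Because the features were standardized, each coefficient is the change in predicted price for a one-standard-deviation increase in that feature, holding the other two fixed:
- cylinder count has a coefficient of over 5K
- mileage has a negative coefficient of about 1,200
- door count has a negative coefficient of about 1,400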
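Since the model was fit on standardized features, its coefficients are in "dollars per one standard deviation", not dollars per mile or per cylinder. The NumPy sketch below uses synthetic data (all numbers are invented, not taken from the cars dataset) to show how raw-unit coefficients can be recovered. Substituting z_j = (x_j - mu_j) / sigma_j into the equation gives a raw slope of B_j / sigma_j, and the intercept absorbs the means. With the notebook's scaler, mu and sigma would correspond to `scale.mean_` and `scale.scale_`.

```python
import numpy as np

# Synthetic stand-ins (hypothetical) for Mileage, Cylinder, Doors and Price.
rng = np.random.default_rng(1)
n = 300
X_raw = np.column_stack([
    rng.uniform(0, 50_000, n),   # "mileage"
    rng.integers(4, 9, n),       # "cylinders" (4..8)
    rng.choice([2, 4], n),       # "doors"
]).astype(float)
y = (15_000 - 0.1 * X_raw[:, 0] + 2_000 * X_raw[:, 1] - 500 * X_raw[:, 2]
     + rng.normal(0, 300, n))

# Standardize as StandardScaler does: subtract the mean, divide by the population std.
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
Z = (X_raw - mu) / sigma


def ols(features, target):
    """Least-squares fit with an intercept column prepended; returns [B0, B1, ...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


b_std = ols(Z, y)      # price change per one standard deviation of each feature
b_raw = ols(X_raw, y)  # price change per mile / per cylinder / per door

# Substituting z_j = (x_j - mu_j) / sigma_j into B0 + sum(B_j * z_j):
# raw slope = B_j / sigma_j, raw intercept = B0 - sum(B_j * mu_j / sigma_j).
slopes_from_std = b_std[1:] / sigma
intercept_from_std = b_std[0] - np.sum(b_std[1:] * mu / sigma)
print(np.allclose(slopes_from_std, b_raw[1:]))   # True
print(np.isclose(intercept_from_std, b_raw[0]))  # True
```

The standardized form is handy for comparing features against each other. The raw form answers questions like "how much does one extra cylinder add?"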
In [19]:
#
# Another SIMPLER look at the average-price-by-door-count!
#
y.groupby(df.Doors).mean()
Using The "Model" For new input
How would you use this to make an actual prediction? Scale the new car's feature values with the same scaler used to train the model, add the constant term back in, then call est.predict() on the result:
In [28]:
newCar = {"miles": 45000, "cyl": 8, "doors": 4}
# Scale with the same fitted scaler, in the training column order (Mileage, Cylinder, Doors)
scaled = scale.transform([[newCar['miles'], newCar['cyl'], newCar['doors']]])
# Prepend the constant column (1.0) that sm.add_constant added for training
scaled = np.insert(scaled, 0, 1.0, axis=1)
print(f'scaled: {scaled}')
predicted = est.predict(scaled)
print(f'predicted price: {predicted[0]}')