Generalized Additive Models (GAM)
Generalized Additive Models allow for penalized estimation of smooth terms in generalized linear models.
See Module Reference for commands and arguments.
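To make "penalized estimation" concrete, here is a minimal NumPy sketch of penalized least squares for a single smooth term. It is not the statsmodels implementation: the hat-function basis, the second-difference penalty, and the helper names `penalized_fit` and `roughness` are assumptions chosen for illustration, and GLMGam builds its own spline basis and penalty matrices.

```python
import numpy as np


def penalized_fit(basis, y, alpha):
    """Minimize ||y - basis @ b||^2 + alpha * b' S b (illustrative helper)."""
    k = basis.shape[1]
    # Rows of D take second differences of adjacent coefficients
    D = np.diff(np.eye(k), n=2, axis=0)
    S = D.T @ D  # roughness penalty matrix
    return np.linalg.solve(basis.T @ basis + alpha * S, basis.T @ y)


def roughness(coef):
    """Sum of squared second differences, i.e. coef' S coef."""
    return float(np.sum(np.diff(coef, n=2) ** 2))


# Synthetic data: a noisy sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Piecewise-linear "hat" basis on 15 equally spaced knots
knots = np.linspace(0, 1, 15)
h = knots[1] - knots[0]
basis = np.clip(1 - np.abs(x[:, None] - knots[None, :]) / h, 0, None)

coef_rough = penalized_fit(basis, y, alpha=1e-6)   # almost unpenalized
coef_smooth = penalized_fit(basis, y, alpha=1e3)   # heavily penalized

print(roughness(coef_rough), roughness(coef_smooth))
```

A larger `alpha` pulls neighbouring coefficients toward a straight line, so the heavily penalized fit has the smaller roughness value. The `alpha` argument of GLMGam plays the same role, with one penalization weight per smooth term.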
Examples
The following illustrates a Gaussian and a Poisson regression in which the categorical variables enter as linear terms and the effects of two explanatory variables are captured by penalized B-splines. The data come from the automobile dataset (https://archive.ics.uci.edu/ml/datasets/automobile). A dataframe with selected columns can be loaded from the unit test module.
In [1]: import numpy as np
...: import statsmodels.api as sm
...:
In [2]: from statsmodels.gam.api import GLMGam, BSplines
# import data
In [3]: from statsmodels.gam.tests.test_penalized import df_autos
# create spline basis for weight and hp
In [4]: x_spline = df_autos[['weight', 'hp']]
In [5]: bs = BSplines(x_spline, df=[12, 10], degree=[3, 3])
# penalization weight
In [6]: alpha = np.array([21833888.8, 6460.38479])
In [7]: gam_bs = GLMGam.from_formula('city_mpg ~ fuel + drive', data=df_autos,
...: smoother=bs, alpha=alpha)
...:
In [8]: res_bs = gam_bs.fit()
In [9]: print(res_bs.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               city_mpg   No. Observations:                  203
Model:                         GLMGam   Df Residuals:                   189.13
Model Family:                Gaussian   Df Model:                        12.87
Link Function:               identity   Scale:                          4.8825
Method:                         PIRLS   Log-Likelihood:                -441.81
Date:                Mon, 24 Jun 2019   Deviance:                       923.45
Time:                        17:27:51   Pearson chi2:                     923.
No. Iterations:                     3
Covariance Type:            nonrobust
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept       51.9923      1.997     26.034      0.000      48.078      55.906
fuel[T.gas]     -5.8099      0.727     -7.989      0.000      -7.235      -4.385
drive[T.fwd]      1.3910      0.819      1.699      0.089      -0.213       2.995
drive[T.rwd]      1.0638      0.842      1.263      0.207      -0.587       2.715
weight_s0       -3.5556      0.959     -3.707      0.000      -5.436      -1.676
weight_s1       -9.0876      1.750     -5.193      0.000     -12.518      -5.658
weight_s2      -13.0303      1.827     -7.132      0.000     -16.611      -9.450
weight_s3      -14.2641      1.854     -7.695      0.000     -17.897     -10.631
weight_s4      -15.1805      1.892     -8.024      0.000     -18.889     -11.472
weight_s5      -15.9557      1.963     -8.128      0.000     -19.803     -12.108
weight_s6      -16.6297      2.038     -8.161      0.000     -20.624     -12.636
weight_s7      -16.9928      2.045     -8.308      0.000     -21.002     -12.984
weight_s8      -19.3480      2.367     -8.174      0.000     -23.987     -14.709
weight_s9      -20.7978      2.455     -8.472      0.000     -25.609     -15.986
weight_s10     -20.8062      2.443     -8.517      0.000     -25.594     -16.018
hp_s0           -1.4473      0.558     -2.592      0.010      -2.542      -0.353
hp_s1           -3.4228      1.012     -3.381      0.001      -5.407      -1.438
hp_s2           -5.9026      1.251     -4.717      0.000      -8.355      -3.450
hp_s3           -7.2389      1.352     -5.354      0.000      -9.889      -4.589
hp_s4           -9.1052      1.384     -6.581      0.000     -11.817      -6.393
hp_s5           -9.9865      1.525     -6.547      0.000     -12.976      -6.997
hp_s6          -13.3639      2.228     -5.998      0.000     -17.731      -8.997
hp_s7          -13.8902      3.194     -4.349      0.000     -20.150      -7.630
hp_s8          -11.9752      2.556     -4.685      0.000     -16.985      -6.965
================================================================================
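The summary lists 11 `weight_s*` and 9 `hp_s*` coefficients even though the basis was requested with `df=[12, 10]`, because by default BSplines drops the constant from each term. The sketch below checks this on synthetic data; the random values are an assumption standing in for the automobile columns, and only the column names are taken from the example.

```python
import numpy as np
import pandas as pd
from statsmodels.gam.api import BSplines

# Synthetic stand-in for the two smooth regressors (not the automobile data)
rng = np.random.default_rng(0)
x = pd.DataFrame({
    "weight": rng.uniform(1500, 4000, size=200),
    "hp": rng.uniform(50, 250, size=200),
})

bs = BSplines(x, df=[12, 10], degree=[3, 3])

# One basis column per spline coefficient; with the constant removed from
# each term, df=[12, 10] gives 11 + 9 = 20 columns
print(bs.basis.shape)
print(bs.col_names)
```

The column names follow the pattern `<variable>_s<index>`, which is how the spline coefficients are labelled in the summary table.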
# plot smooth components
In [10]: res_bs.plot_partial(0, cpr=True)
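The numbers behind a partial plot can also be retrieved directly with `partial_values`. The sketch below does this for a model fitted to synthetic data; the data, the column names `x0`, `x1`, `y`, and the penalization weights are assumptions for illustration, not values from the automobile example.

```python
import numpy as np
import pandas as pd
from statsmodels.gam.api import GLMGam, BSplines

# Synthetic data with two smooth effects (an assumption, not the automobile data)
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({"x0": rng.uniform(0, 1, n), "x1": rng.uniform(0, 1, n)})
data["y"] = (np.sin(2 * np.pi * data["x0"]) + (data["x1"] - 0.5) ** 2
             + rng.normal(scale=0.2, size=n))

bs = BSplines(data[["x0", "x1"]], df=[8, 8], degree=[3, 3])
gam = GLMGam.from_formula("y ~ 1", data=data, smoother=bs, alpha=[1.0, 1.0])
res = gam.fit()

# Partial prediction and its standard error for the first smooth term,
# evaluated at the observed x0 values (the constant is included by default)
partial, se = res.partial_values(0)

# Approximate pointwise 95% band around the partial prediction
lower, upper = partial - 1.96 * se, partial + 1.96 * se
```

The first argument is the index of the smooth term, so passing 1 instead of 0 to `partial_values` or `plot_partial` selects the second spline component.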