Generalized Linear Models =========================== .. _glm_notebook: `Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/glm.ipynb>`_ .. raw:: html <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span>
<span class="kn">from</span> <span class="nn">scipy</span> <span class="k">import</span> <span class="n">stats</span>
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="k">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>
</pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="GLM:-Binomial-response-data">GLM: Binomial response data<a class="anchor-link" href="#GLM:-Binomial-response-data">¶</a></h2><h3 id="Load-data">Load data<a class="anchor-link" href="#Load-data">¶</a></h3><p>In this example, we use the Star98 dataset, which was taken with permission from Jeff Gill (2000), <em>Generalized Linear Models: A Unified Approach</em>. 
Codebook information can be obtained by typing:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">NOTE</span><span class="p">)</span>
</pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Load the data and add a constant to the exogenous (independent) variables:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">data</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">load</span><span class="p">()</span>
<span class="n">data</span><span class="o">.</span><span class="n">exog</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">,</span> <span class="n">prepend</span><span class="o">=</span><span class="k">False</span><span class="p">)</span>
</pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>::

    Number of Observations - 303 (counties in California).

    Number of Variables - 13 and 8 interaction terms.

    Definition of variables names::

        NABOVE   - Total number of students above the national median for the
                   math section.
        NBELOW   - Total number of students below the national median for the
                   math section.
        LOWINC   - Percentage of low income students
        PERASIAN - Percentage of Asian student
        PERBLACK - Percentage of black students
        PERHISP  - Percentage of Hispanic students
        PERMINTE - Percentage of minority teachers
        AVYRSEXP - Sum of teachers' years in educational service divided by the
                   number of teachers.
        AVSALK   - Total salary budget including benefits divided by the number
                   of full-time teachers (in thousands)
        PERSPENK - Per-pupil spending (in thousands)
        PTRATIO  - Pupil-teacher ratio.
        PCTAF    - Percentage of students taking UC/CSU prep courses
        PCTCHRT  - Percentage of charter schools
        PCTYRRND - Percentage of year-round schools

        The below variables are interaction terms of the variables defined
        above.

        PERMINTE_AVYRSEXP
        PEMINTE_AVSAL
        AVYRSEXP_AVSAL
        PERSPEN_PTRATIO
        PERSPEN_PCTAF
        PTRATIO_PCTAF
        PERMINTE_AVTRSEXP_AVSAL
        PERSPEN_PTRATIO_PCTAF
</pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The dependent variable is N by 2 (Success: NABOVE, Failure: NBELOW):</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">[:</span><span class="mi">5</span><span class="p">,:])</span>
</pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt 
input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The independent variables include all the other variables described above, as well as the interaction terms:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">[:</span><span class="mi">2</span><span class="p">,:])</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>[[ 452. 355.] [ 144. 40.] [ 337. 234.] [ 395. 178.] [ 8. 57.]] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Fit-and-summary">Fit and summary<a class="anchor-link" href="#Fit-and-summary">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">glm_binom</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">GLM</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span 
class="n">families</span><span class="o">.</span><span class="n">Binomial</span><span class="p">())</span> <span class="n">res</span> <span class="o">=</span> <span class="n">glm_binom</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>[[ 3.43973000e+01 2.32993000e+01 1.42352800e+01 1.14111200e+01 1.59183700e+01 1.47064600e+01 5.91573200e+01 4.44520700e+00 2.17102500e+01 5.70327600e+01 0.00000000e+00 2.22222200e+01 2.34102872e+02 9.41688110e+02 8.69994800e+02 9.65065600e+01 2.53522420e+02 1.23819550e+03 1.38488985e+04 5.50403520e+03 1.00000000e+00] [ 1.73650700e+01 2.93283800e+01 8.23489700e+00 9.31488400e+00 1.36363600e+01 1.60832400e+01 5.95039700e+01 5.26759800e+00 2.04427800e+01 6.46226400e+01 0.00000000e+00 0.00000000e+00 2.19316851e+02 8.11417560e+02 9.57016600e+02 1.07684350e+02 3.40406090e+02 1.32106640e+03 1.30502233e+04 6.95884680e+03 1.00000000e+00]] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Quantities-of-interest">Quantities of interest<a class="anchor-link" href="#Quantities-of-interest">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="s">'Total number of trials:'</span><span class="p">,</span> <span 
class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">sum</span><span class="p">())</span>
<span class="nb">print</span><span class="p">(</span><span class="s">'Parameters: '</span><span class="p">,</span> <span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s">'T-values: '</span><span class="p">,</span> <span class="n">res</span><span class="o">.</span><span class="n">tvalues</span><span class="p">)</span>
</pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:           ['y1', 'y2']   No. Observations:                  303
Model:                            GLM   Df Residuals:                      282
Model Family:                Binomial   Df Model:                           20
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -2998.6
Date:                Mon, 20 Jul 2015   Deviance:                       4078.8
Time:                        17:43:33   Pearson chi2:                 4.05e+03
No. Iterations:                     7
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0168      0.000    -38.749      0.000        -0.018    -0.016
x2             0.0099      0.001     16.505      0.000         0.009     0.011
x3            -0.0187      0.001    -25.182      0.000        -0.020    -0.017
x4            -0.0142      0.000    -32.818      0.000        -0.015    -0.013
x5             0.2545      0.030      8.498      0.000         0.196     0.313
x6             0.2407      0.057      4.212      0.000         0.129     0.353
x7             0.0804      0.014      5.775      0.000         0.053     0.108
x8            -1.9522      0.317     -6.162      0.000        -2.573    -1.331
x9            -0.3341      0.061     -5.453      0.000        -0.454    -0.214
x10           -0.1690      0.033     -5.169      0.000        -0.233    -0.105
x11            0.0049      0.001      3.921      0.000         0.002     0.007
x12           -0.0036      0.000    -15.878      0.000        -0.004    -0.003
x13           -0.0141      0.002     -7.391      0.000        -0.018    -0.010
x14           -0.0040      0.000     -8.450      0.000        -0.005    -0.003
x15           -0.0039      0.001     -4.059      0.000        -0.006    -0.002
x16            0.0917      0.015      6.321      0.000         0.063     0.120
x17            0.0490      0.007      6.574      0.000         0.034     0.064
x18            0.0080      0.001      5.362      0.000         0.005     0.011
x19            0.0002   2.99e-05      7.428      0.000         0.000     0.000
x20           -0.0022      0.000     -6.445      0.000        -0.003    -0.002
const          2.9589      1.547      1.913      0.056        -0.073     5.990
==============================================================================
</pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>First differences: we hold all explanatory variables constant at their means and vary the percentage of low-income households to assess its impact on the response variable:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">means</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span 
class="mi">0</span><span class="p">)</span> <span class="n">means25</span> <span class="o">=</span> <span class="n">means</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">means25</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">scoreatpercentile</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="mi">25</span><span class="p">)</span> <span class="n">means75</span> <span class="o">=</span> <span class="n">means</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">means75</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">lowinc_75per</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">scoreatpercentile</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="mi">75</span><span class="p">)</span> <span class="n">resp_25</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">means25</span><span class="p">)</span> <span class="n">resp_75</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">means75</span><span class="p">)</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">resp_75</span> <span class="o">-</span> <span class="n">resp_25</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div 
class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Total number of trials: 807.0 Parameters: [ -1.68150366e-02 9.92547661e-03 -1.87242148e-02 -1.42385609e-02 2.54487173e-01 2.40693664e-01 8.04086739e-02 -1.95216050e+00 -3.34086475e-01 -1.69022168e-01 4.91670212e-03 -3.57996435e-03 -1.40765648e-02 -4.00499176e-03 -3.90639579e-03 9.17143006e-02 4.89898381e-02 8.04073890e-03 2.22009503e-04 -2.24924861e-03 2.95887793e+00] T-values: [-38.74908321 16.50473627 -25.1821894 -32.81791308 8.49827113 4.21247925 5.7749976 -6.16191078 -5.45321673 -5.16865445 3.92119964 -15.87825999 -7.39093058 -8.44963886 -4.05916246 6.3210987 6.57434662 5.36229044 7.42806363 -6.44513698 1.91301155] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The interquartile first difference for the percentage of low income households in a school district is:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="s">"%2.4f%%"</span> <span class="o">%</span> <span class="p">(</span><span class="n">diff</span><span class="o">*</span><span class="mi">100</span><span class="p">))</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Plots">Plots<a class="anchor-link" href="#Plots">¶</a></h3><p>We extract information that will be used to draw some interesting plots:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell 
rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">nobs</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">nobs</span> <span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]</span><span class="o">/</span><span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="n">yhat</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">mu</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>-11.8753% </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Plot yhat vs y:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.graphics.api</span> <span class="k">import</span> <span class="n">abline_plot</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span><span 
class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span> <span class="n">ax</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">yhat</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="n">line_fit</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">yhat</span><span class="p">,</span> <span class="n">prepend</span><span class="o">=</span><span class="k">True</span><span class="p">))</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="n">abline_plot</span><span class="p">(</span><span class="n">model_results</span><span class="o">=</span><span class="n">line_fit</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">'Model Fit Plot'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'Observed values'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'Fitted values'</span><span class="p">);</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Plot yhat vs. 
Pearson residuals:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span> <span class="n">ax</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">yhat</span><span class="p">,</span> <span class="n">res</span><span class="o">.</span><span class="n">resid_pearson</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">hlines</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">'Residual Dependence Plot'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'Pearson Residuals'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'Fitted values'</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> 
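<div class="text_cell_render border-box-sizing rendered_html"> <p>The Pearson residuals plotted above can also be sketched by hand. For a binomial observation with <code>n</code> trials, observed success proportion <code>y</code> and fitted probability <code>mu</code>, a common definition is <code>(y - mu) / sqrt(mu * (1 - mu) / n)</code>. The helper name <code>pearson_residual</code> and the fitted probability of 0.5 used below are assumptions made purely for illustration; they are not taken from the fitted model, and this is not presented as the library's own implementation.</p>

```python
import math


def pearson_residual(successes, failures, mu):
    """Pearson residual for a single binomial observation.

    `mu` is a fitted success probability strictly between 0 and 1.
    (Hypothetical helper, not part of statsmodels.)
    """
    n = successes + failures          # number of trials
    y = successes / n                 # observed success proportion
    # Raw residual divided by the binomial standard deviation of a proportion.
    return (y - mu) / math.sqrt(mu * (1.0 - mu) / n)


# First Star98 row printed earlier: 452 students above, 355 below the median.
# The fitted probability 0.5 is a made-up value, not a model output.
r = pearson_residual(452, 355, 0.5)
print(r)
```

<p>A positive value means the observed proportion lies above the fitted probability; dividing by the binomial standard deviation puts districts with very different numbers of students on a comparable scale.</p> </div>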
<div class="text_cell_render border-box-sizing rendered_html"> <p>Histogram of standardized deviance residuals:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">scipy</span> <span class="k">import</span> <span class="n">stats</span> <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span> <span class="n">resid</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">resid_deviance</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">resid_std</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">zscore</span><span class="p">(</span><span class="n">resid</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">resid_std</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">'Histogram of standardized deviance residuals'</span><span class="p">);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>QQ Plot of 
Deviance Residuals:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels</span> <span class="k">import</span> <span class="n">graphics</span>
<span class="n">graphics</span><span class="o">.</span><span class="n">gofplots</span><span class="o">.</span><span class="n">qqplot</span><span class="p">(</span><span class="n">resid</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="s">'r'</span><span class="p">)</span>
</pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="GLM:-Gamma-for-proportional-count-response">GLM: Gamma for proportional count response<a class="anchor-link" href="#GLM:-Gamma-for-proportional-count-response">¶</a></h2><h3 id="Load-data">Load data<a class="anchor-link" href="#Load-data">¶</a></h3><p>In the example above, we printed the <code>NOTE</code> attribute to learn about the Star98 dataset. The statsmodels datasets ship with other useful information. 
For example:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">scotland</span><span class="o">.</span><span class="n">DESCRLONG</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Load the data and add a constant to the exogenous variables:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">data2</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">scotland</span><span class="o">.</span><span class="n">load</span><span class="p">()</span> <span class="n">data2</span><span class="o">.</span><span class="n">exog</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">data2</span><span class="o">.</span><span class="n">exog</span><span class="p">,</span> <span class="n">prepend</span><span class="o">=</span><span class="k">False</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span 
class="n">data2</span><span class="o">.</span><span class="n">exog</span><span class="p">[:</span><span class="mi">5</span><span class="p">,:])</span> <span class="nb">print</span><span class="p">(</span><span class="n">data2</span><span class="o">.</span><span class="n">endog</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> This data is based on the example in Gill and describes the proportion of voters who voted Yes to grant the Scottish Parliament taxation powers. The data are divided into 32 council districts. This example's explanatory variables include the amount of council tax collected in pounds sterling as of April 1997 per two adults before adjustments, the female percentage of total claims for unemployment benefits as of January, 1998, the standardized mortality rate (UK is 100), the percentage of labor force participation, regional GDP, the percentage of children aged 5 to 15, and an interaction term between female unemployment and the council tax. 
The original source files and variable information are included in /scotland/src/ </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Fit-and-summary">Fit and summary<a class="anchor-link" href="#Fit-and-summary">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">glm_gamma</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">GLM</span><span class="p">(</span><span class="n">data2</span><span class="o">.</span><span class="n">endog</span><span class="p">,</span> <span class="n">data2</span><span class="o">.</span><span class="n">exog</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Gamma</span><span class="p">())</span> <span class="n">glm_results</span> <span class="o">=</span> <span class="n">glm_gamma</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">glm_results</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>[[ 7.12000000e+02 2.10000000e+01 1.05000000e+02 8.24000000e+01 1.35660000e+04 1.23000000e+01 1.49520000e+04 1.00000000e+00] [ 6.43000000e+02 2.65000000e+01 9.70000000e+01 8.02000000e+01 1.35660000e+04 1.53000000e+01 1.70395000e+04 
1.00000000e+00] [ 6.79000000e+02 2.83000000e+01 1.13000000e+02 8.63000000e+01 9.61100000e+03 1.39000000e+01 1.92157000e+04 1.00000000e+00] [ 8.01000000e+02 2.71000000e+01 1.09000000e+02 8.04000000e+01 9.48300000e+03 1.36000000e+01 2.17071000e+04 1.00000000e+00] [ 7.53000000e+02 2.20000000e+01 1.15000000e+02 6.47000000e+01 9.26500000e+03 1.46000000e+01 1.65660000e+04 1.00000000e+00]] [ 60.3 52.3 53.4 57. 68.7] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="GLM:-Gaussian-distribution-with-a-noncanonical-link">GLM: Gaussian distribution with a noncanonical link<a class="anchor-link" href="#GLM:-Gaussian-distribution-with-a-noncanonical-link">¶</a></h2><h3 id="Artificial-data">Artificial data<a class="anchor-link" href="#Artificial-data">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">nobs2</span> <span class="o">=</span> <span class="mi">100</span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">nobs2</span><span class="p">)</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">54321</span><span class="p">)</span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">column_stack</span><span class="p">((</span><span class="n">x</span><span class="p">,</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span><span class="p">))</span> <span 
class="n">X</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">prepend</span><span class="o">=</span><span class="k">False</span><span class="p">)</span>
<span class="n">lny</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="o">.</span><span class="mi">03</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="o">.</span><span class="mi">0001</span><span class="o">*</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">))</span> <span class="o">+</span> <span class="o">.</span><span class="mi">001</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="n">nobs2</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:          inverse_power   Scale:                0.00358428317349
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Mon, 20 Jul 2015   Deviance:                     0.087389
Time:                        17:43:34   Pearson chi2:                   0.0860
No. Iterations:                     6
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1          4.962e-05   1.62e-05      3.060      0.002      1.78e-05  8.14e-05
x2             0.0020      0.001      3.824      0.000         0.001     0.003
x3         -7.181e-05   2.71e-05     -2.648      0.008        -0.000 -1.87e-05
x4             0.0001   4.06e-05      2.757      0.006      3.23e-05     0.000
x5         -1.468e-07   1.24e-07     -1.187      0.235     -3.89e-07  9.56e-08
x6            -0.0005      0.000     -2.159      0.031        -0.001 -4.78e-05
x7         -2.427e-06   7.46e-07     -3.253      0.001     -3.89e-06 -9.65e-07
const         -0.0178      0.011     -1.548      0.122        -0.040     0.005
==============================================================================
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Fit-and-summary">Fit and summary<a class="anchor-link" href="#Fit-and-summary">¶</a></h3>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gauss_log</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">GLM</span><span class="p">(</span><span class="n">lny</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Gaussian</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">links</span><span class="o">.</span><span class="n">log</span><span class="p">()))</span>
<span class="n">gauss_log_results</span> <span class="o">=</span> <span class="n">gauss_log</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
<span class="nb">print</span><span
class="p">(</span><span class="n">gauss_log_results</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
</pre></div>
</div>
</div>
</div>
</div>
<script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>
<script type="text/javascript">
init_mathjax = function() {
    if (window.MathJax) {
        // MathJax loaded
        MathJax.Hub.Config({
            tex2jax: {
                // I'm not sure about the \( and \[ below. It messes with the
                // prompt, and I think it's an issue with the template. -SS
                inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
            },
            displayAlign: 'left', // Change this to 'center' to center equations.
            "HTML-CSS": {
                styles: {'.MathJax_Display': {"margin": 0}}
            }
        });
        MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
    }
}
init_mathjax();

// since we have to load this in a ..raw:: directive we will add the css
// after the fact
function loadcssfile(filename){
    var fileref=document.createElement("link")
    fileref.setAttribute("rel", "stylesheet")
    fileref.setAttribute("type", "text/css")
    fileref.setAttribute("href", filename)
    document.getElementsByTagName("head")[0].appendChild(fileref)
}
// loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
// loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
loadcssfile("../../../_static/nbviewer.pygments.css")
loadcssfile("../../../_static/ipython.min.css")
</script>
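The summary output in this notebook reports ``Method: IRLS``. As a rough illustration of what iteratively reweighted least squares does for the Gaussian family with a log link, the sketch below refits the artificial data using only NumPy. It is not the statsmodels implementation: the helper name ``irls_gaussian_log``, the variable ``y`` (the notebook calls it ``lny``), the starting values, and the stopping rule are all assumptions made for this example.

```python
import numpy as np


def irls_gaussian_log(y, X, max_iter=100, tol=1e-8):
    """Hypothetical helper: fit E[y] = exp(X @ beta) with constant variance."""
    # Assumed starting values: pull y toward its mean so log(mu) is defined.
    mu = (y + y.mean()) / 2.0
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Log link: g(mu) = log(mu), g'(mu) = 1/mu; Gaussian variance V(mu) = 1.
        z = eta + (y - mu) / mu   # working response
        sw = mu                   # square root of the IRLS weight mu**2
        # Weighted least squares step, solved as an ordinary lstsq problem.
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Same artificial data as in the notebook, with the constant as the last column
# (the equivalent of add_constant(..., prepend=False)).
nobs2 = 100
x = np.arange(nobs2)
np.random.seed(54321)
X = np.column_stack((x, x**2, np.ones(nobs2)))
y = np.exp(-(.03 * x + .0001 * x**2 - 1.0)) + .001 * np.random.rand(nobs2)

beta = irls_gaussian_log(y, X)
print(beta)  # order: coefficient on x, coefficient on x**2, constant
```

Because the data were generated from ``exp(-(.03*x + .0001*x**2 - 1.0))`` plus a small amount of noise, the estimates should land near -0.03, -0.0001 and 1.0 on the log scale. They will not match exactly, since the added noise is uniform on [0, 0.001) rather than centred on zero.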