Discrete Choice Models ======================== .. _discrete_choice_example_notebook: `Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/discrete_choice_example.ipynb>`_ .. raw:: html <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Fair's-Affair-data">Fair's Affair data<a class="anchor-link" href="#Fair's-Affair-data">¶</a></h2> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>A survey of women only was conducted in 1974 by <em>Redbook</em> asking about extramarital affairs.</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="kn">from</span> <span class="nn">scipy</span> <span class="k">import</span> <span class="n">stats</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span> <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span> <span class="kn">from</span> <span class="nn">statsmodels.formula.api</span> <span class="k">import</span> <span class="n">logit</span><span class="p">,</span> <span class="n">probit</span><span class="p">,</span> <span class="n">poisson</span><span class="p">,</span> <span class="n">ols</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fair</span><span class="o">.</span><span class="n">SOURCE</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fair</span><span class="o">.</span><span class="n">NOTE</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Fair, Ray. 1978. "A Theory of Extramarital Affairs," `Journal of Political Economy`, February, 45-61. The data is available at http://fairmodel.econ.yale.edu/rayfair/pdf/2011b.htm </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">dta</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fair</span><span class="o">.</span><span class="n">load_pandas</span><span class="p">()</span><span class="o">.</span><span class="n">data</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>:: Number of observations: 6366 Number of variables: 9 Variable name definitions: rate_marriage : How rate marriage, 1 = very poor, 2 = poor, 3 = fair, 4 = good, 5 = very good age : Age yrs_married : No. years married. Interval approximations. See original paper for detailed explanation. children : No. children religious : How relgious, 1 = not, 2 = mildly, 3 = fairly, 4 = strongly educ : Level of education, 9 = grade school, 12 = high school, 14 = some college, 16 = college graduate, 17 = some graduate school, 20 = advanced degree occupation : 1 = student, 2 = farming, agriculture; semi-skilled, or unskilled worker; 3 = white-colloar; 4 = teacher counselor social worker, nurse; artist, writers; technician, skilled worker, 5 = managerial, administrative, business, 6 = professional with advanced degree occupation_husb : Husband's occupation. Same as occupation. affairs : measure of time spent in extramarital affairs See the original paper for more details. </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">dta</span><span class="p">[</span><span class="s">'affair'</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">dta</span><span class="p">[</span><span class="s">'affairs'</span><span class="p">]</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">dta</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">dta</span><span class="o">.</span><span class="n">describe</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> rate_marriage age yrs_married children religious educ occupation \ 0 3 32 9.0 3.0 3 17 2 1 3 27 13.0 3.0 1 14 3 2 4 22 2.5 0.0 1 16 3 3 4 37 16.5 4.0 3 16 5 4 5 27 9.0 1.0 1 14 3 5 4 27 9.0 0.0 2 14 3 6 5 37 23.0 5.5 2 12 5 7 5 37 23.0 5.5 2 12 2 8 3 22 2.5 0.0 2 12 3 9 3 27 6.0 0.0 1 16 3 occupation_husb affairs affair 0 5 0.111111 1 1 4 3.230769 1 2 5 1.400000 1 3 5 0.727273 1 4 4 4.666666 1 5 4 4.666666 1 6 4 0.852174 1 7 3 1.826086 1 8 3 4.799999 1 9 5 1.333333 1 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">affair_mod</span> <span class="o">=</span> <span class="n">logit</span><span class="p">(</span><span class="s">"affair ~ occupation + educ + occupation_husb"</span> <span class="s">"+ rate_marriage + age + yrs_married + children"</span> <span class="s">" + religious"</span><span class="p">,</span> <span class="n">dta</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> rate_marriage age yrs_married children religious \ count 6366.000000 6366.000000 6366.000000 6366.000000 6366.000000 mean 4.109645 29.082862 9.009425 1.396874 2.426170 std 0.961430 6.847882 7.280120 1.433471 0.878369 min 1.000000 17.500000 0.500000 0.000000 1.000000 25% 4.000000 22.000000 2.500000 0.000000 2.000000 50% 4.000000 27.000000 6.000000 1.000000 2.000000 75% 5.000000 32.000000 16.500000 2.000000 3.000000 max 5.000000 42.000000 23.000000 5.500000 4.000000 educ occupation occupation_husb affairs affair count 6366.000000 6366.000000 6366.000000 6366.000000 6366.000000 mean 14.209865 3.424128 3.850141 0.705374 0.322495 std 2.178003 0.942399 1.346435 2.203374 0.467468 min 9.000000 1.000000 1.000000 0.000000 0.000000 25% 12.000000 3.000000 3.000000 0.000000 0.000000 50% 14.000000 3.000000 4.000000 0.000000 0.000000 75% 16.000000 4.000000 5.000000 0.484848 1.000000 max 20.000000 6.000000 6.000000 57.599991 1.000000 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">affair_mod</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Optimization terminated successfully. Current function value: 0.545314 Iterations 6 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>How well are we predicting?</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">affair_mod</span><span class="o">.</span><span class="n">pred_table</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Logit Regression Results ============================================================================== Dep. Variable: affair No. Observations: 6366 Model: Logit Df Residuals: 6357 Method: MLE Df Model: 8 Date: Mon, 20 Jul 2015 Pseudo R-squ.: 0.1327 Time: 17:43:18 Log-Likelihood: -3471.5 converged: True LL-Null: -4002.5 LLR p-value: 5.807e-224 =================================================================================== coef std err z P>|z| [95.0% Conf. Int.] ----------------------------------------------------------------------------------- Intercept 3.7257 0.299 12.470 0.000 3.140 4.311 occupation 0.1602 0.034 4.717 0.000 0.094 0.227 educ -0.0392 0.015 -2.533 0.011 -0.070 -0.009 occupation_husb 0.0124 0.023 0.541 0.589 -0.033 0.057 rate_marriage -0.7161 0.031 -22.784 0.000 -0.778 -0.655 age -0.0605 0.010 -5.885 0.000 -0.081 -0.040 yrs_married 0.1100 0.011 10.054 0.000 0.089 0.131 children -0.0042 0.032 -0.134 0.893 -0.066 0.058 religious -0.3752 0.035 -10.792 0.000 -0.443 -0.307 =================================================================================== </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The coefficients of the discrete choice model do not tell us much. What we're after is marginal effects.</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">mfx</span> <span class="o">=</span> <span class="n">affair_mod</span><span class="o">.</span><span class="n">get_margeff</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">mfx</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">respondent1000</span> <span class="o">=</span> <span class="n">dta</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="mi">1000</span><span class="p">]</span> <span class="nb">print</span><span class="p">(</span><span class="n">respondent1000</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Logit Marginal Effects ===================================== Dep. Variable: affair Method: dydx At: overall =================================================================================== dy/dx std err z P>|z| [95.0% Conf. Int.] ----------------------------------------------------------------------------------- occupation 0.0293 0.006 4.744 0.000 0.017 0.041 educ -0.0072 0.003 -2.538 0.011 -0.013 -0.002 occupation_husb 0.0023 0.004 0.541 0.589 -0.006 0.010 rate_marriage -0.1308 0.005 -26.891 0.000 -0.140 -0.121 age -0.0110 0.002 -5.937 0.000 -0.015 -0.007 yrs_married 0.0201 0.002 10.327 0.000 0.016 0.024 children -0.0008 0.006 -0.134 0.893 -0.012 0.011 religious -0.0685 0.006 -11.119 0.000 -0.081 -0.056 =================================================================================== </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">resp</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">9</span><span class="p">),</span> <span class="n">respondent1000</span><span class="p">[[</span><span class="s">"occupation"</span><span class="p">,</span> <span class="s">"educ"</span><span class="p">,</span> <span class="s">"occupation_husb"</span><span class="p">,</span> <span class="s">"rate_marriage"</span><span class="p">,</span> <span class="s">"age"</span><span class="p">,</span> <span class="s">"yrs_married"</span><span class="p">,</span> <span class="s">"children"</span><span class="p">,</span> <span class="s">"religious"</span><span class="p">]]</span><span class="o">.</span><span class="n">tolist</span><span class="p">()))</span> <span class="n">resp</span><span class="o">.</span><span class="n">update</span><span class="p">({</span><span class="mi">0</span> <span class="p">:</span> <span class="mi">1</span><span class="p">})</span> <span class="nb">print</span><span class="p">(</span><span class="n">resp</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>rate_marriage 4.000000 age 37.000000 yrs_married 23.000000 children 3.000000 religious 3.000000 educ 12.000000 occupation 3.000000 occupation_husb 4.000000 affairs 0.521739 affair 1.000000 Name: 1000, dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">mfx</span> <span class="o">=</span> <span class="n">affair_mod</span><span class="o">.</span><span class="n">get_margeff</span><span class="p">(</span><span class="n">atexog</span><span class="o">=</span><span class="n">resp</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">mfx</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>{0: 1, 1: 3.0, 2: 12.0, 3: 4.0, 4: 4.0, 5: 37.0, 6: 23.0, 7: 3.0, 8: 3.0} </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">affair_mod</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">respondent1000</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Logit Marginal Effects ===================================== Dep. Variable: affair Method: dydx At: overall =================================================================================== dy/dx std err z P>|z| [95.0% Conf. Int.] ----------------------------------------------------------------------------------- occupation 0.0400 0.008 4.711 0.000 0.023 0.057 educ -0.0098 0.004 -2.537 0.011 -0.017 -0.002 occupation_husb 0.0031 0.006 0.541 0.589 -0.008 0.014 rate_marriage -0.1788 0.008 -22.743 0.000 -0.194 -0.163 age -0.0151 0.003 -5.928 0.000 -0.020 -0.010 yrs_married 0.0275 0.003 10.256 0.000 0.022 0.033 children -0.0011 0.008 -0.134 0.893 -0.017 0.014 religious -0.0937 0.009 -10.722 0.000 -0.111 -0.077 =================================================================================== </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">affair_mod</span><span class="o">.</span><span class="n">fittedvalues</span><span class="p">[</span><span class="mi">1000</span><span class="p">]</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">affair_mod</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="n">affair_mod</span><span class="o">.</span><span class="n">fittedvalues</span><span class="p">[</span><span class="mi">1000</span><span class="p">])</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The "correct" model here is likely the Tobit model. We have an work in progress branch "tobit-model" on github, if anyone is interested in censored regression models.</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Exercise:-Logit-vs-Probit">Exercise: Logit vs Probit<a class="anchor-link" href="#Exercise:-Logit-vs-Probit">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">6</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">stats</span><span class="o">.</span><span class="n">logistic</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="n">support</span><span class="p">),</span> <span class="s">'r-'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Logistic'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">stats</span><span class="o">.</span><span class="n">norm</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="n">support</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">'Probit'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">legend</span><span class="p">();</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">6</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">stats</span><span class="o">.</span><span class="n">logistic</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">support</span><span class="p">),</span> <span class="s">'r-'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Logistic'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">stats</span><span class="o">.</span><span class="n">norm</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">support</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">'Probit'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">legend</span><span class="p">();</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Compare the estimates of the Logit Fair model above to a Probit model. Does the prediction table look better? Much difference in marginal effects?</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Genarlized-Linear-Model-Example">Genarlized Linear Model Example<a class="anchor-link" href="#Genarlized-Linear-Model-Example">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">SOURCE</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">DESCRLONG</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Jeff Gill's `Generalized Linear Models: A Unified Approach` http://jgill.wustl.edu/research/books.html </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">NOTE</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> This data is on the California education policy and outcomes (STAR program results for 1998. The data measured standardized testing by the California Department of Education that required evaluation of 2nd - 11th grade students by the the Stanford 9 test on a variety of subjects. This dataset is at the level of the unified school district and consists of 303 cases. The binary response variable represents the number of 9th graders scoring over the national median value on the mathematics exam. The data used in this example is only a subset of the original source. </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">dta</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">load_pandas</span><span class="p">()</span><span class="o">.</span><span class="n">data</span> <span class="nb">print</span><span class="p">(</span><span class="n">dta</span><span class="o">.</span><span class="n">columns</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>:: Number of Observations - 303 (counties in California). Number of Variables - 13 and 8 interaction terms. Definition of variables names:: NABOVE - Total number of students above the national median for the math section. NBELOW - Total number of students below the national median for the math section. LOWINC - Percentage of low income students PERASIAN - Percentage of Asian student PERBLACK - Percentage of black students PERHISP - Percentage of Hispanic students PERMINTE - Percentage of minority teachers AVYRSEXP - Sum of teachers' years in educational service divided by the number of teachers. AVSALK - Total salary budget including benefits divided by the number of full-time teachers (in thousands) PERSPENK - Per-pupil spending (in thousands) PTRATIO - Pupil-teacher ratio. PCTAF - Percentage of students taking UC/CSU prep courses PCTCHRT - Percentage of charter schools PCTYRRND - Percentage of year-round schools The below variables are interaction terms of the variables defined above. PERMINTE_AVYRSEXP PEMINTE_AVSAL AVYRSEXP_AVSAL PERSPEN_PTRATIO PERSPEN_PCTAF PTRATIO_PCTAF PERMINTE_AVTRSEXP_AVSAL PERSPEN_PTRATIO_PCTAF </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">dta</span><span class="p">[[</span><span class="s">'NABOVE'</span><span class="p">,</span> <span class="s">'NBELOW'</span><span class="p">,</span> <span class="s">'LOWINC'</span><span class="p">,</span> <span class="s">'PERASIAN'</span><span class="p">,</span> <span class="s">'PERBLACK'</span><span class="p">,</span> <span class="s">'PERHISP'</span><span class="p">,</span> <span class="s">'PERMINTE'</span><span class="p">]]</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Index(['NABOVE', 'NBELOW', 'LOWINC', 'PERASIAN', 'PERBLACK', 'PERHISP', 'PERMINTE', 'AVYRSEXP', 'AVSALK', 'PERSPENK', 'PTRATIO', 'PCTAF', 'PCTCHRT', 'PCTYRRND', 'PERMINTE_AVYRSEXP', 'PERMINTE_AVSAL', 'AVYRSEXP_AVSAL', 'PERSPEN_PTRATIO', 'PERSPEN_PCTAF', 'PTRATIO_PCTAF', 'PERMINTE_AVYRSEXP_AVSAL', 'PERSPEN_PTRATIO_PCTAF'], dtype='object') </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">dta</span><span class="p">[[</span><span class="s">'AVYRSEXP'</span><span class="p">,</span> <span class="s">'AVSALK'</span><span class="p">,</span> <span class="s">'PERSPENK'</span><span class="p">,</span> <span class="s">'PTRATIO'</span><span class="p">,</span> <span class="s">'PCTAF'</span><span class="p">,</span> <span class="s">'PCTCHRT'</span><span class="p">,</span> <span class="s">'PCTYRRND'</span><span class="p">]]</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> NABOVE NBELOW LOWINC PERASIAN PERBLACK PERHISP PERMINTE 0 452 355 34.39730 23.299300 14.235280 11.411120 15.918370 1 144 40 17.36507 29.328380 8.234897 9.314884 13.636360 2 337 234 32.64324 9.226386 42.406310 13.543720 28.834360 3 395 178 11.90953 13.883090 3.796973 11.443110 11.111110 4 8 57 36.88889 12.187500 76.875000 7.604167 43.589740 5 1348 899 20.93149 28.023510 4.643221 13.808160 15.378490 6 477 887 53.26898 8.447858 19.374830 37.905330 25.525530 7 565 347 15.19009 3.665781 2.649680 13.092070 6.203008 8 205 320 28.21582 10.430420 6.786374 32.334300 13.461540 9 469 598 32.77897 17.178310 12.484930 28.323290 27.259890 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">formula</span> <span class="o">=</span> <span class="s">'NABOVE + NBELOW ~ LOWINC + PERASIAN + PERBLACK + PERHISP + PCTCHRT '</span> <span class="n">formula</span> <span class="o">+=</span> <span class="s">'+ PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF'</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> AVYRSEXP AVSALK PERSPENK PTRATIO PCTAF PCTCHRT PCTYRRND 0 14.70646 59.15732 4.445207 21.71025 57.03276 0 22.222220 1 16.08324 59.50397 5.267598 20.44278 64.62264 0 0.000000 2 14.59559 60.56992 5.482922 18.95419 53.94191 0 0.000000 3 14.38939 58.33411 4.165093 21.63539 49.06103 0 7.142857 4 13.90568 63.15364 4.324902 18.77984 52.38095 0 0.000000 5 14.97755 66.97055 3.916104 24.51914 44.91578 0 2.380952 6 14.67829 57.62195 4.270903 22.21278 32.28916 0 12.121210 7 13.66197 63.44740 4.309734 24.59026 30.45267 0 0.000000 8 16.41760 57.84564 4.527603 21.74138 22.64574 0 0.000000 9 12.51864 57.80141 4.648917 20.26010 26.07099 0 0.000000 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h4 id="Aside:-Binomial-distribution">Aside: Binomial distribution<a class="anchor-link" href="#Aside:-Binomial-distribution">¶</a></h4> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Toss a six-sided die 5 times, what's the probability of exactly 2 fours?</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">stats</span><span class="o">.</span><span class="n">binom</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mf">1.</span><span class="o">/</span><span class="mi">6</span><span class="p">)</span><span class="o">.</span><span class="n">pmf</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">scipy.misc</span> <span class="k">import</span> <span class="n">comb</span> <span class="n">comb</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mf">6.</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">*</span> <span class="p">(</span><span class="mi">5</span><span class="o">/</span><span class="mf">6.</span><span class="p">)</span><span class="o">**</span><span class="mi">3</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.formula.api</span> <span class="k">import</span> <span class="n">glm</span> <span class="n">glm_mod</span> <span class="o">=</span> <span class="n">glm</span><span class="p">(</span><span class="n">formula</span><span class="p">,</span> <span class="n">dta</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Binomial</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">glm_mod</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The number of trials</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">glm_mod</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">orig_endog</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Generalized Linear Model Regression Results ================================================================================ Dep. Variable: ['NABOVE', 'NBELOW'] No. Observations: 303 Model: GLM Df Residuals: 282 Model Family: Binomial Df Model: 20 Link Function: logit Scale: 1.0 Method: IRLS Log-Likelihood: -2998.6 Date: Mon, 20 Jul 2015 Deviance: 4078.8 Time: 17:43:18 Pearson chi2: 4.05e+03 No. Iterations: 7 ============================================================================================ coef std err z P>|z| [95.0% Conf. Int.] -------------------------------------------------------------------------------------------- Intercept 2.9589 1.547 1.913 0.056 -0.073 5.990 LOWINC -0.0168 0.000 -38.749 0.000 -0.018 -0.016 PERASIAN 0.0099 0.001 16.505 0.000 0.009 0.011 PERBLACK -0.0187 0.001 -25.182 0.000 -0.020 -0.017 PERHISP -0.0142 0.000 -32.818 0.000 -0.015 -0.013 PCTCHRT 0.0049 0.001 3.921 0.000 0.002 0.007 PCTYRRND -0.0036 0.000 -15.878 0.000 -0.004 -0.003 PERMINTE 0.2545 0.030 8.498 0.000 0.196 0.313 AVYRSEXP 0.2407 0.057 4.212 0.000 0.129 0.353 PERMINTE:AVYRSEXP -0.0141 0.002 -7.391 0.000 -0.018 -0.010 AVSALK 0.0804 0.014 5.775 0.000 0.053 0.108 PERMINTE:AVSALK -0.0040 0.000 -8.450 0.000 -0.005 -0.003 AVYRSEXP:AVSALK -0.0039 0.001 -4.059 0.000 -0.006 -0.002 PERMINTE:AVYRSEXP:AVSALK 0.0002 2.99e-05 7.428 0.000 0.000 0.000 PERSPENK -1.9522 0.317 -6.162 0.000 -2.573 -1.331 PTRATIO -0.3341 0.061 -5.453 0.000 -0.454 -0.214 PERSPENK:PTRATIO 0.0917 0.015 6.321 0.000 0.063 0.120 PCTAF -0.1690 0.033 -5.169 0.000 -0.233 -0.105 PERSPENK:PCTAF 0.0490 0.007 6.574 0.000 0.034 0.064 PTRATIO:PCTAF 0.0080 0.001 5.362 0.000 0.005 0.011 PERSPENK:PTRATIO:PCTAF -0.0022 0.000 -6.445 0.000 -0.003 -0.002 ============================================================================================ </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">glm_mod</span><span class="o">.</span><span class="n">fittedvalues</span> <span class="o">*</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">orig_endog</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>First differences: We hold all explanatory variables constant at their means and manipulate the percentage of low income households to assess its impact on the response variables:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">exog</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">orig_exog</span> <span class="c"># get the dataframe</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">means25</span> <span class="o">=</span> <span class="n">exog</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">means25</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">means25</span><span class="p">[</span><span class="s">'LOWINC'</span><span class="p">]</span> <span class="o">=</span> <span class="n">exog</span><span class="p">[</span><span class="s">'LOWINC'</span><span class="p">]</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="o">.</span><span class="mi">25</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">means25</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Intercept 1.000000 LOWINC 41.409877 PERASIAN 5.896335 PERBLACK 5.636808 PERHISP 34.398080 PCTCHRT 1.175909 PCTYRRND 11.611905 PERMINTE 14.694747 AVYRSEXP 14.253875 PERMINTE:AVYRSEXP 209.018700 AVSALK 58.640258 PERMINTE:AVSALK 879.979883 AVYRSEXP:AVSALK 839.718173 PERMINTE:AVYRSEXP:AVSALK 12585.266464 PERSPENK 4.320310 PTRATIO 22.464250 PERSPENK:PTRATIO 96.295756 PCTAF 33.630593 PERSPENK:PCTAF 147.235740 PTRATIO:PCTAF 747.445536 PERSPENK:PTRATIO:PCTAF 3243.607568 dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">means75</span> <span class="o">=</span> <span class="n">exog</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="n">means75</span><span class="p">[</span><span class="s">'LOWINC'</span><span class="p">]</span> <span class="o">=</span> <span class="n">exog</span><span class="p">[</span><span class="s">'LOWINC'</span><span class="p">]</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="o">.</span><span class="mi">75</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">means75</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Intercept 1.000000 LOWINC 26.683040 PERASIAN 5.896335 PERBLACK 5.636808 PERHISP 34.398080 PCTCHRT 1.175909 PCTYRRND 11.611905 PERMINTE 14.694747 AVYRSEXP 14.253875 PERMINTE:AVYRSEXP 209.018700 AVSALK 58.640258 PERMINTE:AVSALK 879.979883 AVYRSEXP:AVSALK 839.718173 PERMINTE:AVYRSEXP:AVSALK 12585.266464 PERSPENK 4.320310 PTRATIO 22.464250 PERSPENK:PTRATIO 96.295756 PCTAF 33.630593 PERSPENK:PCTAF 147.235740 PTRATIO:PCTAF 747.445536 PERSPENK:PTRATIO:PCTAF 3243.607568 dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">resp25</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">means25</span><span class="p">)</span> <span class="n">resp75</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">means75</span><span class="p">)</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">resp75</span> <span class="o">-</span> <span class="n">resp25</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Intercept 1.000000 LOWINC 55.460075 PERASIAN 5.896335 PERBLACK 5.636808 PERHISP 34.398080 PCTCHRT 1.175909 PCTYRRND 11.611905 PERMINTE 14.694747 AVYRSEXP 14.253875 PERMINTE:AVYRSEXP 209.018700 AVSALK 58.640258 PERMINTE:AVSALK 879.979883 AVYRSEXP:AVSALK 839.718173 PERMINTE:AVYRSEXP:AVSALK 12585.266464 PERSPENK 4.320310 PTRATIO 22.464250 PERSPENK:PTRATIO 96.295756 PCTAF 33.630593 PERSPENK:PCTAF 147.235740 PTRATIO:PCTAF 747.445536 PERSPENK:PTRATIO:PCTAF 3243.607568 dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The interquartile first difference for the percentage of low income households in a school district is:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="s">"%2.4f%%"</span> <span class="o">%</span> <span class="p">(</span><span class="n">diff</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="mi">100</span><span class="p">))</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">nobs</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">nobs</span> <span class="n">y</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">endog</span> <span class="n">yhat</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">mu</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>-11.8863% </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.graphics.api</span> <span class="k">import</span> <span class="n">abline_plot</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Observed Values'</span><span class="p">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">'Fitted Values'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">yhat</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="n">y_vs_yhat</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">yhat</span><span class="p">,</span> <span class="n">prepend</span><span class="o">=</span><span class="k">True</span><span class="p">))</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">abline_plot</span><span class="p">(</span><span class="n">model_results</span><span class="o">=</span><span class="n">y_vs_yhat</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h4 id="Plot-fitted-values-vs-Pearson-residuals">Plot fitted values vs Pearson residuals<a class="anchor-link" href="#Plot-fitted-values-vs-Pearson-residuals">¶</a></h4> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Pearson residuals are defined to be</p> $$\frac{(y - \mu)}{\sqrt{(var(\mu))}}$$<p>where var is typically determined by the family. E.g., binomial variance is $np(1 - p)$</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Residual Dependence Plot'</span><span class="p">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">'Fitted Values'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Pearson Residuals'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">yhat</span><span class="p">,</span> <span class="n">stats</span><span class="o">.</span><span class="n">zscore</span><span class="p">(</span><span class="n">glm_mod</span><span class="o">.</span><span class="n">resid_pearson</span><span class="p">))</span> <span class="n">ax</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'tight'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">],</span> <span class="s">'k-'</span><span class="p">);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h4 id="Histogram-of-standardized-deviance-residuals-with-Kernel-Density-Estimate-overlayed">Histogram of standardized deviance residuals with Kernel Density Estimate overlayed<a class="anchor-link" href="#Histogram-of-standardized-deviance-residuals-with-Kernel-Density-Estimate-overlayed">¶</a></h4> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The definition of the deviance residuals depends on the family. For the Binomial distribution this is</p> $$r_{dev} = sign\left(Y-\mu\right)*\sqrt{2n(Y\log\frac{Y}{\mu}+(1-Y)\log\frac{(1-Y)}{(1-\mu)}}$$<p>They can be used to detect ill-fitting covariates</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">resid</span> <span class="o">=</span> <span class="n">glm_mod</span><span class="o">.</span><span class="n">resid_deviance</span> <span class="n">resid_std</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">zscore</span><span class="p">(</span><span class="n">resid</span><span class="p">)</span> <span class="n">kde_resid</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">nonparametric</span><span class="o">.</span><span class="n">KDEUnivariate</span><span class="p">(</span><span class="n">resid_std</span><span class="p">)</span> <span class="n">kde_resid</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Standardized Deviance Residuals"</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">resid_std</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span> <span class="n">normed</span><span class="o">=</span><span class="k">True</span><span class="p">);</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">kde_resid</span><span class="o">.</span><span class="n">support</span><span class="p">,</span> <span class="n">kde_resid</span><span class="o">.</span><span class="n">density</span><span class="p">,</span> <span class="s">'r'</span><span class="p">);</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h4 id="QQ-plot-of-deviance-residuals">QQ-plot of deviance residuals<a class="anchor-link" href="#QQ-plot-of-deviance-residuals">¶</a></h4> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">graphics</span><span class="o">.</span><span class="n">qqplot</span><span class="p">(</span><span class="n">resid</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="s">'r'</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script> <script type="text/javascript"> init_mathjax = function() { if (window.MathJax) { // MathJax loaded MathJax.Hub.Config({ tex2jax: { // I'm not sure about the \( and \[ below. It messes with the // prompt, and I think it's an issue with the template. -SS inlineMath: [ ['$','$'], ["\\(","\\)"] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] }, displayAlign: 'left', // Change this to 'center' to center equations. "HTML-CSS": { styles: {'.MathJax_Display': {"margin": 0}} } }); MathJax.Hub.Queue(["Typeset",MathJax.Hub]); } } init_mathjax(); // since we have to load this in a ..raw:: directive we will add the css // after the fact function loadcssfile(filename){ var fileref=document.createElement("link") fileref.setAttribute("rel", "stylesheet") fileref.setAttribute("type", "text/css") fileref.setAttribute("href", filename) document.getElementsByTagName("head")[0].appendChild(fileref) } // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }}) // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }}) loadcssfile("../../../_static/nbviewer.pygments.css") loadcssfile("../../../_static/ipython.min.css") </script>