Regression diagnostics ======================== .. _regression_diagnostics_notebook: `Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/regression_diagnostics.ipynb>`_ .. raw:: html <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>This example file shows how to use a few of the <code>statsmodels</code> regression diagnostic tests in a real-life context. You can learn about more tests and find out more information abou the tests here on the <a href="http://statsmodels.sourceforge.net/stable/diagnostic.html">Regression Diagnostics page.</a></p> <p>Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online <code>statsmodels</code> documentation. For presentation purposes, we use the <code>zip(name,test)</code> construct to pretty-print short descriptions in the examples below.</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Estimate-a-regression-model">Estimate a regression model<a class="anchor-link" href="#Estimate-a-regression-model">¶</a></h2> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span> <span class="kn">from</span> <span class="nn">statsmodels.compat</span> <span class="k">import</span> <span class="n">lzip</span> <span class="kn">import</span> <span class="nn">statsmodels</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span> <span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="k">as</span> <span class="nn">smf</span> <span class="kn">import</span> <span class="nn">statsmodels.stats.api</span> <span class="k">as</span> <span class="nn">sms</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span> <span class="c"># Load data</span> <span class="n">url</span> <span class="o">=</span> <span class="s">'http://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv'</span> <span class="n">dat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="c"># Fit regression model (using the natural log of one of the regressaors)</span> <span class="n">results</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="s">'Lottery ~ Literacy + np.log(Pop1831)'</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">dat</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="c"># Inspect the results</span> <span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Normality-of-the-residuals">Normality of the residuals<a class="anchor-link" href="#Normality-of-the-residuals">¶</a></h2> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Jarque-Bera test:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">'Jarque-Bera'</span><span class="p">,</span> <span class="s">'Chi^2 two-tail prob.'</span><span class="p">,</span> <span class="s">'Skew'</span><span class="p">,</span> <span class="s">'Kurtosis'</span><span class="p">]</span> <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">jarque_bera</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">)</span> <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> OLS Regression Results ============================================================================== Dep. Variable: Lottery R-squared: 0.348 Model: OLS Adj. R-squared: 0.333 Method: Least Squares F-statistic: 22.20 Date: Mon, 20 Jul 2015 Prob (F-statistic): 1.90e-08 Time: 17:44:23 Log-Likelihood: -379.82 No. Observations: 86 AIC: 765.6 Df Residuals: 83 BIC: 773.0 Df Model: 2 Covariance Type: nonrobust =================================================================================== coef std err t P>|t| [95.0% Conf. Int.] ----------------------------------------------------------------------------------- Intercept 246.4341 35.233 6.995 0.000 176.358 316.510 Literacy -0.4889 0.128 -3.832 0.000 -0.743 -0.235 np.log(Pop1831) -31.3114 5.977 -5.239 0.000 -43.199 -19.424 ============================================================================== Omnibus: 3.713 Durbin-Watson: 2.019 Prob(Omnibus): 0.156 Jarque-Bera (JB): 3.394 Skew: -0.487 Prob(JB): 0.183 Kurtosis: 3.003 Cond. No. 702. ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Omni test:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">'Chi^2'</span><span class="p">,</span> <span class="s">'Two-tail probability'</span><span class="p">]</span> <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">omni_normtest</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">)</span> <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Influence-tests">Influence tests<a class="anchor-link" href="#Influence-tests">¶</a></h2><p>Once created, an object of class <code>OLSInfluence</code> holds attributes and methods that allow users to assess the influence of each observation. For example, we can compute and extract the first few rows of DFbetas by:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.stats.outliers_influence</span> <span class="k">import</span> <span class="n">OLSInfluence</span> <span class="n">test_class</span> <span class="o">=</span> <span class="n">OLSInfluence</span><span class="p">(</span><span class="n">results</span><span class="p">)</span> <span class="n">test_class</span><span class="o">.</span><span class="n">dfbetas</span><span class="p">[:</span><span class="mi">5</span><span class="p">,:]</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Explore other options by typing <code>dir(influence_test)</code></p> <p>Useful information on leverage can also be plotted:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.graphics.regressionplots</span> <span class="k">import</span> <span class="n">plot_leverage_resid2</span> <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">6</span><span class="p">))</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plot_leverage_resid2</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">ax</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Other plotting options can be found on the <a href="http://statsmodels.sourceforge.net/stable/graphics.html">Graphics page.</a></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Multicollinearity">Multicollinearity<a class="anchor-link" href="#Multicollinearity">¶</a></h2><p>Condition number:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">cond</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Heteroskedasticity-tests">Heteroskedasticity tests<a class="anchor-link" href="#Heteroskedasticity-tests">¶</a></h2><p>Breush-Pagan test:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">'Lagrange multiplier statistic'</span><span class="p">,</span> <span class="s">'p-value'</span><span class="p">,</span> <span class="s">'f-value'</span><span class="p">,</span> <span class="s">'f p-value'</span><span class="p">]</span> <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">het_breushpagan</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">,</span> <span class="n">results</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span> <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Goldfeld-Quandt test</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">'F statistic'</span><span class="p">,</span> <span class="s">'p-value'</span><span class="p">]</span> <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">het_goldfeldquandt</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">,</span> <span class="n">results</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span> <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Linearity">Linearity<a class="anchor-link" href="#Linearity">¶</a></h2><p>Harvey-Collier multiplier test for Null hypothesis that the linear specification is correct:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">'t value'</span><span class="p">,</span> <span class="s">'p value'</span><span class="p">]</span> <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">linear_harvey_collier</span><span class="p">(</span><span class="n">results</span><span class="p">)</span> <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script> <script type="text/javascript"> init_mathjax = function() { if (window.MathJax) { // MathJax loaded MathJax.Hub.Config({ tex2jax: { // I'm not sure about the \( and \[ below. It messes with the // prompt, and I think it's an issue with the template. -SS inlineMath: [ ['$','$'], ["\\(","\\)"] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] }, displayAlign: 'left', // Change this to 'center' to center equations. "HTML-CSS": { styles: {'.MathJax_Display': {"margin": 0}} } }); MathJax.Hub.Queue(["Typeset",MathJax.Hub]); } } init_mathjax(); // since we have to load this in a ..raw:: directive we will add the css // after the fact function loadcssfile(filename){ var fileref=document.createElement("link") fileref.setAttribute("rel", "stylesheet") fileref.setAttribute("type", "text/css") fileref.setAttribute("href", filename) document.getElementsByTagName("head")[0].appendChild(fileref) } // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }}) // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }}) loadcssfile("../../../_static/nbviewer.pygments.css") loadcssfile("../../../_static/ipython.min.css") </script>