Regression diagnostics
========================


.. _regression_diagnostics_notebook:

`Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/regression_diagnostics.ipynb>`_

.. raw:: html

   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>This example file shows how to use a few of the <code>statsmodels</code> regression diagnostic tests in a real-life context. You can learn about more tests and find out more information abou the tests here on the <a href="http://statsmodels.sourceforge.net/stable/diagnostic.html">Regression Diagnostics page.</a></p>
   <p>Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online <code>statsmodels</code> documentation. For presentation purposes, we use the <code>zip(name,test)</code> construct to pretty-print short descriptions in the examples below.</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Estimate-a-regression-model">Estimate a regression model<a class="anchor-link" href="#Estimate-a-regression-model">&#182;</a></h2>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span>
   <span class="kn">from</span> <span class="nn">statsmodels.compat</span> <span class="k">import</span> <span class="n">lzip</span>
   <span class="kn">import</span> <span class="nn">statsmodels</span>
   <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
   <span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
   <span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="k">as</span> <span class="nn">smf</span>
   <span class="kn">import</span> <span class="nn">statsmodels.stats.api</span> <span class="k">as</span> <span class="nn">sms</span>
   <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
   
   <span class="c"># Load data</span>
   <span class="n">url</span> <span class="o">=</span> <span class="s">&#39;http://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv&#39;</span>
   <span class="n">dat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
   
   <span class="c"># Fit regression model (using the natural log of one of the regressaors)</span>
   <span class="n">results</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="s">&#39;Lottery ~ Literacy + np.log(Pop1831)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">dat</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   
   <span class="c"># Inspect the results</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Normality-of-the-residuals">Normality of the residuals<a class="anchor-link" href="#Normality-of-the-residuals">&#182;</a></h2>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Jarque-Bera test:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;Jarque-Bera&#39;</span><span class="p">,</span> <span class="s">&#39;Chi^2 two-tail prob.&#39;</span><span class="p">,</span> <span class="s">&#39;Skew&#39;</span><span class="p">,</span> <span class="s">&#39;Kurtosis&#39;</span><span class="p">]</span>
   <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">jarque_bera</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">)</span>
   <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>                            OLS Regression Results                            
   ==============================================================================
   Dep. Variable:                Lottery   R-squared:                       0.348
   Model:                            OLS   Adj. R-squared:                  0.333
   Method:                 Least Squares   F-statistic:                     22.20
   Date:                Mon, 20 Jul 2015   Prob (F-statistic):           1.90e-08
   Time:                        17:44:23   Log-Likelihood:                -379.82
   No. Observations:                  86   AIC:                             765.6
   Df Residuals:                      83   BIC:                             773.0
   Df Model:                           2                                         
   Covariance Type:            nonrobust                                         
   ===================================================================================
                         coef    std err          t      P&gt;|t|      [95.0% Conf. Int.]
   -----------------------------------------------------------------------------------
   Intercept         246.4341     35.233      6.995      0.000       176.358   316.510
   Literacy           -0.4889      0.128     -3.832      0.000        -0.743    -0.235
   np.log(Pop1831)   -31.3114      5.977     -5.239      0.000       -43.199   -19.424
   ==============================================================================
   Omnibus:                        3.713   Durbin-Watson:                   2.019
   Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
   Skew:                          -0.487   Prob(JB):                        0.183
   Kurtosis:                       3.003   Cond. No.                         702.
   ==============================================================================
   
   Warnings:
   [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Omni test:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;Chi^2&#39;</span><span class="p">,</span> <span class="s">&#39;Two-tail probability&#39;</span><span class="p">]</span>
   <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">omni_normtest</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">)</span>
   <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Influence-tests">Influence tests<a class="anchor-link" href="#Influence-tests">&#182;</a></h2><p>Once created, an object of class <code>OLSInfluence</code> holds attributes and methods that allow users to assess the influence of each observation. For example, we can compute and extract the first few rows of DFbetas by:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.stats.outliers_influence</span> <span class="k">import</span> <span class="n">OLSInfluence</span>
   <span class="n">test_class</span> <span class="o">=</span> <span class="n">OLSInfluence</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
   <span class="n">test_class</span><span class="o">.</span><span class="n">dfbetas</span><span class="p">[:</span><span class="mi">5</span><span class="p">,:]</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Explore other options by typing <code>dir(influence_test)</code></p>
   <p>Useful information on leverage can also be plotted:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.graphics.regressionplots</span> <span class="k">import</span> <span class="n">plot_leverage_resid2</span>
   <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">6</span><span class="p">))</span>
   <span class="n">fig</span> <span class="o">=</span> <span class="n">plot_leverage_resid2</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">ax</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Other plotting options can be found on the <a href="http://statsmodels.sourceforge.net/stable/graphics.html">Graphics page.</a></p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Multicollinearity">Multicollinearity<a class="anchor-link" href="#Multicollinearity">&#182;</a></h2><p>Condition number:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">cond</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Heteroskedasticity-tests">Heteroskedasticity tests<a class="anchor-link" href="#Heteroskedasticity-tests">&#182;</a></h2><p>Breush-Pagan test:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;Lagrange multiplier statistic&#39;</span><span class="p">,</span> <span class="s">&#39;p-value&#39;</span><span class="p">,</span> 
           <span class="s">&#39;f-value&#39;</span><span class="p">,</span> <span class="s">&#39;f p-value&#39;</span><span class="p">]</span>
   <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">het_breushpagan</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">,</span> <span class="n">results</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span>
   <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Goldfeld-Quandt test</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;F statistic&#39;</span><span class="p">,</span> <span class="s">&#39;p-value&#39;</span><span class="p">]</span>
   <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">het_goldfeldquandt</span><span class="p">(</span><span class="n">results</span><span class="o">.</span><span class="n">resid</span><span class="p">,</span> <span class="n">results</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span>
   <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Linearity">Linearity<a class="anchor-link" href="#Linearity">&#182;</a></h2><p>Harvey-Collier multiplier test for Null hypothesis that the linear specification is correct:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;t value&#39;</span><span class="p">,</span> <span class="s">&#39;p value&#39;</span><span class="p">]</span>
   <span class="n">test</span> <span class="o">=</span> <span class="n">sms</span><span class="o">.</span><span class="n">linear_harvey_collier</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
   <span class="n">lzip</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>

   <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script>
   <script type="text/javascript">
   init_mathjax = function() {
       if (window.MathJax) {
           // MathJax loaded
           MathJax.Hub.Config({
               tex2jax: {
               // I'm not sure about the \( and \[ below. It messes with the
               // prompt, and I think it's an issue with the template. -SS
                   inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
               },
               displayAlign: 'left', // Change this to 'center' to center equations.
               "HTML-CSS": {
                   styles: {'.MathJax_Display': {"margin": 0}}
               }
           });
           MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
       }
   }
   init_mathjax();

   // since we have to load this in a ..raw:: directive we will add the css
   // after the fact
   function loadcssfile(filename){
       var fileref=document.createElement("link")
       fileref.setAttribute("rel", "stylesheet")
       fileref.setAttribute("type", "text/css")
       fileref.setAttribute("href", filename)

       document.getElementsByTagName("head")[0].appendChild(fileref)
   }
   // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
   // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
   loadcssfile("../../../_static/nbviewer.pygments.css")
   loadcssfile("../../../_static/ipython.min.css")
   </script>