Generalized Least Squares =========================== .. _gls_notebook: `Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/gls.ipynb>`_ .. raw:: html <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span> <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="kn">from</span> <span class="nn">statsmodels.iolib.table</span> <span class="k">import</span> <span class="p">(</span><span class="n">SimpleTable</span><span class="p">,</span> <span class="n">default_txt_fmt</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>The Longley dataset is a time series dataset:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">data</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">longley</span><span class="o">.</span><span class="n">load</span><span class="p">()</span> <span class="n">data</span><span class="o">.</span><span class="n">exog</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Let's assume that the data is heteroskedastic and that we know the nature of the heteroskedasticity. We can then define <code>sigma</code> and use it to give us a GLS model</p> <p>First we will obtain the residuals from an OLS fit</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">ols_resid</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span><span class="o">.</span><span class="n">resid</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>[[ 1.00000000e+00 8.30000000e+01 2.34289000e+05 2.35600000e+03 1.59000000e+03 1.07608000e+05 1.94700000e+03] [ 1.00000000e+00 8.85000000e+01 2.59426000e+05 2.32500000e+03 1.45600000e+03 1.08632000e+05 1.94800000e+03] [ 1.00000000e+00 8.82000000e+01 2.58054000e+05 3.68200000e+03 1.61600000e+03 1.09773000e+05 1.94900000e+03] [ 1.00000000e+00 8.95000000e+01 2.84599000e+05 3.35100000e+03 1.65000000e+03 1.10929000e+05 1.95000000e+03] [ 1.00000000e+00 9.62000000e+01 3.28975000e+05 2.09900000e+03 3.09900000e+03 1.12075000e+05 1.95100000e+03]] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Assume that the error terms follow an AR(1) process with a trend:</p> <p>$\epsilon_i = \beta_0 + \rho\epsilon_{i-1} + \eta_i$</p> <p>where $\eta \sim N(0,\Sigma^2)$</p> <p>and that $\rho$ is simply the correlation of the residual a consistent estimator for rho is to regress the residuals on the lagged residuals</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">resid_fit</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">ols_resid</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">ols_resid</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">resid_fit</span><span class="o">.</span><span class="n">tvalues</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="nb">print</span><span class="p">(</span><span class="n">resid_fit</span><span class="o">.</span><span class="n">pvalues</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>While we don't have strong evidence that the errors follow an AR(1) process we continue</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">rho</span> <span class="o">=</span> <span class="n">resid_fit</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>-1.43902298397 0.173784447888 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>As we know, an AR(1) process means that near-neighbors have a stronger relation so we can give this structure by using a toeplitz matrix</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">scipy.linalg</span> <span class="k">import</span> <span class="n">toeplitz</span> <span class="n">toeplitz</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">order</span> <span class="o">=</span> <span class="n">toeplitz</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">ols_resid</span><span class="p">)))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>so that our error covariance structure is actually rho**order which defines an autocorrelation structure</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sigma</span> <span class="o">=</span> <span class="n">rho</span><span class="o">**</span><span class="n">order</span> <span class="n">gls_model</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">GLS</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sigma</span><span class="p">)</span> <span class="n">gls_results</span> <span class="o">=</span> <span class="n">gls_model</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Of course, the exact rho in this instance is not known so it it might make more sense to use feasible gls, which currently only has experimental support.</p> <p>We can use the GLSAR model with one lag, to get to a similar result:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">glsar_model</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">GLSAR</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">endog</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">exog</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="n">glsar_results</span> <span class="o">=</span> <span class="n">glsar_model</span><span class="o">.</span><span class="n">iterative_fit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">glsar_results</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Comparing gls and glsar results, we see that there are some small differences in the parameter estimates and the resulting standard errors of the parameter estimate. This might be do to the numerical differences in the algorithm, e.g. the treatment of initial conditions, because of the small number of observations in the longley dataset.</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">gls_results</span><span class="o">.</span><span class="n">params</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">glsar_results</span><span class="o">.</span><span class="n">params</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">gls_results</span><span class="o">.</span><span class="n">bse</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">glsar_results</span><span class="o">.</span><span class="n">bse</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> GLSAR Regression Results ============================================================================== Dep. Variable: y R-squared: 0.996 Model: GLSAR Adj. R-squared: 0.992 Method: Least Squares F-statistic: 295.2 Date: Mon, 20 Jul 2015 Prob (F-statistic): 6.09e-09 Time: 17:43:38 Log-Likelihood: -102.04 No. Observations: 15 AIC: 218.1 Df Residuals: 8 BIC: 223.0 Df Model: 6 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ const -3.468e+06 8.72e+05 -3.979 0.004 -5.48e+06 -1.46e+06 x1 34.5568 84.734 0.408 0.694 -160.840 229.953 x2 -0.0343 0.033 -1.047 0.326 -0.110 0.041 x3 -1.9621 0.481 -4.083 0.004 -3.070 -0.854 x4 -1.0020 0.211 -4.740 0.001 -1.489 -0.515 x5 -0.0978 0.225 -0.435 0.675 -0.616 0.421 x6 1823.1829 445.829 4.089 0.003 795.100 2851.266 ============================================================================== Omnibus: 1.960 Durbin-Watson: 2.554 Prob(Omnibus): 0.375 Jarque-Bera (JB): 1.423 Skew: 0.713 Prob(JB): 0.491 Kurtosis: 2.508 Cond. No. 4.80e+09 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. [2] The condition number is large, 4.8e+09. This might indicate that there are strong multicollinearity or other numerical problems. </pre> </div> </div> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stderr output_text"> <pre>/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/scipy/stats/stats.py:1233: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=15 int(n)) </pre> </div> </div> </div> </div> </div> <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script> <script type="text/javascript"> init_mathjax = function() { if (window.MathJax) { // MathJax loaded MathJax.Hub.Config({ tex2jax: { // I'm not sure about the \( and \[ below. It messes with the // prompt, and I think it's an issue with the template. -SS inlineMath: [ ['$','$'], ["\\(","\\)"] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] }, displayAlign: 'left', // Change this to 'center' to center equations. "HTML-CSS": { styles: {'.MathJax_Display': {"margin": 0}} } }); MathJax.Hub.Queue(["Typeset",MathJax.Hub]); } } init_mathjax(); // since we have to load this in a ..raw:: directive we will add the css // after the fact function loadcssfile(filename){ var fileref=document.createElement("link") fileref.setAttribute("rel", "stylesheet") fileref.setAttribute("type", "text/css") fileref.setAttribute("href", filename) document.getElementsByTagName("head")[0].appendChild(fileref) } // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }}) // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }}) loadcssfile("../../../_static/nbviewer.pygments.css") loadcssfile("../../../_static/ipython.min.css") </script>