Weighted Least Squares ======================== .. _wls_notebook: `Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/wls.ipynb>`_ .. raw:: html <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="kn">from</span> <span class="nn">scipy</span> <span class="k">import</span> <span class="n">stats</span> <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span> <span class="kn">from</span> <span class="nn">statsmodels.sandbox.regression.predstd</span> <span class="k">import</span> <span class="n">wls_prediction_std</span> <span class="kn">from</span> <span class="nn">statsmodels.iolib.table</span> <span class="k">import</span> <span class="p">(</span><span class="n">SimpleTable</span><span class="p">,</span> <span class="n">default_txt_fmt</span><span class="p">)</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="WLS-Estimation">WLS Estimation<a class="anchor-link" href="#WLS-Estimation">¶</a></h2><h3 id="Artificial-data:-Heteroscedasticity-2-groups">Artificial data: Heteroscedasticity 2 groups<a class="anchor-link" href="#Artificial-data:-Heteroscedasticity-2-groups">¶</a></h3><p>Model assumptions:</p> <ul> <li>Misspecification: true model is quadratic, estimate only linear</li> <li>Independent noise/error term</li> <li>Two groups for error variance, low and high variance groups</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">nsample</span> <span class="o">=</span> <span class="mi">50</span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="n">nsample</span><span class="p">)</span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">column_stack</span><span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">5</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">))</span> <span class="n">X</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">X</span><span class="p">)</span> <span class="n">beta</span> <span class="o">=</span> <span class="p">[</span><span class="mf">5.</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.01</span><span class="p">]</span> <span class="n">sig</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="n">w</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">nsample</span><span class="p">)</span> <span class="n">w</span><span class="p">[</span><span class="n">nsample</span> <span class="o">*</span> <span class="mi">6</span><span class="o">/</span><span class="mi">10</span><span class="p">:]</span> <span class="o">=</span> <span class="mi">3</span> <span class="n">y_true</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">beta</span><span class="p">)</span> <span class="n">e</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">nsample</span><span class="p">)</span> <span class="n">y</span> <span class="o">=</span> <span class="n">y_true</span> <span class="o">+</span> <span class="n">sig</span> <span class="o">*</span> <span class="n">w</span> <span class="o">*</span> <span class="n">e</span> <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:,[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">]]</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="WLS-knowing-the-true-variance-ratio-of-heteroscedasticity">WLS knowing the true variance ratio of heteroscedasticity<a class="anchor-link" href="#WLS-knowing-the-true-variance-ratio-of-heteroscedasticity">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">mod_wls</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">WLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">weights</span><span class="o">=</span><span class="mf">1.</span><span class="o">/</span><span class="n">w</span><span class="p">)</span> <span class="n">res_wls</span> <span class="o">=</span> <span class="n">mod_wls</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">res_wls</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="OLS-vs.-WLS">OLS vs. WLS<a class="anchor-link" href="#OLS-vs.-WLS">¶</a></h2><p>Estimate an OLS model for comparison:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">res_ols</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">res_ols</span><span class="o">.</span><span class="n">params</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">res_wls</span><span class="o">.</span><span class="n">params</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> WLS Regression Results ============================================================================== Dep. Variable: y R-squared: 0.910 Model: WLS Adj. R-squared: 0.909 Method: Least Squares F-statistic: 487.9 Date: Mon, 20 Jul 2015 Prob (F-statistic): 8.52e-27 Time: 17:45:13 Log-Likelihood: -57.048 No. Observations: 50 AIC: 118.1 Df Residuals: 48 BIC: 121.9 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ const 5.2726 0.185 28.488 0.000 4.900 5.645 x1 0.4379 0.020 22.088 0.000 0.398 0.478 ============================================================================== Omnibus: 5.040 Durbin-Watson: 2.242 Prob(Omnibus): 0.080 Jarque-Bera (JB): 6.431 Skew: 0.024 Prob(JB): 0.0401 Kurtosis: 4.756 Cond. No. 17.0 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Compare the WLS standard errors to heteroscedasticity corrected OLS standard errors:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">se</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vstack</span><span class="p">([[</span><span class="n">res_wls</span><span class="o">.</span><span class="n">bse</span><span class="p">],</span> <span class="p">[</span><span class="n">res_ols</span><span class="o">.</span><span class="n">bse</span><span class="p">],</span> <span class="p">[</span><span class="n">res_ols</span><span class="o">.</span><span class="n">HC0_se</span><span class="p">],</span> <span class="p">[</span><span class="n">res_ols</span><span class="o">.</span><span class="n">HC1_se</span><span class="p">],</span> <span class="p">[</span><span class="n">res_ols</span><span class="o">.</span><span class="n">HC2_se</span><span class="p">],</span> <span class="p">[</span><span class="n">res_ols</span><span class="o">.</span><span class="n">HC3_se</span><span class="p">]])</span> <span class="n">se</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">se</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span> <span class="n">colnames</span> <span class="o">=</span> <span class="p">[</span><span class="s">'x1'</span><span class="p">,</span> <span class="s">'const'</span><span class="p">]</span> <span class="n">rownames</span> <span class="o">=</span> <span class="p">[</span><span class="s">'WLS'</span><span class="p">,</span> <span class="s">'OLS'</span><span class="p">,</span> <span class="s">'OLS_HC0'</span><span class="p">,</span> <span class="s">'OLS_HC1'</span><span class="p">,</span> <span class="s">'OLS_HC3'</span><span class="p">,</span> <span class="s">'OLS_HC3'</span><span class="p">]</span> <span class="n">tabl</span> <span class="o">=</span> <span class="n">SimpleTable</span><span class="p">(</span><span class="n">se</span><span class="p">,</span> <span class="n">colnames</span><span class="p">,</span> <span class="n">rownames</span><span class="p">,</span> <span class="n">txt_fmt</span><span class="o">=</span><span class="n">default_txt_fmt</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">tabl</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>[ 5.24256099 0.43486879] [ 5.27260714 0.43794441] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Calculate OLS prediction interval:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">covb</span> <span class="o">=</span> <span class="n">res_ols</span><span class="o">.</span><span class="n">cov_params</span><span class="p">()</span> <span class="n">prediction_var</span> <span class="o">=</span> <span class="n">res_ols</span><span class="o">.</span><span class="n">mse_resid</span> <span class="o">+</span> <span class="p">(</span><span class="n">X</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">covb</span><span class="p">,</span><span class="n">X</span><span class="o">.</span><span class="n">T</span><span class="p">)</span><span class="o">.</span><span class="n">T</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="n">prediction_std</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">prediction_var</span><span class="p">)</span> <span class="n">tppf</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">t</span><span class="o">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.975</span><span class="p">,</span> <span class="n">res_ols</span><span class="o">.</span><span class="n">df_resid</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>===================== x1 const --------------------- WLS 0.1851 0.0198 OLS 0.2707 0.0233 OLS_HC0 0.194 0.0281 OLS_HC1 0.198 0.0287 OLS_HC3 0.2003 0.029 OLS_HC3 0.207 0.03 --------------------- </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">prstd_ols</span><span class="p">,</span> <span class="n">iv_l_ols</span><span class="p">,</span> <span class="n">iv_u_ols</span> <span class="o">=</span> <span class="n">wls_prediction_std</span><span class="p">(</span><span class="n">res_ols</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Draw a plot to compare predicted values in WLS and OLS:</p> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">prstd</span><span class="p">,</span> <span class="n">iv_l</span><span class="p">,</span> <span class="n">iv_u</span> <span class="o">=</span> <span class="n">wls_prediction_std</span><span class="p">(</span><span class="n">res_wls</span><span class="p">)</span> <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">6</span><span class="p">))</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="s">'o'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Data"</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y_true</span><span class="p">,</span> <span class="s">'b-'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"True"</span><span class="p">)</span> <span class="c"># OLS</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">res_ols</span><span class="o">.</span><span class="n">fittedvalues</span><span class="p">,</span> <span class="s">'r--'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">iv_u_ols</span><span class="p">,</span> <span class="s">'r--'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"OLS"</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">iv_l_ols</span><span class="p">,</span> <span class="s">'r--'</span><span class="p">)</span> <span class="c"># WLS</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">res_wls</span><span class="o">.</span><span class="n">fittedvalues</span><span class="p">,</span> <span class="s">'g--.'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">iv_u</span><span class="p">,</span> <span class="s">'g--'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"WLS"</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">iv_l</span><span class="p">,</span> <span class="s">'g--'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s">"best"</span><span class="p">);</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h2 id="Feasible-Weighted-Least-Squares-(2-stage-FWLS)">Feasible Weighted Least Squares (2-stage FWLS)<a class="anchor-link" href="#Feasible-Weighted-Least-Squares-(2-stage-FWLS)">¶</a></h2> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">resid1</span> <span class="o">=</span> <span class="n">res_ols</span><span class="o">.</span><span class="n">resid</span><span class="p">[</span><span class="n">w</span><span class="o">==</span><span class="mf">1.</span><span class="p">]</span> <span class="n">var1</span> <span class="o">=</span> <span class="n">resid1</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="n">ddof</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">res_ols</span><span class="o">.</span><span class="n">df_model</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="n">resid2</span> <span class="o">=</span> <span class="n">res_ols</span><span class="o">.</span><span class="n">resid</span><span class="p">[</span><span class="n">w</span><span class="o">!=</span><span class="mf">1.</span><span class="p">]</span> <span class="n">var2</span> <span class="o">=</span> <span class="n">resid2</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="n">ddof</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">res_ols</span><span class="o">.</span><span class="n">df_model</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="n">w_est</span> <span class="o">=</span> <span class="n">w</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">w_est</span><span class="p">[</span><span class="n">w</span><span class="o">!=</span><span class="mf">1.</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var2</span><span class="p">)</span> <span class="o">/</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var1</span><span class="p">)</span> <span class="n">res_fwls</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">WLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="mf">1.</span><span class="o">/</span><span class="n">w_est</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">res_fwls</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script> <script type="text/javascript"> init_mathjax = function() { if (window.MathJax) { // MathJax loaded MathJax.Hub.Config({ tex2jax: { // I'm not sure about the \( and \[ below. It messes with the // prompt, and I think it's an issue with the template. -SS inlineMath: [ ['$','$'], ["\\(","\\)"] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] }, displayAlign: 'left', // Change this to 'center' to center equations. "HTML-CSS": { styles: {'.MathJax_Display': {"margin": 0}} } }); MathJax.Hub.Queue(["Typeset",MathJax.Hub]); } } init_mathjax(); // since we have to load this in a ..raw:: directive we will add the css // after the fact function loadcssfile(filename){ var fileref=document.createElement("link") fileref.setAttribute("rel", "stylesheet") fileref.setAttribute("type", "text/css") fileref.setAttribute("href", filename) document.getElementsByTagName("head")[0].appendChild(fileref) } // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }}) // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }}) loadcssfile("../../../_static/nbviewer.pygments.css") loadcssfile("../../../_static/ipython.min.css") </script>