M-Estimators for Robust Linear Modeling ========================================= .. _robust_models_1_notebook: `Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/robust_models_1.ipynb>`_ .. raw:: html <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span> <span class="kn">from</span> <span class="nn">statsmodels.compat</span> <span class="k">import</span> <span class="n">lmap</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="kn">from</span> <span class="nn">scipy</span> <span class="k">import</span> <span class="n">stats</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span> <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>An M-estimator minimizes the function </li> </ul> $$Q(e_i, \rho) = \sum_i~\rho \left (\frac{e_i}{s}\right )$$<p>where $\rho$ is a symmetric function of the residuals</p> <ul> <li>The effect of $\rho$ is to reduce the influence of outliers</li> <li>$s$ is an estimate of scale. </li> <li>The robust estimates $\hat{\beta}$ are computed by the iteratively re-weighted least squares algorithm</li> </ul> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>We have several choices available for the weighting functions to be used</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">norms</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">norms</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="k">def</span> <span class="nf">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">weights_func</span><span class="p">,</span> <span class="n">xlabels</span><span class="p">,</span> <span class="n">xticks</span><span class="p">):</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">weights_func</span><span class="p">(</span><span class="n">support</span><span class="p">))</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_xticks</span><span class="p">(</span><span class="n">xticks</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_xticklabels</span><span class="p">(</span><span class="n">xlabels</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="o">-.</span><span class="mi">1</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">)</span> <span class="k">return</span> <span class="n">ax</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Andrew's-Wave">Andrew's Wave<a class="anchor-link" href="#Andrew's-Wave">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">AndrewWave</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">a</span> <span class="o">=</span> <span class="mf">1.339</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span> <span class="n">andrew</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">AndrewWave</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="n">a</span><span class="p">)</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">andrew</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'$-\pi*a$'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'$\pi*a$'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="o">*</span><span class="n">a</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) Andrew's wave weighting function for the IRLS algorithm The psi function scaled by z Parameters ---------- z : array-like 1d array Returns ------- weights : array weights(z) = sin(z/a)/(z/a) for \|z\| <= a*pi weights(z) = 0 for \|z\| > a*pi </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Hampel's-17A">Hampel's 17A<a class="anchor-link" href="#Hampel's-17A">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">Hampel</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">c</span> <span class="o">=</span> <span class="mi">8</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">hampel</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">Hampel</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mf">2.</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mf">4.</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">c</span><span class="p">)</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">hampel</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'3*c'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'3*c'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) Hampel weighting function for the IRLS algorithm The psi function scaled by z Parameters ---------- z : array-like 1d array Returns ------- weights : array weights(z) = 1 for \|z\| <= a weights(z) = a/\|z\| for a < \|z\| <= b weights(z) = a*(c - \|z\|)/(\|z\|*(c-b)) for b < \|z\| <= c weights(z) = 0 for \|z\| > c </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Huber's-t">Huber's t<a class="anchor-link" href="#Huber's-t">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">HuberT</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">t</span> <span class="o">=</span> <span class="mf">1.345</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">huber</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">HuberT</span><span class="p">(</span><span class="n">t</span><span class="o">=</span><span class="n">t</span><span class="p">)</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">huber</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'-3*t'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'3*t'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">t</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) Huber's t weighting function for the IRLS algorithm The psi function scaled by z Parameters ---------- z : array-like 1d array Returns ------- weights : array weights(z) = 1 for \|z\| <= t weights(z) = t/\|z\| for \|z\| > t </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Least-Squares">Least Squares<a class="anchor-link" href="#Least-Squares">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">LeastSquares</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">lst_sq</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">LeastSquares</span><span class="p">()</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">lst_sq</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'-3'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'3'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) The least squares estimator weighting function for the IRLS algorithm. The psi function scaled by the input z Parameters ---------- z : array-like 1d array Returns ------- weights : array weights(z) = np.ones(z.shape) </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Ramsay's-Ea">Ramsay's Ea<a class="anchor-link" href="#Ramsay's-Ea">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">RamsayE</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">a</span> <span class="o">=</span> <span class="o">.</span><span class="mi">3</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">ramsay</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">RamsayE</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="n">a</span><span class="p">)</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">ramsay</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'-3*a'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'3*a'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">a</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) Ramsay's Ea weighting function for the IRLS algorithm The psi function scaled by z Parameters ---------- z : array-like 1d array Returns ------- weights : array weights(z) = exp(-a*\|z\|) </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Trimmed-Mean">Trimmed Mean<a class="anchor-link" href="#Trimmed-Mean">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">TrimmedMean</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">c</span> <span class="o">=</span> <span class="mi">2</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">trimmed</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">TrimmedMean</span><span class="p">(</span><span class="n">c</span><span class="o">=</span><span class="n">c</span><span class="p">)</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">trimmed</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'-3*c'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'3*c'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) Least trimmed mean weighting function for the IRLS algorithm The psi function scaled by z Parameters ---------- z : array-like 1d array Returns ------- weights : array weights(z) = 1 for \|z\| <= c weights(z) = 0 for \|z\| > c </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Tukey's-Biweight">Tukey's Biweight<a class="anchor-link" href="#Tukey's-Biweight">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">help</span><span class="p">(</span><span class="n">norms</span><span class="o">.</span><span class="n">TukeyBiweight</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">c</span> <span class="o">=</span> <span class="mf">4.685</span> <span class="n">support</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span> <span class="n">tukey</span> <span class="o">=</span> <span class="n">norms</span><span class="o">.</span><span class="n">TukeyBiweight</span><span class="p">(</span><span class="n">c</span><span class="o">=</span><span class="n">c</span><span class="p">)</span> <span class="n">plot_weights</span><span class="p">(</span><span class="n">support</span><span class="p">,</span> <span class="n">tukey</span><span class="o">.</span><span class="n">weights</span><span class="p">,</span> <span class="p">[</span><span class="s">'-3*c'</span><span class="p">,</span> <span class="s">'0'</span><span class="p">,</span> <span class="s">'3*c'</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">c</span><span class="p">]);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Help on function weights in module statsmodels.robust.norms: weights(self, z) Tukey's biweight weighting function for the IRLS algorithm The psi function scaled by z Parameters ---------- z : array-like 1d array Returns ------- weights : array psi(z) = (1 - (z/c)**2)**2 for \|z\| <= R psi(z) = 0 for \|z\| > R </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Scale-Estimators">Scale Estimators<a class="anchor-link" href="#Scale-Estimators">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>Robust estimates of the location</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">500</span><span class="p">])</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>The mean is not a robust estimator of location</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">x</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>The median, on the other hand, is a robust estimator with a breakdown point of 50%</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">median</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>Analagously for the scale</li> <li>The standard deviation is not robust</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">x</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Median Absolute Deviation</p> $$ median_i |X_i - median_j(X_j)|) $$ </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Standardized Median Absolute Deviation is a consistent estimator for $\hat{\sigma}$</p> $$\hat{\sigma}=K \cdot MAD$$<p>where $K$ depends on the distribution. For the normal distribution for example,</p> $$K = \Phi^{-1}(.75)$$ </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">stats</span><span class="o">.</span><span class="n">norm</span><span class="o">.</span><span class="n">ppf</span><span class="p">(</span><span class="o">.</span><span class="mi">75</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">scale</span><span class="o">.</span><span class="n">stand_mad</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>[ 1 2 3 4 500] </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mf">5.</span><span class="p">])</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stderr output_text"> <pre>/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/statsmodels-0.6.1-py3.4-macosx-10.10-x86_64.egg/statsmodels/robust/scale.py:49: FutureWarning: stand_mad is deprecated and will be removed in 0.7.0. Use mad instead. "instead.", FutureWarning) </pre> </div> </div> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>The default for Robust Linear Models is MAD</li> <li>another popular choice is Huber's proposal 2</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span> <span class="n">fat_tails</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">t</span><span class="p">(</span><span class="mi">6</span><span class="p">)</span><span class="o">.</span><span class="n">rvs</span><span class="p">(</span><span class="mi">40</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">kde</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">nonparametric</span><span class="o">.</span><span class="n">KDE</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">)</span> <span class="n">kde</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">kde</span><span class="o">.</span><span class="n">support</span><span class="p">,</span> <span class="n">kde</span><span class="o">.</span><span class="n">density</span><span class="p">);</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">fat_tails</span><span class="o">.</span><span class="n">mean</span><span class="p">(),</span> <span class="n">fat_tails</span><span class="o">.</span><span class="n">std</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">stats</span><span class="o">.</span><span class="n">norm</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>(0.068823104481087499, 1.3471633229698652) </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">stats</span><span class="o">.</span><span class="n">t</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">,</span> <span class="n">f0</span><span class="o">=</span><span class="mi">6</span><span class="p">))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>(6, 0.039009187170278181, 1.0564230978488927) </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">huber</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">scale</span><span class="o">.</span><span class="n">Huber</span><span class="p">()</span> <span class="n">loc</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="n">huber</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">loc</span><span class="p">,</span> <span class="n">scale</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>0.04048984333271795 1.1557140047569665 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">stand_mad</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stderr output_text"> <pre>/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/statsmodels-0.6.1-py3.4-macosx-10.10-x86_64.egg/statsmodels/robust/scale.py:49: FutureWarning: stand_mad is deprecated and will be removed in 0.7.0. Use mad instead. "instead.", FutureWarning) </pre> </div> </div> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">stand_mad</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">stats</span><span class="o">.</span><span class="n">t</span><span class="p">(</span><span class="mi">6</span><span class="p">)</span><span class="o">.</span><span class="n">ppf</span><span class="p">(</span><span class="o">.</span><span class="mi">75</span><span class="p">))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stderr output_text"> <pre>/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/statsmodels-0.6.1-py3.4-macosx-10.10-x86_64.egg/statsmodels/robust/scale.py:49: FutureWarning: stand_mad is deprecated and will be removed in 0.7.0. Use mad instead. "instead.", FutureWarning) </pre> </div> </div> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">scale</span><span class="o">.</span><span class="n">mad</span><span class="p">(</span><span class="n">fat_tails</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Duncan's-Occupational-Prestige-data---M-estimation-for-outliers">Duncan's Occupational Prestige data - M-estimation for outliers<a class="anchor-link" href="#Duncan's-Occupational-Prestige-data---M-estimation-for-outliers">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.graphics.api</span> <span class="k">import</span> <span class="n">abline_plot</span> <span class="kn">from</span> <span class="nn">statsmodels.formula.api</span> <span class="k">import</span> <span class="n">ols</span><span class="p">,</span> <span class="n">rlm</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">prestige</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">get_rdataset</span><span class="p">(</span><span class="s">"Duncan"</span><span class="p">,</span> <span class="s">"car"</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="k">True</span><span class="p">)</span><span class="o">.</span><span class="n">data</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">prestige</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> type income education prestige accountant prof 62 86 82 pilot prof 72 76 83 architect prof 75 92 90 author prof 55 90 76 chemist prof 64 86 90 minister prof 21 84 87 professor prof 64 93 93 dentist prof 80 100 90 reporter wc 67 87 52 engineer prof 72 86 88 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">12</span><span class="p">))</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">211</span><span class="p">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">'Income'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Prestige'</span><span class="p">)</span> <span class="n">ax1</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">prestige</span><span class="o">.</span><span class="n">income</span><span class="p">,</span> <span class="n">prestige</span><span class="o">.</span><span class="n">prestige</span><span class="p">)</span> <span class="n">xy_outlier</span> <span class="o">=</span> <span class="n">prestige</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="s">'minister'</span><span class="p">][[</span><span class="s">'income'</span><span class="p">,</span><span class="s">'prestige'</span><span class="p">]]</span> <span class="n">ax1</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="s">'Minister'</span><span class="p">,</span> <span class="n">xy_outlier</span><span class="p">,</span> <span class="n">xy_outlier</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span> <span class="n">ax2</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">212</span><span class="p">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">'Education'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Prestige'</span><span class="p">)</span> <span class="n">ax2</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">prestige</span><span class="o">.</span><span class="n">education</span><span class="p">,</span> <span class="n">prestige</span><span class="o">.</span><span class="n">prestige</span><span class="p">);</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">ols_model</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="s">'prestige ~ income + education'</span><span class="p">,</span> <span class="n">prestige</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">ols_model</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> OLS Regression Results ============================================================================== Dep. Variable: prestige R-squared: 0.828 Model: OLS Adj. R-squared: 0.820 Method: Least Squares F-statistic: 101.2 Date: Mon, 20 Jul 2015 Prob (F-statistic): 8.65e-17 Time: 17:44:43 Log-Likelihood: -178.98 No. Observations: 45 AIC: 364.0 Df Residuals: 42 BIC: 369.4 Df Model: 2 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ Intercept -6.0647 4.272 -1.420 0.163 -14.686 2.556 income 0.5987 0.120 5.003 0.000 0.357 0.840 education 0.5458 0.098 5.555 0.000 0.348 0.744 ============================================================================== Omnibus: 1.279 Durbin-Watson: 1.458 Prob(Omnibus): 0.528 Jarque-Bera (JB): 0.520 Skew: 0.155 Prob(JB): 0.771 Kurtosis: 3.426 Cond. No. 163. ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">infl</span> <span class="o">=</span> <span class="n">ols_model</span><span class="o">.</span><span class="n">get_influence</span><span class="p">()</span> <span class="n">student</span> <span class="o">=</span> <span class="n">infl</span><span class="o">.</span><span class="n">summary_frame</span><span class="p">()[</span><span class="s">'student_resid'</span><span class="p">]</span> <span class="nb">print</span><span class="p">(</span><span class="n">student</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>accountant 0.303900 pilot 0.340920 architect 0.072256 author 0.000711 chemist 0.826578 minister 3.134519 professor 0.768277 dentist -0.498082 reporter -2.397022 engineer 0.306225 undertaker -0.187339 lawyer -0.303082 physician 0.355687 welfare.worker -0.411406 teacher 0.050510 conductor -1.704032 contractor 2.043805 factory.owner 1.602429 store.manager 0.142425 banker 0.508388 bookkeeper -0.902388 mail.carrier -1.433249 insurance.agent -1.930919 store.clerk -1.760491 carpenter 1.068858 electrician 0.731949 RR.engineer 0.808922 machinist 1.887047 auto.repairman 0.522735 plumber -0.377954 gas.stn.attendant -0.666596 coal.miner 1.018527 streetcar.motorman -1.104485 taxi.driver 0.023322 truck.driver -0.129227 machine.operator 0.499922 barber 0.173805 bartender -0.902422 shoe.shiner -0.429357 cook 0.127207 soda.clerk -0.883095 watchman -0.513502 janitor -0.079890 policeman 0.078847 waiter -0.475972 Name: student_resid, dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">student</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">student</span><span class="p">)</span> <span class="o">></span> <span class="mi">2</span><span class="p">])</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>minister 3.134519 reporter -2.397022 contractor 2.043805 Name: student_resid, dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">infl</span><span class="o">.</span><span class="n">summary_frame</span><span class="p">()</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="s">'minister'</span><span class="p">])</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>dfb_Intercept 0.144937 dfb_income -1.220939 dfb_education 1.263019 cooks_d 0.566380 dffits 1.433935 dffits_internal 1.303510 hat_diag 0.173058 standard_resid 2.849416 student_resid 3.134519 Name: minister, dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sidak</span> <span class="o">=</span> <span class="n">ols_model</span><span class="o">.</span><span class="n">outlier_test</span><span class="p">(</span><span class="s">'sidak'</span><span class="p">)</span> <span class="n">sidak</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="s">'unadj_p'</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="k">True</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">sidak</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> student_resid unadj_p sidak(p) minister 3.134519 0.003177 0.133421 reporter -2.397022 0.021170 0.618213 contractor 2.043805 0.047433 0.887721 insurance.agent -1.930919 0.060428 0.939485 machinist 1.887047 0.066248 0.954247 store.clerk -1.760491 0.085783 0.982331 conductor -1.704032 0.095944 0.989315 factory.owner 1.602429 0.116738 0.996250 mail.carrier -1.433249 0.159369 0.999595 streetcar.motorman -1.104485 0.275823 1.000000 carpenter 1.068858 0.291386 1.000000 coal.miner 1.018527 0.314400 1.000000 bartender -0.902422 0.372104 1.000000 bookkeeper -0.902388 0.372122 1.000000 soda.clerk -0.883095 0.382334 1.000000 chemist 0.826578 0.413261 1.000000 RR.engineer 0.808922 0.423229 1.000000 professor 0.768277 0.446725 1.000000 electrician 0.731949 0.468363 1.000000 gas.stn.attendant -0.666596 0.508764 1.000000 auto.repairman 0.522735 0.603972 1.000000 watchman -0.513502 0.610357 1.000000 banker 0.508388 0.613906 1.000000 machine.operator 0.499922 0.619802 1.000000 dentist -0.498082 0.621088 1.000000 waiter -0.475972 0.636621 1.000000 shoe.shiner -0.429357 0.669912 1.000000 welfare.worker -0.411406 0.682918 1.000000 plumber -0.377954 0.707414 1.000000 physician 0.355687 0.723898 1.000000 pilot 0.340920 0.734905 1.000000 engineer 0.306225 0.760983 1.000000 accountant 0.303900 0.762741 1.000000 lawyer -0.303082 0.763360 1.000000 undertaker -0.187339 0.852319 1.000000 barber 0.173805 0.862874 1.000000 store.manager 0.142425 0.887442 1.000000 truck.driver -0.129227 0.897810 1.000000 cook 0.127207 0.899399 1.000000 janitor -0.079890 0.936713 1.000000 policeman 0.078847 0.937538 1.000000 architect 0.072256 0.942750 1.000000 teacher 0.050510 0.959961 1.000000 taxi.driver 0.023322 0.981507 1.000000 author 0.000711 0.999436 1.000000 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fdr</span> <span class="o">=</span> <span class="n">ols_model</span><span class="o">.</span><span class="n">outlier_test</span><span class="p">(</span><span class="s">'fdr_bh'</span><span class="p">)</span> <span class="n">fdr</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="s">'unadj_p'</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="k">True</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">fdr</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> student_resid unadj_p fdr_bh(p) minister 3.134519 0.003177 0.142974 reporter -2.397022 0.021170 0.476332 contractor 2.043805 0.047433 0.596233 insurance.agent -1.930919 0.060428 0.596233 machinist 1.887047 0.066248 0.596233 store.clerk -1.760491 0.085783 0.616782 conductor -1.704032 0.095944 0.616782 factory.owner 1.602429 0.116738 0.656653 mail.carrier -1.433249 0.159369 0.796844 streetcar.motorman -1.104485 0.275823 0.999436 carpenter 1.068858 0.291386 0.999436 coal.miner 1.018527 0.314400 0.999436 bartender -0.902422 0.372104 0.999436 bookkeeper -0.902388 0.372122 0.999436 soda.clerk -0.883095 0.382334 0.999436 chemist 0.826578 0.413261 0.999436 RR.engineer 0.808922 0.423229 0.999436 professor 0.768277 0.446725 0.999436 electrician 0.731949 0.468363 0.999436 gas.stn.attendant -0.666596 0.508764 0.999436 auto.repairman 0.522735 0.603972 0.999436 watchman -0.513502 0.610357 0.999436 banker 0.508388 0.613906 0.999436 machine.operator 0.499922 0.619802 0.999436 dentist -0.498082 0.621088 0.999436 waiter -0.475972 0.636621 0.999436 shoe.shiner -0.429357 0.669912 0.999436 welfare.worker -0.411406 0.682918 0.999436 plumber -0.377954 0.707414 0.999436 physician 0.355687 0.723898 0.999436 pilot 0.340920 0.734905 0.999436 engineer 0.306225 0.760983 0.999436 accountant 0.303900 0.762741 0.999436 lawyer -0.303082 0.763360 0.999436 undertaker -0.187339 0.852319 0.999436 barber 0.173805 0.862874 0.999436 store.manager 0.142425 0.887442 0.999436 truck.driver -0.129227 0.897810 0.999436 cook 0.127207 0.899399 0.999436 janitor -0.079890 0.936713 0.999436 policeman 0.078847 0.937538 0.999436 architect 0.072256 0.942750 0.999436 teacher 0.050510 0.959961 0.999436 taxi.driver 0.023322 0.981507 0.999436 author 0.000711 0.999436 0.999436 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">rlm_model</span> <span class="o">=</span> <span class="n">rlm</span><span class="p">(</span><span class="s">'prestige ~ income + education'</span><span class="p">,</span> <span class="n">prestige</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="nb">print</span><span class="p">(</span><span class="n">rlm_model</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> Robust linear Model Regression Results ============================================================================== Dep. Variable: prestige No. Observations: 45 Model: RLM Df Residuals: 42 Method: IRLS Df Model: 2 Norm: HuberT Scale Est.: mad Cov Type: H1 Date: Mon, 20 Jul 2015 Time: 17:44:43 No. Iterations: 18 ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ Intercept -7.1107 3.879 -1.833 0.067 -14.713 0.492 income 0.7015 0.109 6.456 0.000 0.489 0.914 education 0.4854 0.089 5.441 0.000 0.311 0.660 ============================================================================== If the model instance has been used for another fit with different fit parameters, then the fit options might not be the correct ones anymore . </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">rlm_model</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>accountant 1.000000 pilot 1.000000 architect 1.000000 author 1.000000 chemist 1.000000 minister 0.344596 professor 1.000000 dentist 1.000000 reporter 0.441669 engineer 1.000000 undertaker 1.000000 lawyer 1.000000 physician 1.000000 welfare.worker 1.000000 teacher 1.000000 conductor 0.538445 contractor 0.552262 factory.owner 0.706169 store.manager 1.000000 banker 1.000000 bookkeeper 1.000000 mail.carrier 0.690764 insurance.agent 0.533499 store.clerk 0.618656 carpenter 0.935848 electrician 1.000000 RR.engineer 1.000000 machinist 0.570360 auto.repairman 1.000000 plumber 1.000000 gas.stn.attendant 1.000000 coal.miner 0.963821 streetcar.motorman 0.832870 taxi.driver 1.000000 truck.driver 1.000000 machine.operator 1.000000 barber 1.000000 bartender 1.000000 shoe.shiner 1.000000 cook 1.000000 soda.clerk 1.000000 watchman 1.000000 janitor 1.000000 policeman 1.000000 waiter 1.000000 dtype: float64 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Hertzprung-Russell-data-for-Star-Cluster-CYG-0B1---Leverage-Points">Hertzprung Russell data for Star Cluster CYG 0B1 - Leverage Points<a class="anchor-link" href="#Hertzprung-Russell-data-for-Star-Cluster-CYG-0B1---Leverage-Points">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>Data is on the luminosity and temperature of 47 stars in the direction of Cygnus.</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">dta</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">get_rdataset</span><span class="p">(</span><span class="s">"starsCYG"</span><span class="p">,</span> <span class="s">"robustbase"</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="k">True</span><span class="p">)</span><span class="o">.</span><span class="n">data</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">matplotlib.patches</span> <span class="k">import</span> <span class="n">Ellipse</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">'log(Temp)'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'log(Light)'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Hertzsprung-Russell Diagram of Star Cluster CYG OB1'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="o">*</span><span class="n">dta</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">T</span><span class="p">)</span> <span class="c"># highlight outliers</span> <span class="n">e</span> <span class="o">=</span> <span class="n">Ellipse</span><span class="p">((</span><span class="mf">3.5</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="o">.</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=.</span><span class="mi">25</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'r'</span><span class="p">)</span> <span class="n">ax</span><span class="o">.</span><span class="n">add_patch</span><span class="p">(</span><span class="n">e</span><span class="p">);</span> <span class="n">ax</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="s">'Red giants'</span><span class="p">,</span> <span class="n">xy</span><span class="o">=</span><span class="p">(</span><span class="mf">3.6</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="n">xytext</span><span class="o">=</span><span class="p">(</span><span class="mf">3.8</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="n">arrowprops</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">facecolor</span><span class="o">=</span><span class="s">'black'</span><span class="p">,</span> <span class="n">shrink</span><span class="o">=</span><span class="mf">0.05</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">2</span><span class="p">),</span> <span class="n">horizontalalignment</span><span class="o">=</span><span class="s">'left'</span><span class="p">,</span> <span class="n">verticalalignment</span><span class="o">=</span><span class="s">'bottom'</span><span class="p">,</span> <span class="n">clip_on</span><span class="o">=</span><span class="k">True</span><span class="p">,</span> <span class="c"># clip to the axes bounding box</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="p">)</span> <span class="c"># annotate these with their index</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">row</span> <span class="ow">in</span> <span class="n">dta</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">dta</span><span class="p">[</span><span class="s">'log.Te'</span><span class="p">]</span> <span class="o"><</span> <span class="mf">3.8</span><span class="p">]</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span> <span class="n">ax</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">row</span><span class="p">,</span> <span class="n">row</span> <span class="o">+</span> <span class="o">.</span><span class="mi">01</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">14</span><span class="p">)</span> <span class="n">xlim</span><span class="p">,</span> <span class="n">ylim</span> <span class="o">=</span> <span class="n">ax</span><span class="o">.</span><span class="n">get_xlim</span><span class="p">(),</span> <span class="n">ax</span><span class="o">.</span><span class="n">get_ylim</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">IPython.display</span> <span class="k">import</span> <span class="n">Image</span> <span class="n">Image</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">'star_diagram.png'</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">y</span> <span class="o">=</span> <span class="n">dta</span><span class="p">[</span><span class="s">'log.light'</span><span class="p">]</span> <span class="n">X</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">dta</span><span class="p">[</span><span class="s">'log.Te'</span><span class="p">],</span> <span class="n">prepend</span><span class="o">=</span><span class="k">True</span><span class="p">)</span> <span class="n">ols_model</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="n">abline_plot</span><span class="p">(</span><span class="n">model_results</span><span class="o">=</span><span class="n">ols_model</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">rlm_mod</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">RLM</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">sm</span><span class="o">.</span><span class="n">robust</span><span class="o">.</span><span class="n">norms</span><span class="o">.</span><span class="n">TrimmedMean</span><span class="p">(</span><span class="o">.</span><span class="mi">5</span><span class="p">))</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="n">abline_plot</span><span class="p">(</span><span class="n">model_results</span><span class="o">=</span><span class="n">rlm_mod</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'red'</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>Why? Because M-estimators are not robust to leverage points.</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">infl</span> <span class="o">=</span> <span class="n">ols_model</span><span class="o">.</span><span class="n">get_influence</span><span class="p">()</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">h_bar</span> <span class="o">=</span> <span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="n">ols_model</span><span class="o">.</span><span class="n">df_model</span> <span class="o">+</span> <span class="mi">1</span> <span class="p">)</span><span class="o">/</span><span class="n">ols_model</span><span class="o">.</span><span class="n">nobs</span> <span class="n">hat_diag</span> <span class="o">=</span> <span class="n">infl</span><span class="o">.</span><span class="n">summary_frame</span><span class="p">()[</span><span class="s">'hat_diag'</span><span class="p">]</span> <span class="n">hat_diag</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">hat_diag</span> <span class="o">></span> <span class="n">h_bar</span><span class="p">]</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">sidak2</span> <span class="o">=</span> <span class="n">ols_model</span><span class="o">.</span><span class="n">outlier_test</span><span class="p">(</span><span class="s">'sidak'</span><span class="p">)</span> <span class="n">sidak2</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="s">'unadj_p'</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="k">True</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">sidak2</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> student_resid unadj_p sidak(p) 16 -2.049393 0.046415 0.892872 13 -2.035329 0.047868 0.900286 33 1.905847 0.063216 0.953543 18 -1.575505 0.122304 0.997826 1 1.522185 0.135118 0.998911 3 1.522185 0.135118 0.998911 21 -1.450418 0.154034 0.999615 17 -1.426675 0.160731 0.999735 29 1.388520 0.171969 0.999859 14 -1.374733 0.176175 0.999889 35 1.346543 0.185023 0.999933 34 -1.272159 0.209999 0.999985 28 -1.186946 0.241618 0.999998 20 -1.150621 0.256103 0.999999 44 1.134779 0.262612 0.999999 39 1.091886 0.280826 1.000000 19 1.064878 0.292740 1.000000 6 -1.026873 0.310093 1.000000 30 -1.009096 0.318446 1.000000 22 -0.979768 0.332557 1.000000 8 0.961218 0.341695 1.000000 5 0.913802 0.365801 1.000000 11 0.871997 0.387943 1.000000 12 0.856685 0.396261 1.000000 46 -0.833923 0.408829 1.000000 10 0.743920 0.460879 1.000000 42 0.727179 0.470968 1.000000 15 -0.689258 0.494280 1.000000 43 0.688272 0.494895 1.000000 7 0.655712 0.515424 1.000000 40 -0.646396 0.521381 1.000000 26 -0.640978 0.524862 1.000000 25 -0.545561 0.588123 1.000000 37 0.472819 0.638680 1.000000 32 0.472819 0.638680 1.000000 38 0.462187 0.646225 1.000000 0 0.430686 0.668799 1.000000 31 0.341726 0.734184 1.000000 36 0.318911 0.751303 1.000000 4 0.307890 0.759619 1.000000 9 0.235114 0.815211 1.000000 41 0.187732 0.851950 1.000000 2 -0.182093 0.856346 1.000000 23 -0.156014 0.876736 1.000000 27 -0.147406 0.883485 1.000000 24 0.065195 0.948314 1.000000 45 0.045675 0.963776 1.000000 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">fdr2</span> <span class="o">=</span> <span class="n">ols_model</span><span class="o">.</span><span class="n">outlier_test</span><span class="p">(</span><span class="s">'fdr_bh'</span><span class="p">)</span> <span class="n">fdr2</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="s">'unadj_p'</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="k">True</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="n">fdr2</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre> student_resid unadj_p fdr_bh(p) 16 -2.049393 0.046415 0.764747 13 -2.035329 0.047868 0.764747 33 1.905847 0.063216 0.764747 18 -1.575505 0.122304 0.764747 1 1.522185 0.135118 0.764747 3 1.522185 0.135118 0.764747 21 -1.450418 0.154034 0.764747 17 -1.426675 0.160731 0.764747 29 1.388520 0.171969 0.764747 14 -1.374733 0.176175 0.764747 35 1.346543 0.185023 0.764747 34 -1.272159 0.209999 0.764747 28 -1.186946 0.241618 0.764747 20 -1.150621 0.256103 0.764747 44 1.134779 0.262612 0.764747 39 1.091886 0.280826 0.764747 19 1.064878 0.292740 0.764747 6 -1.026873 0.310093 0.764747 30 -1.009096 0.318446 0.764747 22 -0.979768 0.332557 0.764747 8 0.961218 0.341695 0.764747 5 0.913802 0.365801 0.768599 11 0.871997 0.387943 0.768599 12 0.856685 0.396261 0.768599 46 -0.833923 0.408829 0.768599 10 0.743920 0.460879 0.770890 42 0.727179 0.470968 0.770890 15 -0.689258 0.494280 0.770890 43 0.688272 0.494895 0.770890 7 0.655712 0.515424 0.770890 40 -0.646396 0.521381 0.770890 26 -0.640978 0.524862 0.770890 25 -0.545561 0.588123 0.837630 37 0.472819 0.638680 0.843682 32 0.472819 0.638680 0.843682 38 0.462187 0.646225 0.843682 0 0.430686 0.668799 0.849556 31 0.341726 0.734184 0.892552 36 0.318911 0.751303 0.892552 4 0.307890 0.759619 0.892552 9 0.235114 0.815211 0.922751 41 0.187732 0.851950 0.922751 2 -0.182093 0.856346 0.922751 23 -0.156014 0.876736 0.922751 27 -0.147406 0.883485 0.922751 24 0.065195 0.948314 0.963776 45 0.045675 0.963776 0.963776 </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>Let's delete that line</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="k">del</span> <span class="n">ax</span><span class="o">.</span><span class="n">lines</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">weights</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">X</span><span class="p">))</span> <span class="n">weights</span><span class="p">[</span><span class="n">X</span><span class="p">[</span><span class="n">X</span><span class="p">[</span><span class="s">'log.Te'</span><span class="p">]</span> <span class="o"><</span> <span class="mf">3.8</span><span class="p">]</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">values</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">wls_model</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">WLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">weights</span><span class="o">=</span><span class="n">weights</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="n">abline_plot</span><span class="p">(</span><span class="n">model_results</span><span class="o">=</span><span class="n">wls_model</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'green'</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <ul> <li>MM estimators are good for this type of problem, unfortunately, we don't yet have these yet. </li> <li>It's being worked on, but it gives a good excuse to look at the R cell magics in the notebook.</li> </ul> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">yy</span> <span class="o">=</span> <span class="n">y</span><span class="o">.</span><span class="n">values</span><span class="p">[:,</span><span class="k">None</span><span class="p">]</span> <span class="n">xx</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="s">'log.Te'</span><span class="p">]</span><span class="o">.</span><span class="n">values</span><span class="p">[:,</span><span class="k">None</span><span class="p">]</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="o">%</span><span class="k">load_ext</span> rmagic <span class="o">%</span><span class="k">R</span> library(robustbase) <span class="o">%</span><span class="k">Rpush</span> yy xx <span class="o">%</span><span class="k">R</span> mod <- lmrob(yy ~ xx); <span class="o">%</span><span class="k">R</span> params <- mod$coefficients; <span class="o">%</span><span class="k">Rpull</span> params </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Error in library(robustbase) : there is no package called ‘robustbase’ Error in withVisible({ : could not find function "lmrob" Error in withVisible({ : object 'mod' not found </pre> </div> </div> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stderr output_text"> <pre>/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/IPython/extensions/rmagic.py:693: UserWarning: The rmagic extension in IPython is deprecated in favour of rpy2.ipython. If available, that will be loaded instead. http://rpy.sourceforge.net/ warnings.warn("The rmagic extension in IPython is deprecated in favour of " </pre> </div> </div> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="o">%</span><span class="k">R</span> print(mod) </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>Error in eval(expr, envir, enclos) : object 'params' not found Error in print(mod) : object 'mod' not found </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">params</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">abline_plot</span><span class="p">(</span><span class="n">intercept</span><span class="o">=</span><span class="n">params</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">slope</span><span class="o">=</span><span class="n">params</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'green'</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h3 id="Exercise:-Breakdown-points-of-M-estimator">Exercise: Breakdown points of M-estimator<a class="anchor-link" href="#Exercise:-Breakdown-points-of-M-estimator">¶</a></h3> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span> <span class="n">nobs</span> <span class="o">=</span> <span class="mi">200</span> <span class="n">beta_true</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">4</span><span class="p">])</span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">20</span><span class="p">,</span><span class="mi">20</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">nobs</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">beta_true</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span> <span class="c"># stack a constant in front</span> <span class="n">X</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">add_constant</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">prepend</span><span class="o">=</span><span class="k">True</span><span class="p">)</span> <span class="c"># np.c_[np.ones(nobs), X]</span> <span class="n">mc_iter</span> <span class="o">=</span> <span class="mi">500</span> <span class="n">contaminate</span> <span class="o">=</span> <span class="o">.</span><span class="mi">25</span> <span class="c"># percentage of response variables to contaminate</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">all_betas</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">mc_iter</span><span class="p">):</span> <span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">beta_true</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">200</span><span class="p">)</span> <span class="n">random_idx</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">nobs</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">contaminate</span> <span class="o">*</span> <span class="n">nobs</span><span class="p">))</span> <span class="n">y</span><span class="p">[</span><span class="n">random_idx</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">750</span><span class="p">,</span> <span class="mi">750</span><span class="p">)</span> <span class="n">beta_hat</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">RLM</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span><span class="o">.</span><span class="n">params</span> <span class="n">all_betas</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">beta_hat</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">all_betas</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">all_betas</span><span class="p">)</span> <span class="n">se_loss</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span> <span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="nb">ord</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="n">se_beta</span> <span class="o">=</span> <span class="n">lmap</span><span class="p">(</span><span class="n">se_loss</span><span class="p">,</span> <span class="n">all_betas</span> <span class="o">-</span> <span class="n">beta_true</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <h4 id="Squared-error-loss">Squared error loss<a class="anchor-link" href="#Squared-error-loss">¶</a></h4> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">se_beta</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">all_betas</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">beta_true</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [ ]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"><pre><span class="n">se_loss</span><span class="p">(</span><span class="n">all_betas</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">-</span> <span class="n">beta_true</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"><div class="prompt"></div> </div> </div> </div> </div> <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script> <script type="text/javascript"> init_mathjax = function() { if (window.MathJax) { // MathJax loaded MathJax.Hub.Config({ tex2jax: { // I'm not sure about the \( and \[ below. It messes with the // prompt, and I think it's an issue with the template. -SS inlineMath: [ ['$','$'], ["\\(","\\)"] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] }, displayAlign: 'left', // Change this to 'center' to center equations. "HTML-CSS": { styles: {'.MathJax_Display': {"margin": 0}} } }); MathJax.Hub.Queue(["Typeset",MathJax.Hub]); } } init_mathjax(); // since we have to load this in a ..raw:: directive we will add the css // after the fact function loadcssfile(filename){ var fileref=document.createElement("link") fileref.setAttribute("rel", "stylesheet") fileref.setAttribute("type", "text/css") fileref.setAttribute("href", filename) document.getElementsByTagName("head")[0].appendChild(fileref) } // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }}) // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }}) loadcssfile("../../../_static/nbviewer.pygments.css") loadcssfile("../../../_static/ipython.min.css") </script>