Generalized Linear Models (Formula)
=====================================


.. _glm_formula_notebook:

`Link to Notebook on GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/glm_formula.ipynb>`_

.. raw:: html

   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>This notebook illustrates how you can use R-style formulas to fit Generalized Linear Models.</p>
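   <p>To begin, we load the <code>Star98</code> dataset, construct a formula, and pre-process the data so that the response <code>SUCCESS</code> is the proportion of students scoring above the national median (<code>NABOVE / (NABOVE + NBELOW)</code>):</p>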
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span>
   <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span>
   <span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="k">as</span> <span class="nn">smf</span>
   <span class="n">star98</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">load_pandas</span><span class="p">()</span><span class="o">.</span><span class="n">data</span>
   <span class="n">formula</span> <span class="o">=</span> <span class="s">&#39;SUCCESS ~ LOWINC + PERASIAN + PERBLACK + PERHISP + PCTCHRT + </span><span class="se">\</span>
   <span class="s">           PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF&#39;</span>
   <span class="n">dta</span> <span class="o">=</span> <span class="n">star98</span><span class="p">[[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">,</span> <span class="s">&#39;NBELOW&#39;</span><span class="p">,</span> <span class="s">&#39;LOWINC&#39;</span><span class="p">,</span> <span class="s">&#39;PERASIAN&#39;</span><span class="p">,</span> <span class="s">&#39;PERBLACK&#39;</span><span class="p">,</span> <span class="s">&#39;PERHISP&#39;</span><span class="p">,</span>
                 <span class="s">&#39;PCTCHRT&#39;</span><span class="p">,</span> <span class="s">&#39;PCTYRRND&#39;</span><span class="p">,</span> <span class="s">&#39;PERMINTE&#39;</span><span class="p">,</span> <span class="s">&#39;AVYRSEXP&#39;</span><span class="p">,</span> <span class="s">&#39;AVSALK&#39;</span><span class="p">,</span>
                 <span class="s">&#39;PERSPENK&#39;</span><span class="p">,</span> <span class="s">&#39;PTRATIO&#39;</span><span class="p">,</span> <span class="s">&#39;PCTAF&#39;</span><span class="p">]]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
   <span class="n">endog</span> <span class="o">=</span> <span class="n">dta</span><span class="p">[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">]</span> <span class="o">/</span> <span class="p">(</span><span class="n">dta</span><span class="p">[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">]</span> <span class="o">+</span> <span class="n">dta</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="s">&#39;NBELOW&#39;</span><span class="p">))</span>
   <span class="k">del</span> <span class="n">dta</span><span class="p">[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">]</span>
   <span class="n">dta</span><span class="p">[</span><span class="s">&#39;SUCCESS&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">endog</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Then, we fit the GLM with a binomial family:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">mod1</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">glm</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="n">formula</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">dta</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Binomial</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">mod1</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Finally, we define a function that applies a custom data transformation and call it directly inside the formula:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="k">def</span> <span class="nf">double_it</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
       <span class="k">return</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">x</span>
   <span class="n">formula</span> <span class="o">=</span> <span class="s">&#39;SUCCESS ~ double_it(LOWINC) + PERASIAN + PERBLACK + PERHISP + PCTCHRT + </span><span class="se">\</span>
   <span class="s">           PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF&#39;</span>
   <span class="n">mod2</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">glm</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="n">formula</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">dta</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Binomial</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">mod2</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>As expected, the coefficient for <code>double_it(LOWINC)</code> in the second model is half the size of the <code>LOWINC</code> coefficient from the first model:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">mod1</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s">&#39;LOWINC&#39;</span><span class="p">])</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">mod2</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s">&#39;double_it(LOWINC)&#39;</span><span class="p">]</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>

   <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>
   <script type="text/javascript">
   init_mathjax = function() {
       if (window.MathJax) {
           // MathJax loaded
           MathJax.Hub.Config({
               tex2jax: {
               // I'm not sure about the \( and \[ below. It messes with the
               // prompt, and I think it's an issue with the template. -SS
                   inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
               },
               displayAlign: 'left', // Change this to 'center' to center equations.
               "HTML-CSS": {
                   styles: {'.MathJax_Display': {"margin": 0}}
               }
           });
           MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
       }
   }
   init_mathjax();

   // since we have to load this in a ..raw:: directive we will add the css
   // after the fact
   function loadcssfile(filename){
       var fileref=document.createElement("link")
       fileref.setAttribute("rel", "stylesheet")
       fileref.setAttribute("type", "text/css")
       fileref.setAttribute("href", filename)

       document.getElementsByTagName("head")[0].appendChild(fileref)
   }
   // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
   // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
   loadcssfile("../../../_static/nbviewer.pygments.css")
   loadcssfile("../../../_static/ipython.min.css")
   </script>
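
The halving of the coefficient under ``double_it`` does not depend on the Star98 data: doubling a regressor leaves the fitted linear predictor unchanged, so its coefficient must shrink by a factor of two. The sketch below illustrates this without statsmodels. It is an assumption-laden toy, not the library's implementation: ``fit_logit`` is a hypothetical helper that fits a two-parameter logit model by Newton's method, and the ``x`` and ``y`` values are made up for illustration.

.. code-block:: python

   import math


   def fit_logit(x, y, n_iter=25):
       """Fit y ~ 1 + x with a logit link by Newton's method (toy sketch).

       x is a list of floats and y a list of proportions in (0, 1).
       Returns the pair (intercept, slope).
       """
       b0, b1 = 0.0, 0.0
       for _ in range(n_iter):
           # Accumulate the score vector (g0, g1) and the entries of the
           # symmetric 2x2 information matrix [[h00, h01], [h01, h11]].
           g0 = g1 = h00 = h01 = h11 = 0.0
           for xi, yi in zip(x, y):
               p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
               w = p * (1.0 - p)
               g0 += yi - p
               g1 += (yi - p) * xi
               h00 += w
               h01 += w * xi
               h11 += w * xi * xi
           det = h00 * h11 - h01 * h01
           # Newton step: solve the 2x2 system H * step = g explicitly.
           step0 = (h11 * g0 - h01 * g1) / det
           step1 = (h00 * g1 - h01 * g0) / det
           b0 += step0
           b1 += step1
       return b0, b1


   # Made-up toy data: one regressor and observed success proportions.
   x = [0.0, 1.0, 2.0, 3.0, 4.0]
   y = [0.1, 0.3, 0.4, 0.7, 0.8]

   b0, b1 = fit_logit(x, y)                        # original regressor
   c0, c1 = fit_logit([2.0 * xi for xi in x], y)   # doubled regressor

   print(b1, 2.0 * c1)   # the two slopes agree after rescaling
   print(b0, c0)         # the intercept is unaffected

The same reasoning applies to any constant rescaling of a regressor: multiplying it by ``k`` divides its coefficient by ``k`` and leaves the other main-effect coefficients and the fitted values unchanged, provided the regressor does not also enter the model through interaction terms in unscaled form.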