Formulas: Fitting models using R-style formulas
=================================================


.. _formulas_notebook:

`Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/formulas.ipynb>`_

.. raw:: html

   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Since version 0.5.0, <code>statsmodels</code> allows users to fit statistical models using R-style formulas. Internally, <code>statsmodels</code> uses the <a href="http://patsy.readthedocs.org/">patsy</a> package to convert formulas and data to the matrices that are used in model fitting. The formula framework is quite powerful; this tutorial only scratches the surface. A full description of the formula language can be found in the <code>patsy</code> docs:</p>
   <ul>
   <li><a href="http://patsy.readthedocs.org/">Patsy formula language description</a></li>
   </ul>
   <h2 id="Loading-modules-and-functions">Loading modules and functions<a class="anchor-link" href="#Loading-modules-and-functions">&#182;</a></h2>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">print_function</span>
   <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
   <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="nn">sm</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h4 id="Import-convention">Import convention<a class="anchor-link" href="#Import-convention">&#182;</a></h4>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>You can import explicitly from <code>statsmodels.formula.api</code>:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">from</span> <span class="nn">statsmodels.formula.api</span> <span class="k">import</span> <span class="n">ols</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Alternatively, you can just use the <code>formula</code> namespace of the main <code>statsmodels.api</code>.</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">sm</span><span class="o">.</span><span class="n">formula</span><span class="o">.</span><span class="n">ols</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Or you can use the following convention:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="k">as</span> <span class="nn">smf</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>These names are just a convenient way to access each model's <code>from_formula</code> classmethod. See, for instance:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="o">.</span><span class="n">from_formula</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>All of the lowercase model functions accept <code>formula</code> and <code>data</code> arguments, whereas the uppercase classes take <code>endog</code> and <code>exog</code> design matrices. <code>formula</code> accepts a string that describes the model in terms of a <code>patsy</code> formula. <code>data</code> takes a <a href="http://pandas.pydata.org/">pandas</a> data frame or any other data structure that defines <code>__getitem__</code> for variable names, such as a structured array or a dictionary of variables.</p>
   <p><code>dir(sm.formula)</code> will print a list of available models.</p>
   <p>Formula-compatible models have the following generic call signature: <code>(formula, data, subset=None, *args, **kwargs)</code></p>
   
   </div>
   </div>
   </div>
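   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>As a minimal sketch of the relationship between the two spellings, the cell below fits one model through <code>smf.ols</code> and once more through <code>sm.OLS.from_formula</code>. The column names <code>x</code> and <code>y</code> and the synthetic values are made up for illustration and are not part of the Guerry example used later; both calls should yield the same estimates.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre>
   import numpy as np
   import pandas as pd
   import statsmodels.api as sm
   import statsmodels.formula.api as smf

   # Hypothetical toy data: y is an exact linear function of x
   x = np.arange(10.0)
   toy = pd.DataFrame({"x": x, "y": 2.0 + 3.0 * x})

   # The lowercase function and the uppercase classmethod build the same model
   res_a = smf.ols("y ~ x", data=toy).fit()
   res_b = sm.OLS.from_formula("y ~ x", data=toy).fit()

   print(res_a.params)
   print(np.allclose(res_a.params, res_b.params))
   </pre></div>
   </div>
   </div>
   </div>
   </div>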
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="OLS-regression-using-formulas">OLS regression using formulas<a class="anchor-link" href="#OLS-regression-using-formulas">&#182;</a></h2><p>To begin, we fit the linear model described on the <a href="gettingstarted.html">Getting Started</a> page. Download the data, subset columns, and list-wise delete to remove missing observations:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">dta</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">get_rdataset</span><span class="p">(</span><span class="s">&quot;Guerry&quot;</span><span class="p">,</span> <span class="s">&quot;HistData&quot;</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="k">True</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">df</span> <span class="o">=</span> <span class="n">dta</span><span class="o">.</span><span class="n">data</span><span class="p">[[</span><span class="s">&#39;Lottery&#39;</span><span class="p">,</span> <span class="s">&#39;Literacy&#39;</span><span class="p">,</span> <span class="s">&#39;Wealth&#39;</span><span class="p">,</span> <span class="s">&#39;Region&#39;</span><span class="p">]]</span><span class="o">.</span><span class="n">dropna</span><span class="p">()</span>
   <span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Fit the model:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + Region&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span>
   <span class="n">res</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Categorical-variables">Categorical variables<a class="anchor-link" href="#Categorical-variables">&#182;</a></h2><p>Looking at the summary printed above, notice that <code>patsy</code> determined that elements of <em>Region</em> were text strings, so it treated <em>Region</em> as a categorical variable. <code>patsy</code>'s default is also to include an intercept, so one of the <em>Region</em> categories was dropped automatically and serves as the reference level.</p>
   <p>If <em>Region</em> had been an integer variable that we wanted to treat explicitly as categorical, we could have done so by using the <code>C()</code> operator:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + C(Region)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>                            OLS Regression Results                            
   ==============================================================================
   Dep. Variable:                Lottery   R-squared:                       0.338
   Model:                            OLS   Adj. R-squared:                  0.287
   Method:                 Least Squares   F-statistic:                     6.636
   Date:                Mon, 20 Jul 2015   Prob (F-statistic):           1.07e-05
   Time:                        17:43:27   Log-Likelihood:                -375.30
   No. Observations:                  85   AIC:                             764.6
   Df Residuals:                      78   BIC:                             781.7
   Df Model:                           6                                         
   Covariance Type:            nonrobust                                         
   ===============================================================================
                     coef    std err          t      P&gt;|t|      [95.0% Conf. Int.]
   -------------------------------------------------------------------------------
   Intercept      38.6517      9.456      4.087      0.000        19.826    57.478
   Region[T.E]   -15.4278      9.727     -1.586      0.117       -34.793     3.938
   Region[T.N]   -10.0170      9.260     -1.082      0.283       -28.453     8.419
   Region[T.S]    -4.5483      7.279     -0.625      0.534       -19.039     9.943
   Region[T.W]   -10.0913      7.196     -1.402      0.165       -24.418     4.235
   Literacy       -0.1858      0.210     -0.886      0.378        -0.603     0.232
   Wealth          0.4515      0.103      4.390      0.000         0.247     0.656
   ==============================================================================
   Omnibus:                        3.049   Durbin-Watson:                   1.785
   Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
   Skew:                          -0.340   Prob(JB):                        0.260
   Kurtosis:                       2.454   Cond. No.                         371.
   ==============================================================================
   
   Warnings:
   [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Patsy's more advanced features for categorical variables are discussed in: <a href="contrasts.html">Patsy: Contrast Coding Systems for categorical variables</a></p>
   
   </div>
   </div>
   </div>
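   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>To see what <code>C()</code> changes, the following sketch builds design matrices directly with <code>patsy.dmatrix</code> for an integer-coded variable. The variable name <code>g</code> and its values are hypothetical and are not taken from the Guerry data.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre>
   import patsy

   # Hypothetical integer-coded grouping variable with three levels
   groups = {"g": [1, 2, 3, 1, 2, 3]}

   # Without C(), the integers enter the model as a single numeric column
   numeric_cols = patsy.dmatrix("g", groups).design_info.column_names

   # With C(), every level except the reference level gets an indicator column
   categorical_cols = patsy.dmatrix("C(g)", groups).design_info.column_names

   print(numeric_cols)
   print(categorical_cols)
   </pre></div>
   </div>
   </div>
   </div>
   </div>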
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Operators">Operators<a class="anchor-link" href="#Operators">&#182;</a></h2><p>We have already seen that "~" separates the left-hand side of the model from the right-hand side, and that "+" adds new columns to the design matrix.</p>
   <h3 id="Removing-variables">Removing variables<a class="anchor-link" href="#Removing-variables">&#182;</a></h3><p>The "-" sign can be used to remove columns/variables. For instance, we can remove the intercept from a model by:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + C(Region) -1 &#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>Intercept         38.651655
   C(Region)[T.E]   -15.427785
   C(Region)[T.N]   -10.016961
   C(Region)[T.S]    -4.548257
   C(Region)[T.W]   -10.091276
   Literacy          -0.185819
   Wealth             0.451475
   dtype: float64
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h3 id="Multiplicative-interactions">Multiplicative interactions<a class="anchor-link" href="#Multiplicative-interactions">&#182;</a></h3><p>":" adds a new column to the design matrix with the interaction of the two columns it joins. "*" adds that interaction and also includes the individual columns that were multiplied together:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">res1</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy : Wealth - 1&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">res2</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy * Wealth - 1&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res1</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="s">&#39;</span><span class="se">\n</span><span class="s">&#39;</span><span class="p">)</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>C(Region)[C]    38.651655
   C(Region)[E]    23.223870
   C(Region)[N]    28.634694
   C(Region)[S]    34.103399
   C(Region)[W]    28.560379
   Literacy        -0.185819
   Wealth           0.451475
   dtype: float64
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
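   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>The difference between ":" and "*" can also be checked by inspecting design matrix columns directly. This is a small sketch using <code>patsy.dmatrix</code> on made-up numeric columns <code>a</code> and <code>b</code> (hypothetical names, not part of the Guerry data):</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre>
   import numpy as np
   import patsy

   # Hypothetical numeric columns
   cols = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}

   # ":" yields only the product column; "- 1" removes the intercept
   only_interaction = patsy.dmatrix("a:b - 1", cols)

   # "*" expands to a + b + a:b
   full = patsy.dmatrix("a*b - 1", cols)

   print(only_interaction.design_info.column_names)
   print(full.design_info.column_names)
   print(np.asarray(only_interaction)[:, 0])
   </pre></div>
   </div>
   </div>
   </div>
   </div>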
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Many other things are possible with operators. Please consult the <a href="https://patsy.readthedocs.org/en/latest/formulas.html">patsy docs</a> to learn more.</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Functions">Functions<a class="anchor-link" href="#Functions">&#182;</a></h2><p>You can apply vectorized functions to the variables in your model:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ np.log(Literacy)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>Literacy:Wealth    0.018176
   dtype: float64 
   
   Literacy           0.427386
   Wealth             1.080987
   Literacy:Wealth   -0.013609
   dtype: float64
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Define a custom function:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="k">def</span> <span class="nf">log_plus_1</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
       <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.</span>
   <span class="n">res</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ log_plus_1(Literacy)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>Intercept           115.609119
   np.log(Literacy)    -20.393959
   dtype: float64
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Any function that is in the calling namespace is available to the formula.</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="Using-formulas-with-models-that-do-not-(yet)-support-them">Using formulas with models that do not (yet) support them<a class="anchor-link" href="#Using-formulas-with-models-that-do-not-(yet)-support-them">&#182;</a></h2><p>Even if a given <code>statsmodels</code> function does not support formulas, you can still use <code>patsy</code>'s formula language to produce design matrices. Those matrices 
   can then be fed to the fitting function as <code>endog</code> and <code>exog</code> arguments.</p>
   <p>To generate <code>numpy</code> arrays:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">patsy</span>
   <span class="n">f</span> <span class="o">=</span> <span class="s">&#39;Lottery ~ Literacy * Wealth&#39;</span>
   <span class="n">y</span><span class="p">,</span><span class="n">X</span> <span class="o">=</span> <span class="n">patsy</span><span class="o">.</span><span class="n">dmatrices</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s">&#39;matrix&#39;</span><span class="p">)</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">y</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">X</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>Intercept               136.003079
   log_plus_1(Literacy)    -20.393959
   dtype: float64
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>To generate pandas data frames:</p>
   
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="n">f</span> <span class="o">=</span> <span class="s">&#39;Lottery ~ Literacy * Wealth&#39;</span>
   <span class="n">y</span><span class="p">,</span><span class="n">X</span> <span class="o">=</span> <span class="n">patsy</span><span class="o">.</span><span class="n">dmatrices</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s">&#39;dataframe&#39;</span><span class="p">)</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">y</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   <span class="nb">print</span><span class="p">(</span><span class="n">X</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>   Lottery
   0       41
   1       38
   2       66
   3       80
   4       79
      Intercept  Literacy  Wealth  Literacy:Wealth
   0          1        37      73             2701
   1          1        51      22             1122
   2          1        13      61              793
   3          1        46      76             3496
   4          1        69      83             5727
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
   <div class="inner_cell">
       <div class="input_area">
   <div class=" highlight hl-ipython3"><pre><span class="nb">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>

   <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>
   <script type="text/javascript">
   init_mathjax = function() {
       if (window.MathJax) {
           // MathJax loaded
           MathJax.Hub.Config({
               tex2jax: {
               // I'm not sure about the \( and \[ below. It messes with the
               // prompt, and I think it's an issue with the template. -SS
                   inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
               },
               displayAlign: 'left', // Change this to 'center' to center equations.
               "HTML-CSS": {
                   styles: {'.MathJax_Display': {"margin": 0}}
               }
           });
           MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
       }
   }
   init_mathjax();

   // since we have to load this in a ..raw:: directive we will add the css
   // after the fact
   function loadcssfile(filename){
       var fileref=document.createElement("link")
       fileref.setAttribute("rel", "stylesheet")
       fileref.setAttribute("type", "text/css")
       fileref.setAttribute("href", filename)

       document.getElementsByTagName("head")[0].appendChild(fileref)
   }
   // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
   // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
   loadcssfile("../../../_static/nbviewer.pygments.css")
   loadcssfile("../../../_static/ipython.min.css")
   </script>