Discrete Choice Models Overview

In [ ]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm


Load data from Spector and Mazzeo (1980). Examples follow Greene's Econometric Analysis Ch. 21 (5th Edition).

In [ ]:
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
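
With `prepend=False`, the constant is appended as the last column of the design matrix, so the intercept is the last entry of each parameter vector in this notebook; that is why the linear probability cell prints `params[:-1]`. The layout can be sketched in plain Python. `append_constant` is a hypothetical stand-in written for this illustration, not the statsmodels function, and the rows and coefficients are copied from values printed in this notebook.

```python
def append_constant(rows):
    """Hypothetical helper: copy each row and append 1.0 as the last column."""
    return [list(row) + [1.0] for row in rows]

# First two observations of the regressors, as printed in this notebook
# (shown here without the constant column).
raw_rows = [[2.66, 20.0, 0.0], [2.89, 22.0, 0.0]]
design = append_constant(raw_rows)

# Logit coefficients as printed in this notebook; the constant is the last entry.
params = [2.82611259, 0.09515766, 2.37868766, -13.02134686]
slopes, intercept = params[:-1], params[-1]

print(design)
print(slopes, intercept)
```

Keeping the constant last means the slope coefficients line up with the first columns of the data in their original order.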

Inspect the data:

In [ ]:
print(spector_data.exog[:5,:])
print(spector_data.endog[:5])
[[  2.66  20.     0.     1.  ]
 [  2.89  22.     0.     1.  ]
 [  3.28  24.     0.     1.  ]
 [  2.92  12.     0.     1.  ]
 [  4.    21.     0.     1.  ]]
[ 0.  0.  0.  0.  1.]

Linear Probability Model (OLS)

In [ ]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print('Parameters: ', lpm_res.params[:-1])

Parameters:  [ 0.46385168  0.01049512  0.37855479]

Logit Model

In [ ]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print('Parameters: ', logit_res.params)

Parameters:  [  2.82611259   0.09515766   2.37868766 -13.02134686]

Marginal Effects

In [ ]:
margeff = logit_res.get_margeff()
print(margeff.summary())

        Logit Marginal Effects
Dep. Variable:                      y
Method:                          dydx
At:                           overall
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
x1             0.3626      0.109      3.313      0.001         0.148     0.577
x2             0.0122      0.018      0.686      0.493        -0.023     0.047
x3             0.3052      0.092      3.304      0.001         0.124     0.486

As with all the discrete data models presented below, we can print a summary of the fitted results:

In [ ]:
print(logit_res.summary())

                           Logit Regression Results
Dep. Variable:                      y   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                  0.3740
Time:                        17:43:21   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
                                        LLR p-value:                  0.001502
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
x1             2.8261      1.263      2.238      0.025         0.351     5.301
x2             0.0952      0.142      0.672      0.501        -0.182     0.373
x3             2.3787      1.065      2.234      0.025         0.292     4.465
const        -13.0213      4.931     -2.641      0.008       -22.687    -3.356

Probit Model

In [ ]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print('Parameters: ', probit_res.params)
print('Marginal effects: ')
print(probit_margeff.summary())

Optimization terminated successfully.
         Current function value: 0.400588
         Iterations 6
Parameters:  [ 1.62581004  0.05172895  1.42633234 -7.45231965]
Marginal effects:
       Probit Marginal Effects
Dep. Variable:                      y
Method:                          dydx
At:                           overall
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
x1             0.3608      0.113      3.182      0.001         0.139     0.583
x2             0.0115      0.018      0.624      0.533        -0.025     0.048
x3             0.3165      0.090      3.508      0.000         0.140     0.493

Multinomial Logit

Load data from the American National Election Studies:

In [ ]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog, prepend=False)

Inspect the data:

In [ ]:
print(anes_data.exog[:5,:])
print(anes_data.endog[:5])
[[ -2.30258509   7.          36.           3.           1.        ]
 [  5.24755025   3.          20.           4.           1.        ]
 [  3.43720782   2.          24.           6.           1.        ]
 [  4.4200447    3.          28.           6.           1.        ]
 [  6.46162441   5.          68.           6.           1.        ]]
[ 6.  1.  1.  1.  0.]

Fit MNL model:

In [ ]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)

Optimization terminated successfully.
         Current function value: 1.548647
         Iterations 7
[[ -1.15359746e-02  -8.87506530e-02  -1.05966699e-01  -9.15567017e-02
   -9.32846040e-02  -1.40880692e-01]
 [  2.97714352e-01   3.91668642e-01   5.73450508e-01   1.27877179e+00
    1.34696165e+00   2.07008014e+00]
 [ -2.49449954e-02  -2.28978371e-02  -1.48512069e-02  -8.68134503e-03
   -1.79040689e-02  -9.43264870e-03]
 [  8.24914421e-02   1.81042758e-01  -7.15241904e-03   1.99827955e-01
    2.16938850e-01   3.21925702e-01]
 [  5.19655317e-03   4.78739761e-02   5.75751595e-02   8.44983753e-02
    8.09584122e-02   1.08894083e-01]
 [ -3.73401677e-01  -2.25091318e+00  -3.66558353e+00  -7.61384309e+00
   -7.06047825e+00  -1.21057509e+01]]

Poisson

Load the RAND Health Insurance Experiment data. This example is similar to Table 20.5 in Cameron and Trivedi's Microeconometrics, but the results differ slightly because of minor changes in the data.

In [ ]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)
rand_exog = sm.add_constant(rand_exog, prepend=False)

Fit Poisson model:

In [ ]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())
Optimization terminated successfully.
         Current function value: 3.091609
         Iterations 12
                          Poisson Regression Results
Dep. Variable:                      y   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.06343
Time:                        17:43:22   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
                                        LLR p-value:                     0.000
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
x1            -0.0525      0.003    -18.216      0.000        -0.058    -0.047
x2            -0.2471      0.011    -23.272      0.000        -0.268    -0.226
x3             0.0353      0.002     19.302      0.000         0.032     0.039
x4            -0.0346      0.002    -21.439      0.000        -0.038    -0.031
x5             0.2717      0.012     22.200      0.000         0.248     0.296
x6             0.0339      0.001     60.098      0.000         0.033     0.035
x7            -0.0126      0.009     -1.366      0.172        -0.031     0.005
x8             0.0541      0.015      3.531      0.000         0.024     0.084
x9             0.2061      0.026      7.843      0.000         0.155     0.258
const          0.7004      0.011     62.741      0.000         0.678     0.722

Negative Binomial

The negative binomial model gives slightly different results from the Poisson model.

In [ ]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())

                     NegativeBinomial Regression Results
Dep. Variable:                      y   No. Observations:                20190
Model:               NegativeBinomial   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.01845
Time:                        17:43:23   Log-Likelihood:                -43384.
converged:                      False   LL-Null:                       -44199.
                                        LLR p-value:                     0.000
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
x1            -0.0580      0.006     -9.517      0.000        -0.070    -0.046
x2            -0.2678      0.023    -11.802      0.000        -0.312    -0.223
x3             0.0412      0.004      9.937      0.000         0.033     0.049
x4            -0.0381      0.003    -11.219      0.000        -0.045    -0.031
x5             0.2690      0.030      8.981      0.000         0.210     0.328
x6             0.0382      0.001     26.081      0.000         0.035     0.041
x7            -0.0441      0.020     -2.200      0.028        -0.083    -0.005
x8             0.0172      0.036      0.477      0.633        -0.054     0.088
x9             0.1780      0.074      2.397      0.017         0.032     0.324
const          0.6636      0.025     26.787      0.000         0.615     0.712
alpha          1.2930      0.019     69.477      0.000         1.256     1.329
/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/statsmodels-0.6.1-py3.4-macosx-10.10-x86_64.egg/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)

Alternative solvers

The default method for fitting discrete data MLE models is Newton-Raphson. You can select another solver by passing the method argument to fit:

In [ ]:
mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
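
To make the Newton-Raphson default concrete, here is a minimal pure-Python sketch of the iteration for a logit with a single coefficient and no intercept. It is an illustration under stated assumptions, not the statsmodels implementation: `newton_logit`, its `tol` and `maxiter` arguments, and the toy data are all invented for this example.

```python
import math

def logistic(z):
    """Logistic CDF, the link used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def newton_logit(x, y, start=0.0, tol=1e-10, maxiter=35):
    """Newton-Raphson for P(y=1|x) = logistic(b*x) with one coefficient b.

    Returns (estimate, iterations used, converged flag).
    """
    b = start
    for iteration in range(1, maxiter + 1):
        p = [logistic(b * xi) for xi in x]
        # First derivative of the log-likelihood with respect to b.
        score = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Second derivative; always negative here, so the problem is concave.
        hessian = -sum(pi * (1.0 - pi) * xi * xi for xi, pi in zip(x, p))
        step = -score / hessian
        b += step
        if abs(step) < tol:
            return b, iteration, True
    return b, maxiter, False

# Toy data (hypothetical): not perfectly separable, so a finite estimate exists.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 1.0, -1.0]
y = [0, 0, 1, 1, 1, 0, 1]

b_hat, n_iter, converged = newton_logit(x, y)
print(b_hat, n_iter, converged)
```

Each step needs the exact second derivative of the log-likelihood. Quasi-Newton solvers such as BFGS build an approximation from gradients instead, which is one reason to try a different `method` when the default struggles.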