Generalized Estimating Equations

Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster, or repeated-measures data in which observations may be correlated within a cluster but are uncorrelated across clusters. GEE supports estimation of the same one-parameter exponential families as Generalized Linear Models (GLM).

See Module Reference for commands and arguments.

Examples

The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data

In [4]: fam = sm.families.Poisson()

In [5]: exch = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
   ...:               cov_struct=exch, family=fam)
   ...: 

In [7]: res = mod.fit()

In [8]: print(res.summary())
                               GEE Regression Results                              
===================================================================================
Dep. Variable:                           y   No. Observations:                  236
Model:                                 GEE   No. clusters:                       59
Method:                        Generalized   Min. cluster size:                   4
                      Estimating Equations   Max. cluster size:                   4
Family:                            Poisson   Mean cluster size:                 4.0
Dependence structure:         Exchangeable   Num. iterations:                     2
Date:                     Tue, 17 Dec 2019   Scale:                           1.000
Covariance type:                    robust   Time:                         23:45:17
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            0.5730      0.361      1.589      0.112      -0.134       1.280
trt[T.progabide]    -0.1519      0.171     -0.888      0.375      -0.487       0.183
age                  0.0223      0.011      1.960      0.050    2.11e-06       0.045
base                 0.0226      0.001     18.451      0.000       0.020       0.025
==============================================================================
Skew:                          3.7823   Kurtosis:                      28.6672
Centered skew:                 2.7597   Centered kurtosis:             21.9865
==============================================================================

Several notebook examples of the use of GEE can be found on the statsmodels wiki, under "Wiki notebooks for GEE".

References

  • KY Liang and S Zeger. “Longitudinal data analysis using generalized linear models”. Biometrika (1986) 73 (1): 13-22.

  • S Zeger and KY Liang. “Longitudinal Data Analysis for Discrete and Continuous Outcomes”. Biometrics Vol. 42, No. 1 (Mar., 1986), pp. 121-130

  • A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”, Biometrika, 77, 485-497.

  • Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf

  • LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics, 57 (1): 126-134.

Module Reference

Model Class

GEE(endog, exog, groups[, time, family, …])

Marginal Regression Model using Generalized Estimating Equations.

NominalGEE(endog, exog, groups[, time, …])

Nominal Response Marginal Regression Model using GEE.

OrdinalGEE(endog, exog, groups[, time, …])

Ordinal Response Marginal Regression Model using GEE.

QIF(endog, exog, groups[, family, …])

Fit a regression model using quadratic inference functions (QIF).

Results Classes

GEEResults(model, params, cov_params, scale)

This class summarizes the fit of a marginal regression model using GEE.

GEEMargins(results, args[, kwargs])

Estimated marginal effects for a regression model fit with GEE.

QIFResults(model, params, cov_params, scale)

Results class for QIF Regression.

Dependence Structures

The dependence structures currently implemented are:

CovStruct([cov_nearest_method])

Base class for correlation and covariance structures.

Autoregressive([dist_func])

A first-order autoregressive working dependence structure.

Exchangeable()

An exchangeable working dependence structure.

GlobalOddsRatio(endog_type)

Estimate the global odds ratio for a GEE with ordinal or nominal data.

Independence([cov_nearest_method])

An independence working dependence structure.

Nested([cov_nearest_method])

A nested working dependence structure.

Families

The distribution families are the same as for GLM. The families currently implemented are:

Family(link, variance)

The parent class for one-parameter exponential families.

Binomial([link])

Binomial exponential family distribution.

Gamma([link])

Gamma exponential family distribution.

Gaussian([link])

Gaussian exponential family distribution.

InverseGaussian([link])

InverseGaussian exponential family.

NegativeBinomial([link, alpha])

Negative Binomial exponential family.

Poisson([link])

Poisson exponential family.

Tweedie([link, var_power, eql])

Tweedie family.

The link functions are the same as for GLM; those currently implemented are listed below. Not all link functions are available for each distribution family. The list of available link functions for a family can be obtained by

>>> sm.families.family.<familyname>.links

Link

A generic link function for one-parameter exponential family.

CDFLink([dbn])

Use the CDF of a scipy.stats distribution

CLogLog

The complementary log-log transform

Log

The log transform

Logit

The logit transform

NegativeBinomial([alpha])

The negative binomial link function

Power([power])

The power transform

cauchy()

The Cauchy (standard Cauchy CDF) transform

cloglog

The CLogLog transform link function.

identity()

The identity transform

inverse_power()

The inverse transform

inverse_squared()

The inverse squared transform

log

The log transform

logit

The logit transform

nbinom([alpha])

The negative binomial link function.

probit([dbn])

The probit (standard normal CDF) transform