.. currentmodule:: statsmodels.regression.mixed_linear_model

.. _mixedlmmod:

Linear Mixed Effects Models
===========================

Linear Mixed Effects models are used for regression analyses involving
dependent data. Such data arise when working with longitudinal and
other study designs in which multiple observations are made on each
subject. Two specific mixed effects models are "random intercepts
models", where all responses in a single group are additively shifted
by a value that is specific to the group, and "random slopes models",
where the values follow a mean trajectory that is linear in observed
covariates, with both the slopes and intercept being specific to the
group. The Statsmodels MixedLM implementation allows arbitrary random
effects design matrices to be specified for the groups, so these and
other types of random effects models can all be fit.
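
The two model types can be contrasted with a small sketch. The data
below are synthetic and the column names ``y``, ``time`` and ``group``
are assumptions made for illustration; the ``re_formula`` argument is
used to request a random slope in addition to the random intercept.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic longitudinal data (illustrative assumption): 30 groups,
    # 8 time points each, with group-specific intercepts and slopes.
    rng = np.random.default_rng(0)
    n_groups, n_obs = 30, 8
    group = np.repeat(np.arange(n_groups), n_obs)
    time = np.tile(np.arange(n_obs, dtype=float), n_groups)
    intercepts = rng.normal(0.0, 2.0, size=n_groups)
    slopes = rng.normal(0.0, 0.5, size=n_groups)
    noise = rng.normal(0.0, 1.0, size=group.size)
    y = (10.0 + intercepts[group]) + (1.5 + slopes[group]) * time + noise
    data = pd.DataFrame({"y": y, "time": time, "group": group})

    # Random intercepts model: one additive shift per group.
    ri_fit = smf.mixedlm("y ~ time", data, groups=data["group"]).fit()

    # Random slopes model: group-specific intercept and slope in time.
    rs_fit = smf.mixedlm("y ~ time", data, groups=data["group"],
                         re_formula="~time").fit()

    # The estimated random effects covariance is 1 x 1 in the first
    # model and 2 x 2 in the second.
    print(np.asarray(ri_fit.cov_re).shape)
    print(np.asarray(rs_fit.cov_re).shape)

Fitting may emit convergence warnings on some data sets; these do not
by themselves mean the fit failed, but the estimates should be checked.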

The Statsmodels LME framework currently supports post-estimation
inference via Wald tests and confidence intervals on the coefficients,
profile likelihood analysis, likelihood ratio testing, and AIC. Some
limitations of the current implementation are that it does not support
more complex structure on the residual errors (they are always
homoscedastic), and it does not support crossed random effects. We
hope to implement these features for the next release.
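
As a rough sketch of two of these tools, the example below computes
Wald confidence intervals and a likelihood ratio test for a single
fixed effect. The data are synthetic and the names ``y``, ``x`` and
``group`` are assumptions. Both models are fit by ML (``reml=False``),
since REML log-likelihoods are generally not comparable between models
with different fixed effects.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Synthetic grouped data (illustrative assumption): random
    # intercept per group plus one covariate with true slope 0.8.
    rng = np.random.default_rng(1)
    n_groups, n_obs = 40, 6
    group = np.repeat(np.arange(n_groups), n_obs)
    x = rng.normal(size=group.size)
    u = rng.normal(0.0, 1.0, size=n_groups)
    y = 2.0 + 0.8 * x + u[group] + rng.normal(0.0, 1.0, size=group.size)
    data = pd.DataFrame({"y": y, "x": x, "group": group})

    # Nested models fit by maximum likelihood.
    full = smf.mixedlm("y ~ x", data, groups=data["group"]).fit(reml=False)
    null = smf.mixedlm("y ~ 1", data, groups=data["group"]).fit(reml=False)

    # Wald confidence intervals for the estimated parameters.
    ci = full.conf_int()
    print(ci)

    # Likelihood ratio test for the coefficient of x (1 degree of freedom).
    lr_stat = 2.0 * (full.llf - null.llf)
    p_value = stats.chi2.sf(lr_stat, df=1)
    print(lr_stat, p_value)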

Examples
--------

.. code-block:: python

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Load the dietox data set from the R package geepack
    data = sm.datasets.get_rdataset("dietox", "geepack").data

    # Fit a model with a random intercept for each pig
    md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
    mdf = md.fit()
    print(mdf.summary())

Detailed examples can be found here:

.. toctree::
    :maxdepth: 2

    examples/notebooks/generated/

There are some notebook examples on the Wiki:
`Wiki notebooks for MixedLM <https://github.com/statsmodels/statsmodels/wiki/Examples#linear-mixed-models>`_

Technical Documentation
-----------------------

The data are partitioned into disjoint groups. The probability model
for group i is:

.. math::

    Y = X\beta + Z\gamma + \epsilon

where

* n_i is the number of observations in group i
* Y is an n_i-dimensional response vector
* X is an n_i x k_fe design matrix for the fixed effects
* beta is a k_fe-dimensional vector of fixed effects coefficients
  (slopes)
* Z is an n_i x k_re design matrix for the random effects
* gamma is a k_re-dimensional random vector with mean 0
  and covariance matrix Psi; note that each group
  gets its own independent realization of gamma
* epsilon is an n_i-dimensional vector of iid normal
  errors with mean 0 and variance sigma^2; the epsilon
  values are independent both within and between groups
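
The roles of these quantities can be illustrated by simulating one
group's response with NumPy. This is only a sketch: the helper
``simulate_group`` and all numeric values are hypothetical, and gamma
is drawn from a Gaussian distribution, which is an assumption beyond
the mean and covariance stated above.

.. code-block:: python

    import numpy as np

    def simulate_group(X, Z, beta, Psi, sigma2, rng):
        """Draw one group's response Y = X @ beta + Z @ gamma + epsilon."""
        n_i, k_re = Z.shape
        # Each group gets its own realization of gamma (Gaussian by assumption).
        gamma = rng.multivariate_normal(np.zeros(k_re), Psi)
        # iid normal errors with mean 0 and variance sigma2.
        epsilon = rng.normal(0.0, np.sqrt(sigma2), size=n_i)
        return X @ beta + Z @ gamma + epsilon, gamma, epsilon

    rng = np.random.default_rng(0)
    n_i = 5
    t = np.arange(n_i, dtype=float)
    X = np.column_stack([np.ones(n_i), t])  # n_i x k_fe, with k_fe = 2
    Z = np.column_stack([np.ones(n_i), t])  # n_i x k_re, random intercept and slope
    beta = np.array([10.0, 1.5])
    Psi = np.array([[4.0, 0.3],
                    [0.3, 0.25]])
    sigma2 = 1.0

    Y, gamma, epsilon = simulate_group(X, Z, beta, Psi, sigma2, rng)
    print(Y.shape, gamma.shape)

Calling ``simulate_group`` once per group, with a shared ``beta``,
``Psi`` and ``sigma2``, yields a full data set from the model.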

Y, X and Z must be entirely observed. beta, Psi, and sigma^2 are
estimated using ML or REML estimation, and gamma and epsilon are
random variables that define the probability model.

The mean structure is E[Y|X,Z] = X*beta. If only the mean structure
is of interest, GEE is a good alternative to mixed models.
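
The mean structure follows from taking expectations in the model
equation, since gamma and epsilon both have mean 0. A short derivation
of the marginal moments is sketched here, under the additional
assumption (not stated explicitly above) that gamma and epsilon are
independent of each other:

.. math::

    E[Y | X, Z] = X\beta + Z E[\gamma] + E[\epsilon] = X\beta

    \mathrm{Cov}(Y | X, Z) = Z \Psi Z^T + \sigma^2 I_{n_i}

Psi and sigma^2 enter only through the covariance, so they affect the
precision of the estimate of beta rather than the mean itself.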

The primary reference for the implementation details is:

MJ Lindstrom, DM Bates (1988). "Newton Raphson and EM algorithms for
linear mixed effects models for repeated measures data". Journal of
the American Statistical Association. Volume 83, Issue 404, pages
1014-1022.