statsmodels.imputation.bayes_mi.BayesGaussMI

class statsmodels.imputation.bayes_mi.BayesGaussMI(data, mean_prior=None, cov_prior=None, cov_prior_df=1)

Bayesian Imputation using a Gaussian model.

The approach is Bayesian. The goal is to sample from the joint distribution of the mean vector, covariance matrix, and missing data values given the observed data values. Conjugate priors for the population mean and covariance matrix are used. Gibbs sampling is used to update the mean vector, covariance matrix, and missing data values in turn. After burn-in, the imputed complete data sets from the Gibbs chain can be used in multiple imputation analyses (MI).

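As a sketch of this workflow (assuming only the documented update method, and that the imputer's data attribute holds the current imputed copy of the input array), a burn-in period can be run before retaining imputed datasets from the chain:

>>> import numpy as np
>>> import statsmodels.api as sm
>>> y = np.random.standard_normal((500, 3))
>>> y[np.random.sample(y.shape) < 0.1] = np.nan  # roughly 10% missing at random
>>> imp = sm.BayesGaussMI(y)
>>> for _ in range(100):  # burn-in of the Gibbs chain
...     imp.update()
>>> imputed = []
>>> for _ in range(5):  # retain 5 imputed datasets
...     for _ in range(10):  # thin the chain between retained draws
...         imp.update()
...     imputed.append(imp.data.copy())
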
Parameters
data : ndarray

The array of data to be imputed. Values in the array equal to NaN are imputed.

mean_prior : ndarray, optional

The covariance matrix of the Gaussian prior distribution for the mean vector. If not provided, the identity matrix is used.

cov_prior : ndarray, optional

The center matrix for the inverse Wishart prior distribution for the covariance matrix. If not provided, the identity matrix is used.

cov_prior_df : positive float

The degrees of freedom of the inverse Wishart prior distribution for the covariance matrix. Defaults to 1.

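For illustration, the priors can be tightened or loosened by passing these arguments explicitly. The following is a minimal sketch with arbitrary prior values, assuming two columns of data:

>>> import numpy as np
>>> import statsmodels.api as sm
>>> data = np.random.standard_normal((200, 2))
>>> data[::10, 0] = np.nan  # introduce some missing values
>>> imp = sm.BayesGaussMI(data,
...                       mean_prior=10 * np.eye(2),  # prior covariance of the mean
...                       cov_prior=np.eye(2),        # inverse Wishart center matrix
...                       cov_prior_df=3)
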
Examples

A basic example with OLS. The data are generated so that roughly 10% of the values are missing at random.

>>> import numpy as np
>>> x = np.random.standard_normal((1000, 2))
>>> x.flat[np.random.sample(2000) < 0.1] = np.nan

The imputer is used with MI.

>>> import statsmodels.api as sm
>>> def model_args_fn(x):
...     # Return endog, exog from x
...     return x[:, 0], x[:, 1:]
>>> imp = sm.BayesGaussMI(x)
>>> mi = sm.MI(imp, sm.OLS, model_args_fn)

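To complete the analysis, the MI object's fit method is typically called to run the imputation and pool the results (a minimal sketch; the defaults for the number of imputed datasets and burn-in are assumed here):

>>> results = mi.fit()
>>> print(results.summary())
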
Methods

update()

Cycle through all Gibbs updates.

update_cov()

Gibbs update of the covariance matrix.

update_data()

Gibbs update of the missing data values.

update_mean()

Gibbs update of the mean vector.
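
The update methods can also be called directly to inspect the Gibbs chain. A short sketch, assuming the imputer exposes the current posterior draws through mean and cov attributes:

>>> import numpy as np
>>> import statsmodels.api as sm
>>> z = np.random.standard_normal((300, 2))
>>> z[np.random.sample(z.shape) < 0.1] = np.nan
>>> imp = sm.BayesGaussMI(z)
>>> mean_draws = []
>>> for _ in range(200):
...     imp.update()  # one full Gibbs cycle over data, mean, and covariance
...     mean_draws.append(imp.mean.copy())  # assumed attribute holding the current mean draw
>>> post_mean = np.mean(mean_draws[100:], axis=0)  # posterior mean after discarding burn-in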