.. module:: statsmodels.stats
:synopsis: Statistical methods and tests

.. currentmodule:: statsmodels.stats

.. _stats:

Statistics :mod:`stats`
=======================

This section collects various statistical tests and tools.
Some can be used independently of any models, some are intended as extension to the
models and model results.

API Warning: The functions and objects in this category are spread out in
various modules and might still be moved around. We expect that in future the
statistical tests will return class instances with more informative reporting
instead of only the raw numbers.

.. _stattools:

Residual Diagnostics and Specification Tests
--------------------------------------------

.. module:: statsmodels.stats.stattools
:synopsis: Statistical methods and tests that do not fit into other categories

.. currentmodule:: statsmodels.stats.stattools

.. autosummary::
:toctree: generated/

durbin_watson
jarque_bera
omni_normtest
medcouple
robust_skewness
robust_kurtosis
expected_robust_kurtosis

.. module:: statsmodels.stats.diagnostic
:synopsis: Statistical methods and tests to diagnose model fit problems

.. currentmodule:: statsmodels.stats.diagnostic

.. autosummary::
:toctree: generated/

acorr_ljungbox
acorr_breusch_godfrey

HetGoldfeldQuandt
het_goldfeldquandt
het_breuschpagan
het_white
het_arch

linear_harvey_collier
linear_rainbow
linear_lm

breaks_cusumolsresid
breaks_hansen
recursive_olsresiduals

CompareCox
compare_cox
CompareJ
compare_j

unitroot_adf

normal_ad
kstest_normal
lilliefors

Outliers and influence measures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: statsmodels.stats.outliers_influence
:synopsis: Statistical methods and measures for outliers and influence

.. currentmodule:: statsmodels.stats.outliers_influence

.. autosummary::
:toctree: generated/

OLSInfluence
GLMInfluence
MLEInfluence
variance_inflation_factor

See also the notes on :ref:`notes on regression diagnostics <diagnostics>`

Sandwich Robust Covariances
---------------------------

The following functions calculate covariance matrices and standard errors for
the parameter estimates that are robust to heteroscedasticity and
autocorrelation in the errors. Similar to the methods that are available
for the LinearModelResults, these methods are designed for use with OLS.

.. currentmodule:: statsmodels.stats

.. autosummary::
:toctree: generated/

sandwich_covariance.cov_hac
sandwich_covariance.cov_nw_panel
sandwich_covariance.cov_nw_groupsum
sandwich_covariance.cov_cluster
sandwich_covariance.cov_cluster_2groups
sandwich_covariance.cov_white_simple

The following are standalone versions of the heteroscedasticity robust
standard errors attached to LinearModelResults

.. autosummary::
:toctree: generated/

sandwich_covariance.cov_hc0
sandwich_covariance.cov_hc1
sandwich_covariance.cov_hc2
sandwich_covariance.cov_hc3

sandwich_covariance.se_cov

Goodness of Fit Tests and Measures
----------------------------------

some tests for goodness of fit for univariate distributions

.. module:: statsmodels.stats.gof
:synopsis: Goodness of fit measures and tests

.. currentmodule:: statsmodels.stats.gof

.. autosummary::
:toctree: generated/

powerdiscrepancy
gof_chisquare_discrete
gof_binning_discrete
chisquare_effectsize

.. currentmodule:: statsmodels.stats.diagnostic

.. autosummary::
:toctree: generated/

normal_ad
kstest_normal
lilliefors

Non-Parametric Tests
--------------------

.. module:: statsmodels.sandbox.stats.runs
:synopsis: Experimental statistical methods and tests to analyze runs

.. currentmodule:: statsmodels.sandbox.stats.runs

.. autosummary::
:toctree: generated/

mcnemar
symmetry_bowker
median_test_ksample
runstest_1samp
runstest_2samp
cochrans_q
Runs

.. module:: statsmodels.stats.descriptivestats
:synopsis: Descriptive statistics

.. currentmodule:: statsmodels.stats.descriptivestats

.. autosummary::
:toctree: generated/

sign_test

.. _interrater:

Interrater Reliability and Agreement
------------------------------------

The main function that statsmodels has currently available for interrater
agreement measures and tests is Cohen's Kappa. Fleiss' Kappa is currently
only implemented as a measures but without associated results statistics.

.. module:: statsmodels.stats.inter_rater
.. currentmodule:: statsmodels.stats.inter_rater

.. autosummary::
:toctree: generated/

cohens_kappa
fleiss_kappa
to_table
aggregate_raters

Multiple Tests and Multiple Comparison Procedures
-------------------------------------------------

`multipletests` is a function for p-value correction, which also includes p-value
correction based on fdr in `fdrcorrection`.
`tukeyhsd` performs simultaneous testing for the comparison of (independent) means.
These three functions are verified.
GroupsStats and MultiComparison are convenience classes to multiple comparisons similar
to one way ANOVA, but still in developement

.. module:: statsmodels.sandbox.stats.multicomp
:synopsis: Experimental methods for controlling size while performing multiple comparisons

.. currentmodule:: statsmodels.stats.multitest

.. autosummary::
:toctree: generated/

multipletests
fdrcorrection

.. currentmodule:: statsmodels.sandbox.stats.multicomp

.. autosummary::
:toctree: generated/

GroupsStats
MultiComparison
TukeyHSDResults

.. module:: statsmodels.stats.multicomp
:synopsis: Methods for controlling size while performing multiple comparisons

.. currentmodule:: statsmodels.stats.multicomp

.. autosummary::
:toctree: generated/

pairwise_tukeyhsd

.. module:: statsmodels.stats.multitest
:synopsis: Multiple testing p-value and FDR adjustments

.. currentmodule:: statsmodels.stats.multitest

.. autosummary::
:toctree: generated/

local_fdr
fdrcorrection_twostage
NullDistribution
RegressionFDR

.. module:: statsmodels.stats.knockoff_regeffects
:synopsis: Regression Knock-Off Effects

.. currentmodule:: statsmodels.stats.knockoff_regeffects

.. autosummary::
:toctree: generated/

CorrelationEffects
OLSEffects
ForwardEffects
OLSEffects
RegModelEffects

The following functions are not (yet) public

.. currentmodule:: statsmodels.sandbox.stats.multicomp

.. autosummary::
:toctree: generated/

varcorrection_pairs_unbalanced
varcorrection_pairs_unequal
varcorrection_unbalanced
varcorrection_unequal

StepDown
catstack
ccols
compare_ordered
distance_st_range
ecdf
get_tukeyQcrit
homogeneous_subsets
maxzero
maxzerodown
mcfdr
qcrit
randmvn
rankdata
rejectionline
set_partition
set_remove_subs
tiecorrect

.. _tost:

Basic Statistics and t-Tests with frequency weights
---------------------------------------------------

Besides basic statistics, like mean, variance, covariance and correlation for
data with case weights, the classes here provide one and two sample tests
for means. The t-tests have more options than those in scipy.stats, but are
more restrictive in the shape of the arrays. Confidence intervals for means
are provided based on the same assumptions as the t-tests.

Additionally, tests for equivalence of means are available for one sample and
for two, either paired or independent, samples. These tests are based on TOST,
two one-sided tests, which have as null hypothesis that the means are not
"close" to each other.

.. module:: statsmodels.stats.weightstats
:synopsis: Weighted statistics

.. currentmodule:: statsmodels.stats.weightstats

.. autosummary::
:toctree: generated/

DescrStatsW
CompareMeans
ttest_ind
ttost_ind
ttost_paired
ztest
ztost
zconfint

weightstats also contains tests and confidence intervals based on summary
data

.. currentmodule:: statsmodels.stats.weightstats

.. autosummary::
:toctree: generated/

_tconfint_generic
_tstat_generic
_zconfint_generic
_zstat_generic
_zstat_generic2

Power and Sample Size Calculations
----------------------------------

The :mod:`power` module currently implements power and sample size calculations
for the t-tests, normal based test, F-tests and Chisquare goodness of fit test.
The implementation is class based, but the module also provides
three shortcut functions, ``tt_solve_power``, ``tt_ind_solve_power`` and
``zt_ind_solve_power`` to solve for any one of the parameters of the power
equations.

.. module:: statsmodels.stats.power
:synopsis: Power and size calculations for common tests

.. currentmodule:: statsmodels.stats.power

.. autosummary::
:toctree: generated/

TTestIndPower
TTestPower
GofChisquarePower
NormalIndPower
FTestAnovaPower
FTestPower
tt_solve_power
tt_ind_solve_power
zt_ind_solve_power

.. _proportion_stats:

Proportion
----------

Also available are hypothesis test, confidence intervals and effect size for
proportions that can be used with NormalIndPower.

.. module:: statsmodels.stats.proportion
:synopsis: Tests for proportions

.. currentmodule:: statsmodels.stats.proportion

.. autosummary::
:toctree: generated

proportion_confint
proportion_effectsize

binom_test
binom_test_reject_interval
binom_tost
binom_tost_reject_interval

multinomial_proportions_confint

proportions_ztest
proportions_ztost
proportions_chisquare
proportions_chisquare_allpairs
proportions_chisquare_pairscontrol

proportion_effectsize
power_binom_tost
power_ztost_prop
samplesize_confint_proportion

Moment Helpers
--------------

When there are missing values, then it is possible that a correlation or
covariance matrix is not positive semi-definite. The following three
functions can be used to find a correlation or covariance matrix that is
positive definite and close to the original matrix.

.. module:: statsmodels.stats.correlation_tools
:synopsis: Procedures for ensuring correlations are positive semi-definite

.. currentmodule:: statsmodels.stats.correlation_tools

.. autosummary::
:toctree: generated/

corr_clipped
corr_nearest
corr_nearest_factor
corr_thresholded
cov_nearest
cov_nearest_factor_homog
FactoredPSDMatrix
kernel_covariance

These are utility functions to convert between central and non-central moments, skew,
kurtosis and cummulants.

.. module:: statsmodels.stats.moment_helpers
:synopsis: Tools for converting moments

.. currentmodule:: statsmodels.stats.moment_helpers

.. autosummary::
:toctree: generated/

cum2mc
mc2mnc
mc2mvsk
mnc2cum
mnc2mc
mnc2mvsk
mvsk2mc
mvsk2mnc
cov2corr
corr2cov
se_cov

Mediation Analysis
------------------

Mediation analysis focuses on the relationships among three key variables:
an 'outcome', a 'treatment', and a 'mediator'. Since mediation analysis is a
form of causal inference, there are several assumptions involved that are
difficult or impossible to verify. Ideally, mediation analysis is conducted in
the context of an experiment such as this one in which the treatment is
randomly assigned. It is also common for people to conduct mediation analyses
using observational data in which the treatment may be thought of as an
'exposure'. The assumptions behind mediation analysis are even more difficult
to verify in an observational setting.

.. module:: statsmodels.stats.mediation
:synopsis: Mediation analysis

.. currentmodule:: statsmodels.stats.mediation

.. autosummary::
:toctree: generated/

Mediation
MediationResults