.. module:: statsmodels.base.optimizer
.. currentmodule:: statsmodels.base.optimizer

Optimization
============

statsmodels uses three types of algorithms for the estimation of the parameters
of a model.

  1. Basic linear models such as :ref:`WLS and OLS <regression>` are directly
     estimated using appropriate linear algebra.
  2. :ref:`RLM <rlm>` and :ref:`GLM <glm>` use iteratively re-weighted
     least squares (IRLS). However, you can optionally select one of the
     scipy optimizers discussed below.
  3. For all other models, we use
     `optimizers <https://docs.scipy.org/doc/scipy/reference/optimize.html>`_
     from `scipy <https://docs.scipy.org/doc/scipy/reference/index.html>`_.

Where practical, certain models allow the optional selection of a scipy
optimizer. A particular scipy optimizer might be the default or simply one
of the available options. Depending on the model and the data, choosing an
appropriate scipy optimizer can help avoid a local minimum, fit the model
in less time, or fit it using less memory.
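
For example, the following is a minimal sketch, assuming a ``Logit`` model
fit to simulated data (the data and variable names are purely
illustrative). The ``method`` keyword of ``fit`` selects the optimizer, and
optimizer-specific keyword arguments are passed through ``fit`` as well.

.. code-block:: python

    import numpy as np
    import statsmodels.api as sm

    # Simulated data, purely for illustration.
    rng = np.random.default_rng(12345)
    exog = sm.add_constant(rng.normal(size=(500, 3)))
    linpred = exog @ np.array([0.25, 0.5, -1.0, 1.0])
    endog = (linpred + rng.logistic(size=500) > 0).astype(float)

    model = sm.Logit(endog, exog)
    res_default = model.fit()                       # Newton-Raphson by default
    res_bfgs = model.fit(method="bfgs", gtol=1e-6)  # scipy's fmin_bfgs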

statsmodels supports the following optimizers along with the keyword
arguments associated with each specific optimizer; a consolidated usage
sketch follows the list:

- ``newton`` - Newton-Raphson iteration. While not directly from scipy, we
  consider it an optimizer because only the score and hessian are required.

    tol : float
        Relative error in params acceptable for convergence.

- ``nm`` - Nelder-Mead simplex algorithm, scipy's ``fmin``.

    xtol : float
        Relative error in params acceptable for convergence.
    ftol : float
        Relative error in loglike(params) acceptable for
        convergence.
    maxfun : int
        Maximum number of function evaluations to make.

- ``bfgs`` - Broyden–Fletcher–Goldfarb–Shanno optimization, scipy's
  ``fmin_bfgs``.

      gtol : float
          Stop when norm of gradient is less than gtol.
      norm : float
          Order of norm (np.inf is max, -np.inf is min)
      epsilon : float
          If fprime is approximated, use this value for the step
          size. Only relevant if LikelihoodModel.score is None.

- ``lbfgs`` - A more memory-efficient (limited memory) implementation of
  ``bfgs``. Scipy's ``fmin_l_bfgs_b``.

      m : int
          The maximum number of variable metric corrections used to
          define the limited memory matrix. (The limited memory BFGS
          method does not store the full hessian but uses this many
          terms in an approximation to it.)
      pgtol : float
          The iteration will stop when
          ``max{|proj g_i | i = 1, ..., n} <= pgtol`` where pg_i is
          the i-th component of the projected gradient.
      factr : float
          The iteration stops when
          ``(f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps``,
          where eps is the machine precision, which is automatically
          generated by the code. Typical values for factr are: 1e12
          for low accuracy; 1e7 for moderate accuracy; 10.0 for
          extremely high accuracy. See Notes for relationship to
          ftol, which is exposed (instead of factr) by the
          scipy.optimize.minimize interface to L-BFGS-B.
      maxfun : int
          Maximum number of function evaluations.
      epsilon : float
          Step size used when approx_grad is True, for numerically
          calculating the gradient
      approx_grad : bool
          Whether to approximate the gradient numerically (in which
          case func returns only the function value).

- ``cg`` - Conjugate gradient optimization. Scipy's ``fmin_cg``.

      gtol : float
          Stop when norm of gradient is less than gtol.
      norm : float
          Order of norm (np.inf is max, -np.inf is min)
      epsilon : float
          If fprime is approximated, use this value for the step
          size. Can be scalar or vector. Only relevant if
          LikelihoodModel.score is None.

- ``ncg`` - Newton conjugate gradient. Scipy's ``fmin_ncg``.

      fhess_p : callable fhess_p(x, p, \*args)
          Function which computes the product of the Hessian of f and
          an arbitrary vector, p. Should only be supplied if
          LikelihoodModel.hessian is None.
      avextol : float
          Stop when the average relative error in the minimizer
          falls below this amount.
      epsilon : float or ndarray
          If fhess is approximated, use this value for the step size.
          Only relevant if LikelihoodModel.hessian is None.

- ``powell`` - Powell's method. Scipy's ``fmin_powell``.

      xtol : float
          Line-search error tolerance.
      ftol : float
          Relative error in loglike(params) acceptable for
          convergence.
      maxfun : int
          Maximum number of function evaluations to make.
      start_direc : ndarray
          Initial direction set.

- ``basinhopping`` - Basin hopping. Scipy's ``basinhopping``.

      niter : integer
          The number of basin hopping iterations.
      niter_success : integer
          Stop the run if the global minimum candidate remains the
          same for this number of iterations.
      T : float
          The "temperature" parameter for the accept or reject
          criterion. Higher "temperatures" mean that larger jumps
          in function value will be accepted. For best results
          `T` should be comparable to the separation (in function
          value) between local minima.
      stepsize : float
          Initial step size for use in the random displacement.
      interval : integer
          The interval for how often to update the `stepsize`.
      minimizer : dict
          Extra keyword arguments to be passed to the minimizer
          `scipy.optimize.minimize()`, for example 'method' - the
          minimization method (e.g. 'L-BFGS-B'), or 'tol' - the
          tolerance for termination. Other arguments are mapped from
          explicit arguments of `fit`:

          - `args` <- `fargs`
          - `jac` <- `score`
          - `hess` <- `hess`

- ``minimize`` - Allows the use of any scipy optimizer.

      min_method : str, optional
          Name of minimization method to use.
          Any method-specific arguments can be passed directly.
          For a list of methods and their arguments, see the
          documentation of `scipy.optimize.minimize`.
          If no method is specified, then BFGS is used.
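
Every optimizer above is selected the same way: pass its name as the
``method`` argument of ``fit`` and supply the keyword arguments documented
above alongside it. The sketch below reuses the ``model`` defined in the
earlier example; the numeric values are arbitrary illustrations, not
recommended settings.

.. code-block:: python

    # Newton-Raphson with a tighter parameter tolerance.
    res_newton = model.fit(method="newton", tol=1e-10)

    # Nelder-Mead: derivative-free, so neither score nor hessian is used.
    res_nm = model.fit(method="nm", xtol=1e-6, maxfun=5000)

    # Limited-memory BFGS: keeps only m correction pairs instead of a
    # full Hessian approximation, which saves memory in large models.
    res_lbfgs = model.fit(method="lbfgs", m=10, pgtol=1e-8)

    # Basin hopping: restarts a local minimizer (configured through the
    # 'minimizer' dict) from randomly perturbed points.
    res_bh = model.fit(method="basinhopping", niter=30,
                       minimizer={"method": "L-BFGS-B"})

    # 'minimize': hands off to scipy.optimize.minimize, so any of its
    # solvers can be named via min_method.
    res_min = model.fit(method="minimize", min_method="nelder-mead")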

Model Class
-----------

Generally, there is no need for an end-user to call these functions and
classes directly. However, we document the class here because the different
optimization techniques have unique keyword arguments that may be useful to
the user.
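
After any of the fits sketched above, the optimizer settings and the raw
optimizer output are attached to the results object. The following is a
minimal sketch, reusing ``res_bfgs`` from the first example:

.. code-block:: python

    # mle_settings records which optimizer and options were used;
    # mle_retvals holds the optimizer's convergence information.
    print(res_bfgs.mle_settings["optimizer"])   # 'bfgs'
    print(res_bfgs.mle_retvals["converged"])    # True if converged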

.. autosummary::
   :toctree: generated/

   Optimizer
   _fit_newton
   _fit_bfgs
   _fit_lbfgs
   _fit_nm
   _fit_cg
   _fit_ncg
   _fit_powell
   _fit_basinhopping