The Datasets Package
statsmodels provides data sets (i.e., data and metadata) for use in examples, tutorials, model testing, etc.
Using Datasets from Stata
Download and return an example dataset from Stata.
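Using Datasets from R
The Rdatasets project gives access to the datasets available in R’s core datasets package and many other common R packages. All of these datasets are available to statsmodels through the get_rdataset function, which takes the name of the dataset and the R package that provides it. The returned object holds the actual data as a pandas DataFrame in its data attribute and the dataset's R documentation in its __doc__ attribute. For example: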
In [1]: import statsmodels.api as sm
In [2]: duncan_prestige = sm.datasets.get_rdataset("Duncan", "carData")
In [3]: print(duncan_prestige.__doc__)
+--------+-----------------+
| Duncan | R Documentation |
+--------+-----------------+
Duncan's Occupational Prestige Data
-----------------------------------
Description
~~~~~~~~~~~
The ``Duncan`` data frame has 45 rows and 4 columns. Data on the
prestige and other characteristics of 45 U. S. occupations in 1950.
Usage
~~~~~
::
   Duncan
Format
~~~~~~
This data frame contains the following columns:
type
   Type of occupation. A factor with the following levels: ``prof``,
   professional and managerial; ``wc``, white-collar; ``bc``,
   blue-collar.
income
   Percentage of occupational incumbents in the 1950 US Census who
   earned $3,500 or more per year (about $36,000 in 2017 US dollars).
education
   Percentage of occupational incumbents in 1950 who were high school
   graduates (which, were we cynical, we would say is roughly equivalent
   to a PhD in 2017)
prestige
   Percentage of respondents in a social survey who rated the occupation
   as “good” or better in prestige
Source
~~~~~~
Duncan, O. D. (1961) A socioeconomic index for all occupations. In
Reiss, A. J., Jr. (Ed.) *Occupations and Social Status.* Free Press
[Table VI-1].
References
~~~~~~~~~~
Fox, J. (2008) *Applied Regression Analysis and Generalized Linear
Models*, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) *An R Companion to Applied Regression*,
Second Edition, Sage.
In [4]: duncan_prestige.data.head(5)