
Commit 49f93ed

jorisvandenbossche, phofl, and MarcoGorelli authored
DEV: remove downstream test packages from environment.yml (pandas-dev#50157)
* DEV: remove downstream test packages from environment.yml
* undo python change, add seaborn-base
* typo
* use plain code block for statsmodels whatsnew note
* use code-block in user guide
* fixup 0.16.2 whatsnew
* also remove pandas-gbq

Co-authored-by: Patrick Hoefler <[email protected]>
Co-authored-by: MarcoGorelli <>
Co-authored-by: Marco Edward Gorelli <[email protected]>
1 parent 63e50d5 commit 49f93ed

6 files changed: +109 -52 lines changed


ci/deps/actions-38-downstream_compat.yaml (+1 -1)

@@ -39,7 +39,6 @@ dependencies:
   - numexpr
   - openpyxl
   - odfpy
-  - pandas-gbq
   - psycopg2
   - pyarrow<10
   - pymysql
@@ -68,5 +67,6 @@ dependencies:
   - statsmodels
   - coverage
   - pandas-datareader
+  - pandas-gbq
   - pyyaml
   - py

doc/source/user_guide/basics.rst (+47 -13)

@@ -827,20 +827,54 @@ In this case, provide ``pipe`` with a tuple of ``(callable, data_keyword)``.

 For example, we can fit a regression using statsmodels. Their API expects a formula first and a ``DataFrame`` as the second argument, ``data``. We pass in the function, keyword pair ``(sm.ols, 'data')`` to ``pipe``:

-.. ipython:: python
-   :okwarning:
-
-   import statsmodels.formula.api as sm
-
-   bb = pd.read_csv("data/baseball.csv", index_col="id")
+.. code-block:: ipython

-   (
-       bb.query("h > 0")
-       .assign(ln_h=lambda df: np.log(df.h))
-       .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
-       .fit()
-       .summary()
-   )
+   In [147]: import statsmodels.formula.api as sm
+
+   In [148]: bb = pd.read_csv("data/baseball.csv", index_col="id")
+
+   In [149]: (
+      .....:     bb.query("h > 0")
+      .....:     .assign(ln_h=lambda df: np.log(df.h))
+      .....:     .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
+      .....:     .fit()
+      .....:     .summary()
+      .....: )
+      .....:
+   Out[149]:
+   <class 'statsmodels.iolib.summary.Summary'>
+   """
+                               OLS Regression Results
+   ==============================================================================
+   Dep. Variable:                     hr   R-squared:                       0.685
+   Model:                            OLS   Adj. R-squared:                  0.665
+   Method:                 Least Squares   F-statistic:                     34.28
+   Date:                Tue, 22 Nov 2022   Prob (F-statistic):           3.48e-15
+   Time:                        05:34:17   Log-Likelihood:                -205.92
+   No. Observations:                  68   AIC:                             421.8
+   Df Residuals:                      63   BIC:                             432.9
+   Df Model:                           4
+   Covariance Type:            nonrobust
+   ===============================================================================
+                       coef    std err          t      P>|t|      [0.025    0.975]
+   -------------------------------------------------------------------------------
+   Intercept     -8484.7720   4664.146     -1.819      0.074   -1.78e+04   835.780
+   C(lg)[T.NL]      -2.2736      1.325     -1.716      0.091      -4.922     0.375
+   ln_h             -1.3542      0.875     -1.547      0.127      -3.103     0.395
+   year              4.2277      2.324      1.819      0.074      -0.417     8.872
+   g                 0.1841      0.029      6.258      0.000       0.125     0.243
+   ==============================================================================
+   Omnibus:                       10.875   Durbin-Watson:                   1.999
+   Prob(Omnibus):                  0.004   Jarque-Bera (JB):               17.298
+   Skew:                           0.537   Prob(JB):                     0.000175
+   Kurtosis:                       5.225   Cond. No.                     1.49e+07
+   ==============================================================================
+
+   Notes:
+   [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
+   [2] The condition number is large, 1.49e+07. This might indicate that there are
+   strong multicollinearity or other numerical problems.
+   """

 The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which
 have introduced the popular ``(%>%)`` (read pipe) operator for R_.
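
Note on the hunk above: the ``(callable, data_keyword)`` form of ``pipe`` can be tried without statsmodels installed, which is relevant now that statsmodels is no longer in the dev environment. The following is a minimal sketch, not part of this commit. ``fit_line`` is a hypothetical stand-in for ``sm.ols`` that only understands formulas of the form ``"y ~ x"``, and the small frame is made-up illustration data rather than ``baseball.csv``.

```python
import numpy as np
import pandas as pd


def fit_line(formula, data):
    """Hypothetical stand-in for sm.ols: formula first, frame passed as ``data``."""
    # Only the simple "y ~ x" form is handled in this sketch.
    y_name, x_name = [part.strip() for part in formula.split("~")]
    slope, intercept = np.polyfit(data[x_name], data[y_name], deg=1)
    return {"slope": float(slope), "intercept": float(intercept)}


# Made-up data lying exactly on y = 2x + 1.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})

# The tuple tells pipe to pass the DataFrame as the ``data`` keyword,
# so this call is equivalent to fit_line("y ~ x", data=df.query("x >= 0")).
result = df.query("x >= 0").pipe((fit_line, "data"), "y ~ x")
print(result)
```

Because the frame is routed by keyword, the remaining positional arguments (here the formula string) are passed to the callable unchanged.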

doc/source/whatsnew/v0.16.2.rst (+49 -15)

@@ -61,21 +61,55 @@ In the example above, the functions ``f``, ``g``, and ``h`` each expected the Da
 When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple
 of ``(function, keyword)`` indicating where the DataFrame should flow. For example:

-.. ipython:: python
-   :okwarning:
-
-   import statsmodels.formula.api as sm
-
-   bb = pd.read_csv("data/baseball.csv", index_col="id")
-
-   # sm.ols takes (formula, data)
-   (
-       bb.query("h > 0")
-       .assign(ln_h=lambda df: np.log(df.h))
-       .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
-       .fit()
-       .summary()
-   )
+.. code-block:: ipython
+
+   In [1]: import statsmodels.formula.api as sm
+
+   In [2]: bb = pd.read_csv("data/baseball.csv", index_col="id")
+
+   # sm.ols takes (formula, data)
+   In [3]: (
+      ...:     bb.query("h > 0")
+      ...:     .assign(ln_h=lambda df: np.log(df.h))
+      ...:     .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
+      ...:     .fit()
+      ...:     .summary()
+      ...: )
+      ...:
+   Out[3]:
+   <class 'statsmodels.iolib.summary.Summary'>
+   """
+                               OLS Regression Results
+   ==============================================================================
+   Dep. Variable:                     hr   R-squared:                       0.685
+   Model:                            OLS   Adj. R-squared:                  0.665
+   Method:                 Least Squares   F-statistic:                     34.28
+   Date:                Tue, 22 Nov 2022   Prob (F-statistic):           3.48e-15
+   Time:                        05:35:23   Log-Likelihood:                -205.92
+   No. Observations:                  68   AIC:                             421.8
+   Df Residuals:                      63   BIC:                             432.9
+   Df Model:                           4
+   Covariance Type:            nonrobust
+   ===============================================================================
+                       coef    std err          t      P>|t|      [0.025    0.975]
+   -------------------------------------------------------------------------------
+   Intercept     -8484.7720   4664.146     -1.819      0.074   -1.78e+04   835.780
+   C(lg)[T.NL]      -2.2736      1.325     -1.716      0.091      -4.922     0.375
+   ln_h             -1.3542      0.875     -1.547      0.127      -3.103     0.395
+   year              4.2277      2.324      1.819      0.074      -0.417     8.872
+   g                 0.1841      0.029      6.258      0.000       0.125     0.243
+   ==============================================================================
+   Omnibus:                       10.875   Durbin-Watson:                   1.999
+   Prob(Omnibus):                  0.004   Jarque-Bera (JB):               17.298
+   Skew:                           0.537   Prob(JB):                     0.000175
+   Kurtosis:                       5.225   Cond. No.                     1.49e+07
+   ==============================================================================
+
+   Notes:
+   [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
+   [2] The condition number is large, 1.49e+07. This might indicate that there are
+   strong multicollinearity or other numerical problems.
+   """

 The pipe method is inspired by unix pipes, which stream text through
 processes. More recently dplyr_ and magrittr_ have introduced the

environment.yml (+6 -12)

@@ -17,6 +17,7 @@ dependencies:
   - psutil
   - pytest-asyncio>=0.17
   - boto3
+  - coverage

   # required dependencies
   - python-dateutil
@@ -27,20 +28,22 @@ dependencies:
   - beautifulsoup4
   - blosc
   - brotlipy
+  - botocore
   - bottleneck
   - fastparquet
   - fsspec
   - html5lib
   - hypothesis
   - gcsfs
+  - ipython
   - jinja2
   - lxml
   - matplotlib>=3.6.1
   - numba>=0.53.1
   - numexpr>=2.8.0 # pin for "Run checks on imported code" job
   - openpyxl
   - odfpy
-  - pandas-gbq
+  - py
   - psycopg2
   - pyarrow<10
   - pymysql
@@ -60,17 +63,8 @@ dependencies:

   # downstream packages
   - aiobotocore<2.0.0 # GH#44311 pinned to fix docbuild
-  - botocore
-  - cftime
-  - dask
-  - ipython
-  - seaborn
-  - scikit-learn
-  - statsmodels
-  - coverage
-  - pandas-datareader
-  - pyyaml
-  - py
+  - dask-core
+  - seaborn-base

   # local testing dependencies
   - moto

requirements-dev.txt (+4 -10)

@@ -10,26 +10,29 @@ pytest-xdist>=1.31
 psutil
 pytest-asyncio>=0.17
 boto3
+coverage
 python-dateutil
 numpy
 pytz
 beautifulsoup4
 blosc
 brotlipy
+botocore
 bottleneck
 fastparquet
 fsspec
 html5lib
 hypothesis
 gcsfs
+ipython
 jinja2
 lxml
 matplotlib>=3.6.1
 numba>=0.53.1
 numexpr>=2.8.0
 openpyxl
 odfpy
-pandas-gbq
+py
 psycopg2-binary
 pyarrow<10
 pymysql
@@ -47,17 +50,8 @@ xlrd
 xlsxwriter
 zstandard
 aiobotocore<2.0.0
-botocore
-cftime
 dask
-ipython
 seaborn
-scikit-learn
-statsmodels
-coverage
-pandas-datareader
-pyyaml
-py
 moto
 flask
 asv

scripts/generate_pip_deps_from_conda.py (+2 -1)

@@ -24,8 +24,9 @@
 REMAP_VERSION = {"tzdata": "2022.1"}
 RENAME = {
     "pytables": "tables",
-    "geopandas-base": "geopandas",
     "psycopg2": "psycopg2-binary",
+    "dask-core": "dask",
+    "seaborn-base": "seaborn",
 }


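
Note on the ``RENAME`` change above: the new entries explain why ``environment.yml`` lists ``dask-core`` and ``seaborn-base`` while ``requirements-dev.txt`` keeps ``dask`` and ``seaborn``. The sketch below illustrates that kind of name mapping. It is an assumption about the general approach, not the script's actual code: ``to_pip_requirement`` is a hypothetical helper, and only the ``RENAME`` dictionary is taken from the diff.

```python
import re

# Mapping as it stands after this commit (copied from the diff above).
RENAME = {
    "pytables": "tables",
    "psycopg2": "psycopg2-binary",
    "dask-core": "dask",
    "seaborn-base": "seaborn",
}


def to_pip_requirement(conda_spec: str) -> str:
    """Hypothetical helper: map one conda dependency spec to a pip requirement."""
    # Drop trailing YAML comments such as "# pin for ...".
    spec = conda_spec.split("#", 1)[0].strip()
    match = re.match(r"^([A-Za-z0-9_.\-]+)(.*)$", spec)
    if match is None:
        raise ValueError(f"cannot parse dependency spec: {conda_spec!r}")
    name, version = match.group(1), match.group(2).strip()
    # Rename the package if needed; keep any version constraint unchanged.
    return RENAME.get(name, name) + version


for spec in ["dask-core", "seaborn-base", "psycopg2", "pyarrow<10"]:
    print(f"{spec} -> {to_pip_requirement(spec)}")
```

Under this sketch, packages without a ``RENAME`` entry pass through unchanged, which matches how ``pyarrow<10`` appears identically in both files.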
