groupby apply failed on dataframe with DatetimeIndex #26182
Comments
Can you provide a self-contained code sample, i.e. one that doesn't require the external file to reproduce the issue?
I picked out several rows from the CSV file; here is the self-contained version:
# -*- coding: utf-8 -*-
import pandas as pd
from io import StringIO
def _do_calc_(_df):
    # per-group rolling mean of "close" over up to 20 rows
    df2 = _df.copy()
    df2["ma20"] = df2["close"].rolling(20, min_periods=1).mean()
    return df2
data = """trade_day,exchange,code,close
2019-01-02,CFFEX,ic,4074
2019-01-02,DCE,a,3408
2019-01-02,DCE,b,2970
2019-01-02,SHFE,ag,3711
2019-01-02,SHFE,al,13446
2019-01-02,SHFE,au,288
2019-01-02,CZCE,ap,10678
2019-01-02,CZCE,cf,14870
2019-01-02,CZCE,cy,23811
2019-01-02,CZCE,fg,1284
2019-01-02,INE,sc,371
2019-01-03,CFFEX,ic,4062
2019-01-03,CFFEX,if,2962
2019-01-03,CFFEX,ih,2270
2019-01-03,CFFEX,t,98
2019-01-03,CFFEX,tf,99
2019-01-03,CFFEX,ts,100
2019-01-03,DCE,a,3439
2019-01-03,DCE,b,2969
2019-01-03,DCE,bb,134
2019-01-03,DCE,c,1874
2019-01-03,DCE,cs,2340
2019-01-03,DCE,eg,5119
2019-01-03,DCE,fb,84
2019-01-03,DCE,i,501
2019-01-03,DCE,j,1934
2019-01-03,DCE,jd,3488
2019-01-03,DCE,jm,1191
2019-01-03,DCE,l,8459
2019-01-03,DCE,m,2676
2019-01-03,DCE,p,4627
2019-01-03,DCE,pp,8499
2019-01-03,DCE,v,6313
2019-01-03,DCE,y,5506
2019-01-03,SHFE,ag,3763
2019-01-03,SHFE,al,13348
2019-01-03,SHFE,au,290
2019-01-03,SHFE,bu,2628
2019-01-03,SHFE,cu,47326
2019-01-03,SHFE,fu,2396
2019-01-03,SHFE,hc,3337
2019-01-03,SHFE,ni,88385
2019-01-03,SHFE,pb,17804
2019-01-03,SHFE,rb,3429
2019-01-03,SHFE,ru,11485
2019-01-03,SHFE,sn,143901
2019-01-03,SHFE,sp,5067
2019-01-03,SHFE,wr,3560
2019-01-03,SHFE,zn,20071
2019-01-03,CZCE,ap,10476
2019-01-03,CZCE,cf,14846
2019-01-03,CZCE,cy,23679
2019-01-03,CZCE,fg,1302
2019-01-03,CZCE,jr,2853
2019-01-03,CZCE,lr,2634
2019-01-03,CZCE,ma,2439
2019-01-03,CZCE,oi,6526
2019-01-03,CZCE,pm,2268
2019-01-03,CZCE,ri,2430
2019-01-03,CZCE,rm,2142
2019-01-03,CZCE,rs,5405
2019-01-03,CZCE,sf,5735
2019-01-03,CZCE,sm,7408
2019-01-03,CZCE,sr,4678
2019-01-03,CZCE,ta,5677
2019-01-03,CZCE,wh,2411
2019-01-03,CZCE,zc,563
2019-01-03,INE,sc,385"""
dbars = pd.read_csv(StringIO(data), index_col=False, parse_dates=["trade_day"])
dbars = dbars.set_index("trade_day", drop=False) # everything works fine if this line is commented
df = dbars.groupby("exchange").apply(_do_calc_)
print(len(df))
Any update? This still seems to be broken in v0.25.1.
This seems to be fixed in 1.0.0 👍
We would take a validation test using the example above (or a pointer to an existing test that replicates it).
This was fixed in #28662; 7c9042a is the first new commit.
With 0.25.3, the code sample above gives the following traceback:
The following snippet follows the same code path:
And on master ...
This looks fixed in 1.1.2.
Thanks @amy12xx, would you like to submit a PR with a test? See #26182 (comment).
You can use the code sample in #26182 (comment) as a validation test.
Sure, I can do that.
Code Sample, a copy-pastable example if possible
Problem description
Here is the input data file:
2019.dbar_ftridx.csv.gz
This code runs fine with pandas 0.23, but after upgrading to 0.24.2 it raises an error:
If I do not call set_index() on the DataFrame, it works fine. Could something be wrong with the DatetimeIndex?
I don't know whether this error reproduces on your machine, but I can reproduce it on all of my machines.
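Since the report says the error disappears when set_index() is not called, one possible workaround on affected versions is to run the groupby-apply while the frame still has its default RangeIndex and attach the DatetimeIndex afterwards. This is a sketch under that assumption only; it has not been verified against 0.24.2 here. The five rows are a subset of the sample data, and `group_keys=False` is used so the result keeps the original row labels, which are then used to restore the input order.

```python
import pandas as pd


def _do_calc_(_df):
    # Per-group rolling mean, as in the report
    df2 = _df.copy()
    df2["ma20"] = df2["close"].rolling(20, min_periods=1).mean()
    return df2


# A few rows from the reported sample, with trade_day parsed as datetimes
dbars = pd.DataFrame(
    {
        "trade_day": pd.to_datetime(
            ["2019-01-02", "2019-01-02", "2019-01-03", "2019-01-03", "2019-01-03"]
        ),
        "exchange": ["CFFEX", "DCE", "CFFEX", "DCE", "DCE"],
        "code": ["ic", "a", "ic", "a", "b"],
        "close": [4074, 3408, 4062, 3439, 2969],
    }
)

# Assumption: apply on the default RangeIndex (the case reported to work);
# group_keys=False keeps the original integer row labels in the result
result = dbars.groupby("exchange", group_keys=False).apply(_do_calc_)
# Restore the input row order using those RangeIndex labels
result = result.sort_index()
# Only now attach the DatetimeIndex, keeping the column as the report does
result = result.set_index("trade_day", drop=False)
print(len(result))
```

The design choice is simply to move set_index() after the groupby, so the per-group computation never sees the duplicated DatetimeIndex labels.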
Expected Output
4104
Output of pd.show_versions()
C:\Anaconda3\python.exe D:/test/groupby_bug.py
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: 4.4.0
pip: 19.0.3
setuptools: 41.0.0
Cython: 0.29.7
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.6
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None