Code Sample
import pandas as pd
import numpy as np

n = 80000
g = 5
index = pd.MultiIndex.from_product([
    np.arange(g),
    pd.to_timedelta(np.arange(n), unit='s')
])
data = pd.DataFrame(
    np.random.randint(0, 1000, size=(len(index))),
    index=index
)

%timeit data.groupby(level=0).resample('10s', level=1).mean()
# 3.93 s ± 295 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit data.reset_index(1).groupby(level=0).resample('10s', on='level_1').mean()
# 157 ms ± 3.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Problem description
resample seems to take much more time when resampling on a level of a MultiIndex than on a regular data column: about 3.93 s versus 157 ms in the sample above, roughly 25 times slower. The second, faster approach is more convoluted and is not what first comes to mind.
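To make the comparison concrete, here is a minimal sketch that checks whether the two approaches return the same values. It assumes a much smaller frame than the one in the code sample (the sizes, the seeded generator, and the variable names `slow`, `fast`, `same_length` and `same_values` are my own choices for illustration), and I have only compared the values of column 0, not the index structure.

```python
import numpy as np
import pandas as pd

# Small sizes so the check runs quickly; the code sample uses n=80000, g=5.
n, g = 600, 3
index = pd.MultiIndex.from_product(
    [np.arange(g), pd.to_timedelta(np.arange(n), unit="s")]
)
rng = np.random.default_rng(0)  # seeded only to make the run repeatable
data = pd.DataFrame(rng.integers(0, 1000, size=len(index)), index=index)

# Approach 1: resample directly on level 1 of the MultiIndex.
slow = data.groupby(level=0).resample("10s", level=1).mean()

# Approach 2: move level 1 into a column named 'level_1', then resample on it.
fast = data.reset_index(1).groupby(level=0).resample("10s", on="level_1").mean()

# Compare only the values of column 0; index layout is not checked here.
slow_values = slow[0].to_numpy()
fast_values = fast[0].to_numpy()
same_length = len(slow_values) == len(fast_values)
same_values = same_length and bool(np.allclose(slow_values, fast_values))
print("same length:", same_length, "| same values:", same_values)
```

If the values match, the only practical difference between the two spellings is the run time, which is the point of this report.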
Expected Output
Both operations should take around the same amount of time, with the second possibly taking slightly longer because of the additional reset_index operation. If the difference is expected, then the first operation should show a warning hinting at the faster alternative.
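Since `%timeit` only works inside IPython, the following sketch shows one way to reproduce the comparison from a plain script with the standard-library `timeit` module. The function names `resample_on_level` and `resample_on_column` and the reduced size `n = 2000` are assumptions made for illustration; the absolute numbers and the ratio will depend on the pandas version and the machine, so no expected timings are given.

```python
import timeit

import numpy as np
import pandas as pd

# Reduced size so the script finishes quickly; the code sample uses n=80000.
n, g = 2000, 5
index = pd.MultiIndex.from_product(
    [np.arange(g), pd.to_timedelta(np.arange(n), unit="s")]
)
data = pd.DataFrame(np.random.randint(0, 1000, size=len(index)), index=index)


def resample_on_level():
    # Resample directly on level 1 of the MultiIndex.
    return data.groupby(level=0).resample("10s", level=1).mean()


def resample_on_column():
    # Move level 1 into a column first, then resample on that column.
    return (
        data.reset_index(1)
        .groupby(level=0)
        .resample("10s", on="level_1")
        .mean()
    )


# Take the best of three single runs for each approach.
t_level = min(timeit.repeat(resample_on_level, number=1, repeat=3))
t_column = min(timeit.repeat(resample_on_column, number=1, repeat=3))
print(f"resample on level:  {t_level:.4f} s")
print(f"resample on column: {t_column:.4f} s")
```

Taking the minimum of several runs reduces noise from other processes; raising `n` back to 80000 should approximate the setup used for the timings in the code sample.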
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pgrudzinski changed the title from "Resample on MultiIndex level is much longer than on normal column" to "Resample on MultiIndex level takes much more time than on normal column" on Oct 6, 2019
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8