100 loops, best of 3: 40.9 ms per loop
100 loops, best of 3: 1.46 ms per loop
Problem description
The issue here is that the any method for groupby objects seems to be freakishly slow. It is actually faster to sum up all the boolean values and cast the result with .astype(bool). In numpy the two operations have similar benchmarks, and there the method with any is actually faster!
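For reference, the two approaches being compared can be sketched as follows. The frame, the column names key and flag, and the sizes are made up for illustration and are not the data behind the timings above; relative speed will depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 boolean flags spread over up to 1,000 groups.
rng = np.random.RandomState(0)
n = 100000
df = pd.DataFrame({
    "key": rng.randint(0, 1000, size=n),
    "flag": rng.rand(n) < 0.01,
})

# Approach 1: ask the groupby object for any() directly.
via_any = df.groupby("key")["flag"].any()

# Approach 2: sum the booleans per group, then cast the counts back to bool.
# A group has at least one True exactly when its sum is non-zero.
via_sum = df.groupby("key")["flag"].sum().astype(bool)

# Both approaches give the same answer for every group.
assert via_any.equals(via_sum)
```

To compare timings, wrap each of the two groupby lines in %timeit in IPython.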
@daniel-severo The reason for this difference is that for sum we have a specialized groupby version (in Cython), and we don't have this for any. So in the case of any, the function is generally applied individually to each group, which makes it a lot slower.
But, if you or someone would be interested, I don't think it would be too hard to make such a specialized groupby version for any as well.
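To make the difference concrete, here is a rough pure-Python sketch of the idea. It is not pandas code: grouped_any and per_group_any are hypothetical names, and a real specialized version would be written in Cython against pandas internals. It assumes each row has already been mapped to an integer group id (codes).

```python
def grouped_any(codes, values, ngroups):
    """Specialized style: a single pass over all rows.

    codes[i] is the integer group id of values[i].
    """
    out = [False] * ngroups
    for code, value in zip(codes, values):
        if value:
            out[code] = True
    return out


def per_group_any(codes, values, ngroups):
    """Generic style: collect each group separately, then call any() on it."""
    return [
        any(v for c, v in zip(codes, values) if c == g)
        for g in range(ngroups)
    ]


codes = [0, 1, 0, 2, 1]
values = [False, False, True, False, False]
result = grouped_any(codes, values, ngroups=3)
print(result)  # [True, False, False]
```

The single-pass version touches each row once regardless of the number of groups, whereas the per-group version pays a function-call overhead for every group.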
pandas: 0.19.2
nose: None
pip: 8.1.1
setuptools: None
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.2.2
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: 2.45.0
pandas_datareader: None