Timedelta multiplication crashes for large arrays #31457
Comments
Thanks for the report.
Could you include the error message in the issue, please?
Looks like this is raising inside a numexpr call; shouldn't be too hard to fix.
Taking a look at this now.
Oh, it seems that numexpr doesn't support timedeltas (https://numexpr.readthedocs.io/en/latest/user_guide.html#datatypes-supported-internally), so we shouldn't even be taking that path.
@jbrockmendel do you have a suggestion on where we should bail out of numexpr? Seems like
That's a reasonable place to do it. The others that come to mind are
If no one beats me to it, I expect I'll use the L189 option.
I'd much prefer the first option rather than catching a ValueError. Also, before I open an issue about it, do you have thoughts on whether we should only allow (numpy) arrays and scalars in
I don't have a strong opinion about the scalars, but we should definitely avoid reaching evaluate_numexpr with a Series/Index/DataFrame. The most obvious offender here is dispatch_to_series, which I think doesn't need to call expressions.evaluate. Ideally, some of the can-we-use-numexpr checks should move to the Block level, since that can cache or skip some of the length/dtype checks.
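The fix discussed in this thread amounts to unwrapping pandas containers and checking operand dtypes before anything is handed to numexpr. A minimal sketch of such a guard follows. The 10,000-element threshold is taken from the report's "len > 10000" trigger, and the dtype whitelist is an assumption based on numexpr's documented internal types. `can_use_numexpr`, `_as_ndarray`, `MIN_ELEMENTS` and `NUMEXPR_SAFE_DTYPES` are hypothetical names for illustration, not pandas' internal API.

```python
import numpy as np

# Assumption: numexpr is only tried above this many elements; the report's
# "len > 10000" trigger suggests a threshold of this size.
MIN_ELEMENTS = 10_000

# Hypothetical whitelist of dtypes numexpr handles internally. datetime64,
# timedelta64 and object dtypes are deliberately absent.
NUMEXPR_SAFE_DTYPES = {"bool", "int32", "int64", "float32", "float64"}


def _as_ndarray(obj):
    """Unwrap pandas containers (anything with .to_numpy) to a plain ndarray."""
    to_numpy = getattr(obj, "to_numpy", None)
    return np.asarray(to_numpy() if callable(to_numpy) else obj)


def can_use_numexpr(left, right, min_elements=MIN_ELEMENTS):
    """Hypothetical guard: bail out of numexpr unless both operands are safe."""
    arrays = [_as_ndarray(left), _as_ndarray(right)]
    # Small operations: plain numpy is fast enough, so skip numexpr entirely.
    if max(arr.size for arr in arrays) <= min_elements:
        return False
    # Scalars arrive as 0-d arrays, so their dtype is checked the same way.
    return all(arr.dtype.name in NUMEXPR_SAFE_DTYPES for arr in arrays)
```

Under this sketch, a 10001-element integer array multiplied by a timedelta would be routed to the plain numpy path, because `timedelta64[...]` is not in the whitelist. Unwrapping first also means Series/Index/DataFrame operands never reach the numexpr call itself, which is the concern raised about dispatch_to_series.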
Code Sample, a copy-pastable example if possible
Traceback:
Problem description
This only fails when the Series is initialised with "large arrays" (len > 10000). It seems to crash only with timedeltas; using float/int values instead of delta_t works. I managed to get around this error by calling:
It worked fine in pandas 0.25.3 and earlier.
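One possible workaround (not necessarily the one the reporter used) follows from the numexpr diagnosis in the comments: turn numexpr off for the affected operation with pandas' `compute.use_numexpr` option. The Series and the 30-minute `delta_t` below are assumptions, not the reporter's original code.

```python
import pandas as pd

# Assumed setup: an integer Series longer than 10000 elements and a
# 30-minute timedelta, mirroring the scenario described in the report.
ser = pd.Series(range(10_001))
delta_t = pd.Timedelta(minutes=30)

# Disable numexpr only inside this block; the previous setting is restored
# when the context manager exits.
with pd.option_context("compute.use_numexpr", False):
    result = ser * delta_t

print(result.iloc[-1])
```

Using `option_context` rather than `set_option` keeps the change local, so other large numeric operations can still benefit from numexpr.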
Expected Output
0 0 days 00:00:00
1 0 days 00:30:00
2 0 days 01:00:00
3 0 days 01:30:00
4 0 days 02:00:00
...
9996 208 days 06:00:00
9997 208 days 06:30:00
9998 208 days 07:00:00
9999 208 days 07:30:00
10000 208 days 08:00:00
Length: 10001, dtype: timedelta64[ns]
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : 0.29.14
pytest : 5.3.4
hypothesis : 4.54.2
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.4
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.2
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0