BUG: [pyarrow] Division of Timedelta("-1 days +23:59:59") by Timedelta("1s") results in NaN #56409

Closed
2 of 3 tasks
randolf-scholz opened this issue Dec 8, 2023 · 4 comments
Labels
Arrow (pyarrow functionality), Bug, Timedelta (Timedelta data type), Upstream issue (Issue related to pandas dependency)

Comments

@randolf-scholz
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from pandas import Series, Timedelta

unit = Timedelta("1s")
time = Series([Timedelta("0 days 00:00:00"), Timedelta("-1 days +23:59:59")])
time_arrow = time.astype("duration[us][pyarrow]")

print(time / unit)  # [0, -1]
print(time_arrow / unit)  # [0, NaN]

Issue Description

When using the pyarrow duration dtype, division gives an incorrect result: dividing a negative duration by a Timedelta yields NaN instead of the expected float.

Expected Behavior

The results should be identical floats, independent of the backend.
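As a backend-independent cross-check of the expected values, here is a minimal sketch using only the standard library's `datetime.timedelta` (not pandas or pyarrow). It shows that "-1 days +23:59:59" is just the normalized form of minus one second, so the quotient should be -1.0.

```python
from datetime import timedelta

unit = timedelta(seconds=1)
# "-1 days +23:59:59" is the normalized representation of minus one second.
neg = timedelta(days=-1, hours=23, minutes=59, seconds=59)

# True division of two timedeltas returns a float.
ratios = [timedelta(0) / unit, neg / unit]
print(neg == timedelta(seconds=-1))  # True
print(ratios)  # [0.0, -1.0]
```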

Installed Versions

INSTALLED VERSIONS

commit : a671b5a
python : 3.11.6.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.0-37-generic
Version : #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.4
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 69.0.2
pip : 23.3.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.12.1
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

randolf-scholz added the Bug and Needs Triage labels on Dec 8, 2023
@mroeschke
Member

Thanks for the report. This looks like the direct result of the pyarrow compute function pc.divide:

(Pdb) self._pa_array
<pyarrow.lib.ChunkedArray object at 0x13da9b6f0>
[
  [
    0,
    -1000000
  ]
]
(Pdb) other
<pyarrow.DurationScalar: Timedelta('0 days 00:00:01')>
(Pdb) pc.divide(self._pa_array, other)
<pyarrow.lib.ChunkedArray object at 0x13db037e0>
[
  [
    0,
    nan
  ]
]

I suspect pyarrow might be casting other to 0 resulting in a divide by 0. Could you open an issue upstream?
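The raw values in the debugger output are microsecond counts, since the dtype is duration[us]. The following is a minimal standard-library sketch of the arithmetic one would expect on those integers. It is not pyarrow's implementation, and the variable names are illustrative only.

```python
from datetime import timedelta

us = timedelta(microseconds=1)
values = [timedelta(0), timedelta(days=-1, hours=23, minutes=59, seconds=59)]

# Floor-dividing by one microsecond gives the integer storage values.
raw = [v // us for v in values]
divisor = timedelta(seconds=1) // us

print(raw, divisor)  # [0, -1000000] 1000000
print([r / divisor for r in raw])  # [0.0, -1.0]
```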

mroeschke added the Timedelta, Upstream issue, and Arrow labels and removed the Needs Triage label on Dec 8, 2023
@rohanjain101
Contributor

Possibly related to apache/arrow#38566

@randolf-scholz
Contributor Author

Opened apache/arrow#39156. It seems that division of any negative duration leads to NaN.

@randolf-scholz
Contributor Author

Fixed with pyarrow 15.0
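Code that must run on both affected and fixed releases could gate a workaround on the pyarrow major version. This is a minimal sketch, assuming the fix is present from 15.0 onward as reported here. The helper name is hypothetical, and in practice the version string would come from `pyarrow.__version__`.

```python
def has_duration_division_fix(version: str) -> bool:
    """Hypothetical helper: True if the major version is at least 15."""
    # Assumption: the fix shipped in 15.0, per the comment in this thread.
    major = int(version.split(".")[0])
    return major >= 15


print(has_duration_division_fix("14.0.1"))  # False
print(has_duration_division_fix("15.0.0"))  # True
```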
