-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fixed Series.dt methods in ArrowDtype class that were returning incorrect values #57355 #58052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fixed Series.dt methods in ArrowDtype class that were returning incorrect values #57355 #58052
Conversation
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
c7099e4
to
ddf075c
Compare
pandas/core/arrays/arrow/array.py
Outdated
@@ -2644,7 +2644,11 @@ def _dt_minutes(self) -> Self: | |||
def _dt_seconds(self) -> Self: | |||
return type(self)( | |||
pa.array( | |||
self._to_timedeltaarray().seconds, from_pandas=True, type=pa.int32() | |||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is not where we want to fix things - doing a Python loop here would be a significant slow down compared to the non-arrow case.
I think the problem is that the Arrow duration does not have the same storage as the CPython timedelta, yet we use the CPython API to try and extract those components in pandas/_libs/tslibs/timedeltas.pyx
I vaguely recall this being a conversation with the larger team before but can't remember specifics. Maybe @jbrockmendel or @jorisvandenbossche knows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like it matches what we get for the non-pyarrow .dt.seconds, so i dont think we'd want to change it. only in .dt.components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I thought that since a loop was already being done for the other properties (hours, minutes and milliseconds) then it would be reasonable to do it for the faulty ones too. However it is indeed a slow down and i will try to find another way to fix this issue. Thank you for your time.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
ddf075c
to
8476683
Compare
…incorrect time values. (pandas-dev#57355)
8476683
to
61379f7
Compare
I apologize for the time i took to adress this issue. In pyarrow, the On the other hand, taking into consideration @jbrockmendel 's suggestion, i think that since the components are being generated via the methods i've changed the option here would be either changing how components are generated or creating another method that would return the hours, minutes and seconds in seconds just like in timedelta. In my opinion, having I might be wrong or have missed something, so i would be very grateful if you could guide me through it if that's the case. |
Thanks @St0rmie and sorry for the delay here |
seconds
attribute of.dt.components
for pyarrow timestamp datatype seem faulty #57355doc/source/whatsnew/v3.0.0.rst
file if fixing a bug or adding a new feature.Fixed a bug where using the
Series.dt
methods forArrowDtype
would return incorrect time values and added a test to prevent that.Reproducible example:
Running the above example on main branch would give the following output:
By running the example with this fix, i get the following output: