-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TST: simplify pyarrow tests, make mode work with temporal dtypes #50688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
def test_getitem_scalar(self, data): | ||
super().test_getitem_scalar(data) | ||
# In the base class we expect data.dtype.type; but this (intentionally) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke might it make sense to reconsider this? im not sure what context the current PyArrow.lib.DataType
is really useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, yeah I am not exactly sure how ExtensionDtype.type
is used internally, but I think it would make sense if it returns the native python type. Would be nice if there was a nice way to derive that from numpy_dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main use case that comes to mind is if i want to do e.g. dtype.type(0)
to get an object of the correct type
pandas/core/arrays/arrow/array.py
Outdated
if pa.types.is_temporal(pa_type): | ||
nbits = pa_type.bit_width | ||
dtype = f"int{nbits}[pyarrow]" | ||
obj = cast(ArrowExtensionArrayT, self.astype(dtype, copy=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just self._data.cast
to int type, continue the logic below, and then most_common.cast
back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could. i preferred this way since it was self-contained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect going through pa.array.cast
may be more performant than astype
. Mind checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated + green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like _mode here is still using astype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in some cases this needs an int32 and in some cases an int64, which makes the .cast version extra-complicated
If I am understanding apache/arrow#33685, then in pyarrow 11 we won't have to do the the int64 workaround? |
I don't know if that will affect mode directly. |
updated to use .cast instead of .astype |
Thanks @jbrockmendel |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.