BUG: Cannot create third-party ExtensionArrays for datetime types (xfail) #34987
This is just the failing test for now. I'm happy to implement a fix if someone can point me to the location where this should be fixed.
from .arrays import ArrowTimestampUSArray  # isort:skip


def test_constructor_extensionblock():
should be xfailed?
I can xfail this, so this can be merged. I would prefer to fix this myself though.
I just need a pointer at which code section I should apply a fix. Should I change the order in pandas/pandas/core/internals/blocks.py so that we only create a DatetimeTZBlock for pandas-provided datetime-based ExtensionArrays or shouldn't is_datetime64tz_dtype return True for my ExtensionDtype?
> I would prefer to fix this myself though.
Sounds good. Is this a use case you have a need to get working near-term, or more of a Principle Of The Thing? I ask because...
> I just need a pointer at which code section I should apply a fix.
This is pretty daunting, as I expect this is scattered across the code. There are lots of places where we either a) implicitly assume nanoseconds or b) check dtype.kind in ["M", "m"] (much more performant than the is_foo_dtype checks).
> Should I change the order in pandas/pandas/core/internals/blocks.py so that we only create a DatetimeTZBlock for pandas-provided datetime-based ExtensionArrays
That will probably be part of a solution.
> or shouldn't is_datetime64tz_dtype return True for my ExtensionDtype?
I'd be very reluctant to make that change, since I think a lot of code expects that to imply it's getting our DatetimeTZDtype. Maybe an is_3rd_party_ea_dtype that we would check for before checking for any 1st-party dtypes? That runs into the "ideally we should treat 3rd party EAs symmetrically with 1st-party" problems.
So getting back to the motivation: how high a priority is this?
One thing I can unambiguously encourage is more tests, even if xfailed:
- What happens if you pass one of these to the DatetimeIndex constructor? Vice versa?
- What happens if I do DatetimeIndex.astype(this_new_ea_dtype)?
- Addition/subtraction with the gamut of datetime/timedelta scalars/arrays we already support?
- How does this behave if you stuff it inside a Categorical/CategoricalIndex?
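A minimal sketch of what one such xfailed test might look like. The helper `make_third_party_array` is a hypothetical placeholder for constructing the new ExtensionArray, and the `reason` text is illustrative rather than taken from the PR.

```python
import pytest


def make_third_party_array():
    """Hypothetical placeholder for building the third-party datetime array."""
    raise NotImplementedError("third-party datetime ExtensionArray not wired up")


@pytest.mark.xfail(
    reason="third-party datetime ExtensionArrays are not supported yet",
    raises=NotImplementedError,
    strict=True,
)
def test_datetimeindex_from_third_party_array():
    # Expected to fail for now. With strict=True an unexpected pass is
    # reported as a failure, which is a reminder to drop the marker once
    # the underlying behaviour is fixed.
    make_third_party_array()
```

Whether `strict=True` is appropriate depends on how stable the failure is; a non-strict marker tolerates a test that sometimes passes.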
> I would prefer to fix this myself though.
> Sounds good. Is this a use case you have a need to get working near-term, or more of a Principle Of The Thing? I ask because...
More in the next-6-months range, so I'm definitely going to add an xfail here, as the points below indicate that we should think this through rather than "fix quick".
I would love to have a nullable, non-nanosecond timestamp (actually I desperately need it, but e.g. having a performant string is more important to me), but there are several other places that either assume that all timestamps are nanoseconds or backed by a numpy array, so this is going to be a major effort.
> or shouldn't is_datetime64tz_dtype return True for my ExtensionDtype?
> I'd be very reluctant to make that change, since I think a lot of code expects that to imply it's getting our DatetimeTZDtype. Maybe an is_3rd_party_ea_dtype that we would check for before checking for any 1st-party dtypes? That runs into the "ideally we should treat 3rd party EAs symmetrically with 1st-party" problems.
> So getting back to the motivation: how high a priority is this?
As already pointed out: Less than other things I want to contribute to pandas, so xfailing and adding more (possibly) xfailing tests is the way to go.
> so xfailing and adding more (possibly) xfailing tests is the way to go.
Sounds good.
> actually I desperately need [...] that either assume that all timestamps are nanoseconds or backed by a numpy array
Would your need be solved if we get numpy-backed non-nano in place? There's a reasonable chance of that happening in the next 6 months.
> Would your need be solved if we get numpy-backed non-nano in place? There's a reasonable chance of that happening in the next 6 months.
For now: Yes.
> For now: Yes.
I'm slowly tackling this from the Cython side of the code. The parallelizable step is to comb through the rest of the code to find all the places where we implicitly/explicitly assume nanos. I'd start with pandas/plotting and pandas/io.
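As a rough starting point for that combing step, a small script along these lines could list the explicit cases. The pattern list and the function name `find_nano_assumptions` are assumptions made for illustration; implicit nanosecond assumptions (for example hard-coded unit conversions) would not be caught and still need manual review.

```python
import re
from pathlib import Path

# Illustrative patterns only; a real audit would need a broader list.
NANO_PATTERNS = re.compile(
    r"M8\[ns\]|m8\[ns\]|datetime64\[ns\]|timedelta64\[ns\]"
)


def find_nano_assumptions(root):
    """Return (path, line number, stripped line) for each explicit match."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if NANO_PATTERNS.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Directories suggested in the comment above.
    for directory in ("pandas/plotting", "pandas/io"):
        for path, lineno, line in find_nano_assumptions(directory):
            print(f"{path}:{lineno}: {line}")
```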
Let's see if we can at least get this one working.
I think we'll need to edit the dtype.kind check in is_datetime64tz_dtype, and possibly the issubclass(vtype, np.datetime64) check in internals.blocks.get_block_type.
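The ordering problem behind those two checks can be sketched with plain stand-in classes. Everything below is hypothetical (the class names, `pick_block_kind`, and the returned labels); it is not the pandas implementation or the change made in any particular PR, only an illustration of why a kind-based check should not run before the generic extension check.

```python
# Hypothetical stand-ins; none of these are the real pandas classes.
class ExtensionDtypeBase:
    kind = "O"


class FirstPartyTZDtype(ExtensionDtypeBase):
    kind = "M"  # first-party tz-aware datetime dtype


class ThirdPartyTimestampDtype(ExtensionDtypeBase):
    kind = "M"  # third-party dtype that also reports a datetime kind


class PlainDatetimeDtype:
    kind = "M"  # stands in for a NumPy datetime64 dtype (not an extension)


def pick_block_kind(dtype) -> str:
    """Choose a block label, testing the most specific case first."""
    # 1. Only the first-party tz-aware dtype gets the dedicated block.
    if isinstance(dtype, FirstPartyTZDtype):
        return "datetimetz"
    # 2. Any other extension dtype goes to the generic extension block,
    #    even when its kind is "M".
    if isinstance(dtype, ExtensionDtypeBase):
        return "extension"
    # 3. Only non-extension dtypes reach the kind-based check.
    if getattr(dtype, "kind", None) == "M":
        return "datetime"
    return "object"


print(pick_block_kind(ThirdPartyTimestampDtype()))
```

If step 3 ran before step 2, the third-party dtype would be routed to a datetime block purely because of its `kind`, which is the failure mode discussed in this thread.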
xfail added, CI is now happy.
@@ -67,6 +68,26 @@ def construct_array_type(cls) -> Type["ArrowStringArray"]:
        return ArrowStringArray


@register_extension_dtype
Can you put these in the test file for now? I am not sure we agree on these names (and they are just used for testing ATM).
I can move them but I wanted to keep the dtype here as done for the other test-Arrow-dtypes.
Done. CI passed except for the Docs build, but the warnings about missing sparse methods are unrelated to this PR.
Can you merge master? Then we'll see if we can get this in.
fd562df to 6d92caa
@jbrockmendel Rebased and all green except one Windows job that timed out.
I think the edit to get_block_type in #34683 might fix the test that fails here. Can you confirm? If that is fixed, presumably the rest of the EA test suite still needs to be enabled for this EA?
Yes, merging in #34683 fixes the test. I'm not sure it would really be worth getting the full suite running for this test EA. It is basically here to check for the regression, and getting the whole suite to pass would be a lot more work that I don't consider worthwhile currently.
Totally reasonable. I guess we can merge this now and then if/when #34683 makes this pass we can revisit getting other bits working. cc @jreback
@xhochy can you merge master? Hopefully we'll get the CI green and can get this in.
877c401 to e393ca6
LGTM
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
@jbrockmendel @jreback Rebased and removed xfail as it is working now.
import pandas as pd
from pandas.api.extensions import ExtensionDtype, register_extension_dtype

pytest.importorskip("pyarrow", minversion="0.13.0")
This could technically be later, but OK for now.
Thanks @xhochy
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff