BUG: Cannot create third-party ExtensionArrays for datetime types (xfail) #34987
@@ -0,0 +1,54 @@
import datetime
from typing import Type

import pytest

import pandas as pd
from pandas.api.extensions import ExtensionDtype, register_extension_dtype

pytest.importorskip("pyarrow", minversion="0.13.0")

import pyarrow as pa  # isort:skip

from .arrays import ArrowExtensionArray  # isort:skip


@register_extension_dtype
class ArrowTimestampUSDtype(ExtensionDtype):

    type = datetime.datetime
    kind = "M"
    name = "arrow_timestamp_us"
    na_value = pa.NULL

    @classmethod
    def construct_array_type(cls) -> Type["ArrowTimestampUSArray"]:
        """
        Return the array type associated with this dtype.

        Returns
        -------
        type
        """
        return ArrowTimestampUSArray


class ArrowTimestampUSArray(ArrowExtensionArray):
    def __init__(self, values):
        if not isinstance(values, pa.ChunkedArray):
            raise ValueError

        assert values.type == pa.timestamp("us")
        self._data = values
        self._dtype = ArrowTimestampUSDtype()


def test_constructor_extensionblock():
    # GH 34986
    pd.DataFrame(
        {
            "timestamp": ArrowTimestampUSArray.from_scalars(
                [None, datetime.datetime(2010, 9, 8, 7, 6, 5, 4)]
            )
        }
    )

Should this be xfailed?

I can xfail this, so this can be merged. I would prefer to fix this myself, though.

I just need a pointer to the code section where I should apply a fix. Should I change the order in

Sounds good. Is this a use case you need to get working in the near term, or more of a Principle Of The Thing? I ask because...

This is pretty daunting, as I expect this is scattered across the code. There are lots of places where we either a) implicitly assume nanoseconds or b) check

That will probably be part of a solution.

I'd be very reluctant to make that change, since I think a lot of code expects that to imply it's getting our Datetime64TZDtype. Maybe a

So getting back to the motivation: how high a priority is this? One thing I can unambiguously encourage is more tests, even if xfailed:

More in the next 6 months range, so I'm definitely going to add an

I would love to have a nullable, non-nanosecond timestamp (actually I desperately need it, but e.g. having a performant string is more important to me), but there are several other places that assume all timestamps are either nanoseconds or backed by a numpy array, so this is going to be a major effort.

As already pointed out: less than other things I want to contribute to

Sounds good.

Would your need be solved if we get numpy-backed non-nano in place? There's a reasonable chance of that happening in the next 6 months.

For now: yes.

I'm slowly tackling this from the Cython side of the code. The parallelizable step is to comb through the rest of the code to find all the places where we implicitly or explicitly assume nanoseconds. I'd start with pandas/plotting and pandas/io.

Let's see if we can at least get this one working. I think we'll need to edit the dtype.kind check in is_datetime64tz_dtype, and possibly the
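The last comment points at a dtype.kind check in is_datetime64tz_dtype. Below is a deliberately simplified, pure-Python model of why a check that looks only at kind can misfire for the dtype in this test: ArrowTimestampUSDtype sets kind = "M", and pandas' own DatetimeTZDtype also reports kind "M", so kind alone cannot tell them apart. Every class and function here is a hypothetical stand-in (prefixed Fake or suffixed _by_kind/_by_class), not pandas internals, and the real pandas check may differ in detail.

```python
# Hypothetical stand-ins for illustration only; these are NOT pandas classes.
class FakeExtensionDtype:
    kind = "O"


class FakeDatetimeTZDtype(FakeExtensionDtype):
    # Models the built-in tz-aware datetime dtype, which reports kind "M".
    kind = "M"


class FakeArrowTimestampUSDtype(FakeExtensionDtype):
    # Models the third-party dtype from the test, which also sets kind "M".
    kind = "M"


def is_datetime64tz_by_kind(dtype) -> bool:
    """Kind-based check: any extension dtype with kind "M" passes."""
    return isinstance(dtype, FakeExtensionDtype) and dtype.kind == "M"


def is_datetime64tz_by_class(dtype) -> bool:
    """Class-based check: only the built-in tz-aware dtype passes."""
    return isinstance(dtype, FakeDatetimeTZDtype)


third_party = FakeArrowTimestampUSDtype()
builtin = FakeDatetimeTZDtype()

print(is_datetime64tz_by_kind(third_party))   # True: third-party dtype misclassified
print(is_datetime64tz_by_class(third_party))  # False
print(is_datetime64tz_by_class(builtin))      # True
```

Dispatching on the dtype class rather than on kind is one option this sketch illustrates; it is not necessarily the change pandas adopted.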
This could technically be later, but OK for now.
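The review thread's point that many code paths "implicitly assume nanoseconds" can be made concrete without pandas or pyarrow. The sketch below assumes naive datetimes are treated as UTC and uses hypothetical helper names (to_epoch_us, from_epoch_us, from_epoch_ns). It encodes the timestamp from the test the way a timestamp("us") column stores it (integer microseconds since the Unix epoch), then decodes that same integer once with the correct unit and once under a nanosecond assumption.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # naive datetimes treated as UTC


def to_epoch_us(dt: datetime.datetime) -> int:
    """Integer microseconds since the Unix epoch (the "us" storage unit)."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_epoch_us(value: int) -> datetime.datetime:
    """Decode an integer as microseconds since the epoch (correct unit)."""
    return EPOCH + datetime.timedelta(microseconds=value)


def from_epoch_ns(value: int) -> datetime.datetime:
    """Decode an integer as nanoseconds since the epoch (wrong unit here).

    datetime has no nanosecond field, so the result is truncated to microseconds.
    """
    return EPOCH + datetime.timedelta(microseconds=value // 1_000)


ts = datetime.datetime(2010, 9, 8, 7, 6, 5, 4)  # value used in the test
raw = to_epoch_us(ts)  # integer a timestamp("us") column would hold

print(from_epoch_us(raw))  # 2010-09-08 07:06:05.000004
print(from_epoch_ns(raw))  # 1970-01-15 20:38:49.565000
```

The misread value is off by a factor of 1,000 and silently lands in January 1970 rather than raising an error, which is one reason such assumptions are hard to spot by reading the code.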