BUG: datetime ExtensionDtype do not work with DataFrame #35767
Comments
Related to #35762
@marco-neumann-by thanks for the report! I know this is messy in pandas itself (with DatetimeArray sometimes having a numpy dtype and sometimes a DatetimeTZDtype), but in principle your extension array should always have an ExtensionDtype. If that is not the case, pandas cannot guarantee that it will work (or rather it is guaranteed to not work, since we rely on the dtype for several aspects of EA functionality). I suppose in this case the RLEArray is a wrapper of other ExtensionArrays, and therefore wrapping DatetimeArray? But the
@sbrugman I think these are two different issues. The one you linked is specifically related to the Sparse dtype (and the problem that Sparse[datetime64] is apparently regarded as a datetime-like dtype), which is a special case in pandas and is not very well tested or supported. But it is not related to DatetimeArray being a non-proper ExtensionArray with a numpy dtype (in the case of tz-naive data).
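For readers following along, the dtype split mentioned in the comments above can be checked directly. This is only a small sketch, assuming a reasonably recent pandas; the variable names are made up for illustration and nothing here comes from rle-array.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Both arrays are pandas-provided DatetimeArray instances.
naive = pd.date_range("2020-01-01", periods=2).array
aware = pd.date_range("2020-01-01", periods=2, tz="UTC").array

# tz-naive data carries a plain NumPy dtype, not an ExtensionDtype.
naive_is_numpy = isinstance(naive.dtype, np.dtype)
naive_is_extension = isinstance(naive.dtype, ExtensionDtype)

# tz-aware data carries DatetimeTZDtype, which is an ExtensionDtype.
aware_is_tz = isinstance(aware.dtype, pd.DatetimeTZDtype)
aware_is_extension = isinstance(aware.dtype, ExtensionDtype)

print(type(naive).__name__, naive_is_numpy, naive_is_extension)
print(type(aware).__name__, aware_is_tz, aware_is_extension)
```

So the same array class reports two different kinds of dtype depending on whether the data is tz-aware, which is the "messy" part referred to above.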
@jorisvandenbossche
Ah, I missed the difference between I am not fully sure what the reason is for the check being written the way it is now. We should probably test what fails if we change it to a stricter check for
I had a look at the code in
So I think either
get_block_type has been reworked since the OP. Does the issue persist?
I have checked that this issue has not already been reported. (at least I couldn't find one)
I have confirmed this bug exists on the latest version of pandas. (1.1.0)
(optional) I have confirmed this bug exists on the master branch of pandas. (934e9f840ebd2e8b5a5181b19a23e033bd3985a5)
Code Sample, a copy-pastable example
This is a high-level example that led to the investigation. It relies on rle-array (commit dfa79295a580d533ee9d2ea901e8808496dbcdc9 was used), because the pandas-provided DatetimeArray uses a NumPy dtype or DatetimeTZDtype. Both cases somewhat work (see "Problem description").
Problem description
See here:
pandas/pandas/core/internals/blocks.py
Lines 2647 to 2690 in 934e9f8
datetime (and also interval) types are checked BEFORE extension types, which means that extension datetime types never end up in ExtensionBlocks. The latter would be useful if:
rle-array case)
Furthermore, the invariant
issubclass(vtype, np.datetime64) => not is_datetime64tz_dtype(values.dtype)
does NOT hold for all extension dtypes, at least not under the current implementation of is_datetime64tz_dtype:
:pandas/pandas/core/dtypes/common.py
Lines 415 to 421 in 934e9f8
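To make the ordering problem concrete, here is a toy model of the dispatch. It is not the pandas code from blocks.py: `ToyDatetimeDtype`, `pick_block_datetime_first` and `pick_block_extension_first` are hypothetical names invented for this sketch, and the block names are returned as plain strings. It only assumes that a third-party extension dtype may report `np.datetime64` as its scalar `type`.

```python
import numpy as np
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import is_extension_array_dtype


class ToyDatetimeDtype(ExtensionDtype):
    """Hypothetical extension dtype whose scalar type is np.datetime64."""

    name = "toy_datetime"
    type = np.datetime64

    @classmethod
    def construct_array_type(cls):
        # No array class is needed to illustrate the dispatch order.
        raise NotImplementedError("not needed for this sketch")


def pick_block_datetime_first(dtype):
    """Toy dispatch that tests for datetime scalars before extension dtypes."""
    if issubclass(dtype.type, np.datetime64):
        return "DatetimeBlock"
    if is_extension_array_dtype(dtype):
        return "ExtensionBlock"
    return "ObjectBlock"


def pick_block_extension_first(dtype):
    """Toy dispatch that tests for extension dtypes first."""
    if is_extension_array_dtype(dtype):
        return "ExtensionBlock"
    if issubclass(dtype.type, np.datetime64):
        return "DatetimeBlock"
    return "ObjectBlock"


toy = ToyDatetimeDtype()
print(pick_block_datetime_first(toy))   # extension dtype captured by the datetime branch
print(pick_block_extension_first(toy))  # extension dtype reaches the extension branch
print(pick_block_extension_first(np.dtype("M8[ns]")))  # plain NumPy datetime is unaffected
```

With the datetime check first, the extension dtype never reaches the extension branch, which mirrors the behaviour described above; swapping the order leaves plain NumPy datetime dtypes on the datetime path in this toy version.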
Expected Output
The code example works and df._data shows that the data ends up in an ExtensionBlock.
Output of pd.show_versions()