[WIP]: API ExtensionDtype for DTA & TDA #24674

TomAugspurger · 2019-01-08T15:49:30Z

Closes #24662

TODO:

Dedicated tests for new is_ types
Run the extension dtype tests for new dtypes
(maybe?) see if the new EA tests pass for the new arrays. Tricky, since they aren't allowed internally. g
failing dtype validation tests

This is going to fail. Going offline for a bit, but putting this up right now.

I'm going to clean up the changes in internals and the series constructor.

* Public API * tests for common

TomAugspurger · 2019-01-08T19:43:06Z

Still a WIP, but probably not going to have time to update this any further through the end of the week.

jreback · 2019-01-08T19:48:57Z

pandas/core/dtypes/dtypes.py

+        return DatetimeArray
+
+    @classmethod
+    def construct_from_string(cls, string):


I think you need to move this to the sub-class as DatetimeDtype would actually accept this but raise on a construction error (as you can't pass a tz). I would maybe instead just have a unified constructor that accepts tz=None for naive.

jorisvandenbossche · 2019-01-09T13:18:53Z

I would maybe instead just have a unified constructor that accepts tz=None for naive.

@jreback do you also mean just having a single class internally? DatetimeDtype(tz=None) is the naive one and DatetimeDtype(tz=tz) is the aware one (and DatetimeTzDtype can be kept as an alias for now for backwards compatibility)

I didn't really think about any possible technical implications of that, but it seems logical to me to do that, since we also have a single DatetimeArray.

TomAugspurger · 2019-01-09T13:22:50Z

Sorry I put this up in haste and didn't explain things well. Initially, I tried that, but ran into some difficulties. It turns out that having pandas_dtype("M8[ns]") return a non NumPy dtype breaks a lot of stuff. So I split them apart, and don't register the DatetimeDtype.

I'm not sure if it's possible to both

Have a DatetimeDtype that registers datetime64[ns, tz], but not datetime64[ns]
Have construct_from_string always work correctly

but I can try (later, not today).

jreback · 2019-01-09T13:57:46Z

yeah I think this is a great idea @TomAugspurger but also possibly to involve changing some tricky code; would prefer to push this to 0.25

TomAugspurger · 2019-01-14T17:21:55Z

The main downside with pushing to 0.25 is it'll be an API breaking change for DatetimeArray.dtype, and TimedeltaArray.dtype with no path for deprecation.

just pushed an update, but still a WIP.

codecov · 2019-01-15T19:46:22Z

Codecov Report

Merging #24674 into master will decrease coverage by 0.02%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##           master   #24674      +/-   ##
==========================================
- Coverage   92.38%   92.36%   -0.03%     
==========================================
  Files         166      166              
  Lines       52379    52474      +95     
==========================================
+ Hits        48392    48467      +75     
- Misses       3987     4007      +20

Flag	Coverage Δ
#multiple	`90.78% <88.23%> (-0.03%)`	⬇️
#single	`42.98% <72.94%> (+0.07%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/nanops.py	`94.36% <ø> (ø)`	⬆️
pandas/core/api.py	`100% <ø> (ø)`	⬆️
pandas/core/series.py	`93.7% <100%> (+0.01%)`	⬆️
pandas/core/dtypes/common.py	`96.86% <100%> (+0.08%)`	⬆️
pandas/core/frame.py	`96.93% <100%> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`97.78% <100%> (ø)`	⬆️
pandas/core/indexes/timedeltas.py	`90.3% <100%> (+0.08%)`	⬆️
pandas/core/indexes/datetimes.py	`96.28% <100%> (+0.01%)`	⬆️
pandas/core/indexes/base.py	`96.3% <100%> (ø)`	⬆️
pandas/core/internals/construction.py	`94% <60.86%> (-1.94%)`	⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 512830b...2abda43. Read the comment docs.

codecov · 2019-01-15T19:46:22Z

Codecov Report

Merging #24674 into master will decrease coverage by 49.41%.
The diff coverage is 73.65%.

@@             Coverage Diff             @@
##           master   #24674       +/-   ##
===========================================
- Coverage   92.38%   42.97%   -49.42%     
===========================================
  Files         166      166               
  Lines       52366    52469      +103     
===========================================
- Hits        48380    22550    -25830     
- Misses       3986    29919    +25933

Flag	Coverage Δ
#multiple	`?`
#single	`42.97% <73.65%> (+0.06%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/nanops.py	`31.27% <ø> (-63.1%)`	⬇️
pandas/core/api.py	`86.66% <ø> (-13.34%)`	⬇️
pandas/core/series.py	`49.03% <100%> (-44.66%)`	⬇️
pandas/core/dtypes/common.py	`71.15% <100%> (-25.63%)`	⬇️
pandas/core/frame.py	`35.82% <100%> (-61.11%)`	⬇️
pandas/core/arrays/datetimes.py	`66.13% <100%> (-31.65%)`	⬇️
pandas/core/indexes/timedeltas.py	`44.87% <100%> (-45.38%)`	⬇️
pandas/core/indexes/datetimes.py	`48.62% <100%> (-47.65%)`	⬇️
pandas/core/indexes/base.py	`56.49% <100%> (-39.82%)`	⬇️
pandas/core/internals/construction.py	`62.11% <60.86%> (-33.83%)`	⬇️
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bd454c...f1ecb8b. Read the comment docs.

TomAugspurger · 2019-01-16T15:17:42Z

Somewhat awkward API corner: what's the expected types for

DatetimeArray.astype("M8[ns]")
DatetimeArray.astype(np.dtype("M8[ns]"))
DatetimeArray.astype(pd.DatetimeDtype())

Previously, we'd use .astype('M8[ns]') to convert to an ndarray. But now that we have a dtype roughly equivalent to M8[ns], should we keep doing that?

Another one, what should we do about pd.DatetimeDtype() == "M8[ns]" or == np.dtpye("M8[ns]").

NumPy doesn't make this easy, since they would raise if the operands were flipped, rather than returning NotImplemented.

jorisvandenbossche · 2019-01-17T15:47:26Z

For the expected types, I also have been wondering about this in general. So eg also for DTA.astype("int64").
Because we now have PandasArray and PandasDtype, so we could also return those? Similarly as pd.array(DTA, dtype='int64') does return that, we could do the same for EA.astype. It could be a specification of EA.astype that the return value is always another EA.

TomAugspurger · 2019-01-17T16:12:56Z

Yeah, it's not obvious to me what's best here.

On the one hand, you could say that .astype(np.dtype(.)) should always return an ndarray.

On the other hand, is it weird for .astype(np.dtype("int64")) to not match .astype("int64")? I suppose we'll need to sort all this out.

I'm really not sure what to do here :/

h-vetinari · 2019-01-20T23:05:16Z

@TomAugspurger: Yeah, it's not obvious to me what's best here.
On the one hand, you could say that .astype(np.dtype(.)) should always return an ndarray.
On the other hand, is it weird for .astype(np.dtype("int64")) to not match .astype("int64")? I suppose we'll need to sort all this out.
I'm really not sure what to do here :/

How about .astype and pd.array always returning PandasDType (yes, even for an np.dtype)? If the user really wants an ndarray, they can then call to_numpy. That would in many ways be more consistent - and could later be extended to the point that the only way to get an ndarray from pandas is to call to_numpy.

jbrockmendel · 2019-07-28T04:35:45Z

This has been inactive for a while. Still a goal?

TomAugspurger · 2019-07-29T15:21:36Z

I'm not sure. The API changes are tricky to get right.

Regardless, I don't think things will be done on this branch. I'll post an update on the issue.

TomAugspurger and others added 4 commits January 8, 2019 09:45

API: dtype for DTA & TDA

88be531

fixed tests

2dcfefc

Updates

8796aad

* Public API * tests for common

updates

8b644c9

jreback requested changes Jan 8, 2019

View reviewed changes

jreback added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Jan 8, 2019

TomAugspurger added 2 commits January 14, 2019 11:02

Merge remote-tracking branch 'upstream/master' into array-dtypes

4416237

fixups

9b867e8

TomAugspurger added 2 commits January 15, 2019 10:10

Merge remote-tracking branch 'upstream/master' into array-dtypes

55f36a9

Fixups

f1ecb8b

TomAugspurger added 4 commits January 16, 2019 08:51

32-bit compat

fa07454

Merge remote-tracking branch 'upstream/master' into array-dtypes

cb1291b

tests

fe0bb95

move

fdadf12

Merge remote-tracking branch 'upstream/master' into array-dtypes

2abda43

TomAugspurger mentioned this pull request Jan 20, 2019

RLS: 0.24.0 #24060

Closed

jorisvandenbossche mentioned this pull request Jan 22, 2019

API: ExtensionArray.astype to only return extension arrays ? #24877

Open

TomAugspurger closed this Jul 29, 2019

TomAugspurger mentioned this pull request Jul 29, 2019

dtype for tz-naive DatetimeArray and TimedeltaArray #24662

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[WIP]: API ExtensionDtype for DTA & TDA #24674

[WIP]: API ExtensionDtype for DTA & TDA #24674

Uh oh!

TomAugspurger commented Jan 8, 2019

Uh oh!

TomAugspurger commented Jan 8, 2019

Uh oh!

jreback Jan 8, 2019

Uh oh!

jorisvandenbossche commented Jan 9, 2019

Uh oh!

TomAugspurger commented Jan 9, 2019

Uh oh!

jreback commented Jan 9, 2019

Uh oh!

TomAugspurger commented Jan 14, 2019

Uh oh!

codecov bot commented Jan 15, 2019 •

edited

Loading

Uh oh!

codecov bot commented Jan 15, 2019

Uh oh!

TomAugspurger commented Jan 16, 2019

Uh oh!

jorisvandenbossche commented Jan 17, 2019 •

edited

Loading

Uh oh!

TomAugspurger commented Jan 17, 2019

Uh oh!

h-vetinari commented Jan 20, 2019

Uh oh!

jbrockmendel commented Jul 28, 2019

Uh oh!

TomAugspurger commented Jul 29, 2019

Uh oh!

Uh oh!

Uh oh!

[WIP]: API ExtensionDtype for DTA & TDA #24674

[WIP]: API ExtensionDtype for DTA & TDA #24674

Uh oh!

Conversation

TomAugspurger commented Jan 8, 2019

Uh oh!

TomAugspurger commented Jan 8, 2019

Uh oh!

jreback Jan 8, 2019

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche commented Jan 9, 2019

Uh oh!

TomAugspurger commented Jan 9, 2019

Uh oh!

jreback commented Jan 9, 2019

Uh oh!

TomAugspurger commented Jan 14, 2019

Uh oh!

codecov bot commented Jan 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codecov bot commented Jan 15, 2019

Codecov Report

Uh oh!

TomAugspurger commented Jan 16, 2019

Uh oh!

jorisvandenbossche commented Jan 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TomAugspurger commented Jan 17, 2019

Uh oh!

h-vetinari commented Jan 20, 2019

Uh oh!

jbrockmendel commented Jul 28, 2019

Uh oh!

TomAugspurger commented Jul 29, 2019

Uh oh!

Uh oh!

codecov bot commented Jan 15, 2019 •

edited

Loading

jorisvandenbossche commented Jan 17, 2019 •

edited

Loading