-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
WIP: make _holder changeover #24540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: make _holder changeover #24540
Conversation
Haven't looked here yet, but my vote it to stop splitting things off. I
have some followups to the other PR that I'd like to get to that are
currently blocked.
…On Tue, Jan 1, 2019 at 5:21 PM jbrockmendel ***@***.***> wrote:
cc @jreback <https://github.com/jreback> @TomAugspurger
<https://github.com/TomAugspurger>
changes _holder from DTI/TDI to DTA/TDA, pretty much all the diff from
blocks.internals and core.dtypes
Not yet green locally, but I'm presenting this as a viable alternative to
the all-at-once still in #24024
<#24024>
------------------------------
You can view, comment on, or merge this pull request online at:
#24540
Commit Summary
- make _holder changeover
File Changes
- *M* pandas/core/dtypes/common.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-0> (5)
- *M* pandas/core/dtypes/concat.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-1> (14)
- *M* pandas/core/dtypes/dtypes.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-2> (16)
- *M* pandas/core/dtypes/generic.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-3> (2)
- *M* pandas/core/internals/blocks.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-4> (201)
- *M* pandas/tests/dtypes/test_dtypes.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-5> (17)
- *M* pandas/tests/extension/test_common.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-6> (8)
- *M* pandas/tests/internals/test_internals.py
<https://github.com/pandas-dev/pandas/pull/24540/files#diff-7> (18)
Patch Links:
- https://github.com/pandas-dev/pandas/pull/24540.patch
- https://github.com/pandas-dev/pandas/pull/24540.diff
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIoDb4LM389Hpsq_H6DxY2WhgSB2lks5u--1lgaJpZM4ZmEUK>
.
|
I hear where you're coming from (and am definitely ready to spend some time away from DTA/TDA), but I don't see a resolution to the constructor discussion, and that's a blocker for me on #24024. |
An actual blocker, as in you'll -1 the merge until it's addressed?
FYI, I think the handling of i8values has been addressed. Working on freq
validation now.
…On Tue, Jan 1, 2019 at 5:27 PM jbrockmendel ***@***.***> wrote:
I hear where you're coming from (and am definitely ready to spend some
time away from DTA/TDA), but I don't see a resolution to the constructor
discussion, and that's a blocker for me on #24024
<#24024>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHItV-LUOO6fZ_oSO5pn4d3uByEEteks5u--7OgaJpZM4ZmEUK>
.
|
I know I'm not the one who needs to be convinced, but yah.
That would totally get me to stop complaining! |
Any pandas maintainer can block a PR from being merged with a -1.
…On Tue, Jan 1, 2019 at 5:31 PM jbrockmendel ***@***.***> wrote:
An actual blocker, as in you'll -1 the merge until it's addressed?
I know I'm not the one who needs to be convinced, but yah.
FYI, I think the handling of i8values has been addressed. Working on freq
validation now.
That would totally get me to stop complaining!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIkZhJTMDBn-Icqq6eulvauPCIzb_ks5u--_GgaJpZM4ZmEUK>
.
|
well, mostly. There should still be a private, inference-free, validation-free constructor. We could call it "easy_init" |
I don't think that's possible unless we use `__new__`, and I don't want the
complexity that entails.
…On Tue, Jan 1, 2019 at 5:36 PM jbrockmendel ***@***.***> wrote:
well, mostly. There should still be a private, inference-free,
validation-free constructor. We could call it "easy_init"
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIvN51HeGWDezTit74THPRDFYJGUbks5u-_EIgaJpZM4ZmEUK>
.
|
I've gotten the test failures down to almost entirely pickle-based. @TomAugspurger IIRC you had to solve that in the other branch. Was there a trick there? |
Define |
Yep, that did it. And plays nicely with _simple_new to boot. |
Why can't we simply make the DTA (et all) |
That's basically what we do on #24024. The permitted types for data
* ndarray[i8]
* ndarray[datetime64[ns]]
* DatetimeArray
* Series / Index of the above
This means that `__init__` is always no-copy (unless explicitly requested
via the `copy` param). At most we do a view or an unboxing.
…On Tue, Jan 1, 2019 at 6:26 PM Jeff Reback ***@***.***> wrote:
well, mostly. There should still be a private, inference-free,
validation-free constructor. We could call it "easy_init"
Why can't we simply make the DTA (et all) __init__ just be unfriendly if
you don't pass accordning the guarantees; yes they will be public but will
have very constrained inputs. ?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIvjKFuUmew9ydrhB0zd2iabfvLIpks5u-_yvgaJpZM4ZmEUK>
.
|
My immediate claim is that this discussion is large enough that it should be separated out from #24024. That is the purpose of this PR. The answer to the question: because it's public! because people will use it! because having it behave differently from *possibly out of date, but even if so, there should be non-validating (performant) private constructor! |
ok, I am fine with exploring alternative ways of doing this. But at this point, would be more productive to merge #24024 yes? |
If the concensus is that smaller bits-for-better-review is no longer worthwhile, then we can close this and I'll do one more mondo pass on 24024. That said, the only tests that are still failing here are in test_packers, so if we can fix those and do this indepndently, we cut 40% off the remaining diff in 24024 |
@jbrockmendel #24540 (comment) ok let's do that (merge this), then rebase and merge #24024 |
I like the attitude, but a lot of the point of smaller pieces is making it feasible to fine-tooth-comb it. |
@@ -228,6 +228,11 @@ static PyObject *get_values(PyObject *obj) { | |||
PRINTMARK(); | |||
|
|||
if (values && !PyArray_CheckExact(values)) { | |||
|
|||
if (PyObject_HasAttrString(values, "to_numpy")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here about what this is doing, and . TODO to see if we can remove the .values below
class DatetimeTZDtype(PandasExtensionDtype): | ||
|
||
@register_extension_dtype | ||
class DatetimeTZDtype(PandasExtensionDtype, ExtensionDtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can now move PandasExtensionDtype
to subclass ExtensionDtype
, see the note in pandas.core.dtypes.base for _DtypeMixin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good as a followup item I think. IIRC I created an issue when I had to create PandasExtensionDtype in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i agree, do we have an issue about fixing this?
def get_values(self, dtype=None): | ||
""" | ||
return object dtype as boxed values, such as Timestamps/Timedelta | ||
""" | ||
if is_object_dtype(dtype): | ||
return lib.map_infer(self.values.ravel(), | ||
self._box_func).reshape(self.values.shape) | ||
values = self.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't you always call .ravel()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For starters because DTA/TDA doesn't have ravel() (though I think it should (ditto transpose) that just returns self)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I agree, can you make an issue about this
|
||
values = lib.map_infer(values, self._box_func) | ||
|
||
if self.ndim > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this reshape always true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. The gymnastics in internals to get around the fact that we can't reshape to (nrows, 1) is over my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it was ok before, puzzled why it needs changing now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Series ndim
will be 1
In [2]: ser = pd.Series(pd.date_range('2000', periods=4, tz="UTC"))
In [3]: ser._data.blocks[0].ndim
Out[3]: 1
Or perhaps I misunderstood the question.
@@ -2198,13 +2216,15 @@ class TimeDeltaBlock(DatetimeLikeBlockMixin, IntBlock): | |||
def __init__(self, values, placement, ndim=None): | |||
if values.dtype != _TD_DTYPE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in theory this is not needed anymore
@@ -2771,6 +2794,10 @@ def _maybe_coerce_values(self, values): | |||
""" | |||
if values.dtype != _NS_DTYPE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above
@@ -2885,12 +2914,16 @@ def set(self, locs, values, check=False): | |||
|
|||
self.values[locs] = values | |||
|
|||
def external_values(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't this be on EA base block? (parmetrized on the dtype)
value, limit=limit, inplace=inplace, downcast=downcast | ||
) | ||
|
||
def setitem(self, indexer, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be in the EA block class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but I don't think we have a good story for casting extension arrays yet. I'd prefer to keep that behavior undefined for now, until we can do better than "fall back to object".
@@ -414,6 +414,13 @@ def _formatter(self, boxed=False): | |||
# ---------------------------------------------------------------- | |||
# Array-Like / EA-Interface Methods | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well add ravel for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to avoid adding these if possible.
What requires this? Was it for pytables IO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transpose was for a pytables test, yah. we discussed it briefly over in 24024. I prefer adding it, but not gonna fuss over it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also until i get the packers tests and one last feather test fixed I'm completely stuck)
@@ -240,11 +240,14 @@ def _simple_new(cls, values, freq=None, tz=None): | |||
result._dtype = dtype | |||
return result | |||
|
|||
def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False, | |||
def __init__(self, values, freq=None, tz=None, dtype=None, copy=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init should be values, freq, dtype, copy
to match PeriodArray.
@@ -1568,6 +1573,8 @@ def sequence_to_dt64ns(data, dtype=None, copy=False, | |||
copy = False | |||
elif isinstance(data, ABCSeries): | |||
data = data._values | |||
elif isinstance(data, ABCPandasArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if this is necessary? Why not fall through to the usual sequence handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a test in tests.series.test_missing that ended up calling lib.infer_dtype and raising an AttributeError because PandasDType doesn't have a base
attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That probably indicates a bug that this is hiding. Shouldn't all non ndarray sequences be converted to ndarrays before calling infer_dtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i seem to recall something about extension dtypes being registered in such a way that infer_dtype could pick up on them. will take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel
infer_dtype
does not know about extension dtypes per se (and cannot because the point is to figure out whats hiding behind a dtype (mostly object
), so dispatching based on dtype is not possible).
The only ones that are directly implemented are Period
and Interval
, but no other EA type, including Datetime64TZ
.
class DatetimeTZDtype(PandasExtensionDtype): | ||
|
||
@register_extension_dtype | ||
class DatetimeTZDtype(PandasExtensionDtype, ExtensionDtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good as a followup item I think. IIRC I created an issue when I had to create PandasExtensionDtype in the first place.
|
||
values = lib.map_infer(values, self._box_func) | ||
|
||
if self.ndim > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Series ndim
will be 1
In [2]: ser = pd.Series(pd.date_range('2000', periods=4, tz="UTC"))
In [3]: ser._data.blocks[0].ndim
Out[3]: 1
Or perhaps I misunderstood the question.
value, limit=limit, inplace=inplace, downcast=downcast | ||
) | ||
|
||
def setitem(self, indexer, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but I don't think we have a good story for casting extension arrays yet. I'd prefer to keep that behavior undefined for now, until we can do better than "fall back to object".
|
||
class TestGroupby(BaseDatetimeTests, base.BaseGroupbyTests): | ||
pass | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like the packers tests are resolved if I change |
I'm looking into this right now. I'd going to try to crack it tonight, but
I'm pretty confused at the moment. I'll post if I give up for the night.
…On Tue, Jan 1, 2019 at 9:31 PM jbrockmendel ***@***.***> wrote:
So it looks like the packers tests are resolved if I change
DatetimeArray.__init__ to treat M8[ns] values like int64 values, which
I'd really rather not do. Any thoughts on another way to resolve this?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHImc6epNMD2IN_jhpbxdqwfZL5K-Aks5u_CgngaJpZM4ZmEUK>
.
|
A `pd.date_range` apparently calls `DatetimeArray.__init__` four times,
which I hope isn't necessary.
On Tue, Jan 1, 2019 at 9:35 PM Tom Augspurger <[email protected]>
wrote:
… I'm looking into this right now. I'd going to try to crack it tonight, but
I'm pretty confused at the moment. I'll post if I give up for the night.
On Tue, Jan 1, 2019 at 9:31 PM jbrockmendel ***@***.***>
wrote:
> So it looks like the packers tests are resolved if I change
> DatetimeArray.__init__ to treat M8[ns] values like int64 values, which
> I'd really rather not do. Any thoughts on another way to resolve this?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#24540 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/ABQHImc6epNMD2IN_jhpbxdqwfZL5K-Aks5u_CgngaJpZM4ZmEUK>
> .
>
|
Making progress I think, but giving up for the night.
On Tue, Jan 1, 2019 at 9:36 PM Tom Augspurger <[email protected]>
wrote:
… A `pd.date_range` apparently calls `DatetimeArray.__init__` four times,
which I hope isn't necessary.
On Tue, Jan 1, 2019 at 9:35 PM Tom Augspurger ***@***.***>
wrote:
> I'm looking into this right now. I'd going to try to crack it tonight,
> but I'm pretty confused at the moment. I'll post if I give up for the night.
>
> On Tue, Jan 1, 2019 at 9:31 PM jbrockmendel ***@***.***>
> wrote:
>
>> So it looks like the packers tests are resolved if I change
>> DatetimeArray.__init__ to treat M8[ns] values like int64 values, which
>> I'd really rather not do. Any thoughts on another way to resolve this?
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on GitHub
>> <#24540 (comment)>,
>> or mute the thread
>> <https://github.com/notifications/unsubscribe-auth/ABQHImc6epNMD2IN_jhpbxdqwfZL5K-Aks5u_CgngaJpZM4ZmEUK>
>> .
>>
>
|
That hasn't been implemented yet.
…On Wed, Jan 2, 2019 at 8:59 AM jbrockmendel ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/arrays/datetimes.py
<#24540 (comment)>:
> @@ -1568,6 +1573,8 @@ def sequence_to_dt64ns(data, dtype=None, copy=False,
copy = False
elif isinstance(data, ABCSeries):
data = data._values
+ elif isinstance(data, ABCPandasArray):
i seem to recall something about extension dtypes being registered in such
a way that infer_dtype could pick up on them. will take a look
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIq87X1_H0AnFsk70-P372E_o3Bapks5u_MlWgaJpZM4ZmEUK>
.
|
What is currently the status of this PR in relation to #24024 ? |
Many of the comments here need to be addressed in 24024 as well, but on its own this PR is dead in the water since I can't get the packers tests passing. |
Sounds like this will be a decent followup then, since the packers tests
will be fixed.
…On Wed, Jan 2, 2019 at 11:02 AM jbrockmendel ***@***.***> wrote:
What is currently the status of this PR in relation to #24024
<#24024> ?
Many of the comments here need to be addressed in 24024 as well, but on
its own this PR is dead in the water since I can't get the packers tests
passing.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24540 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIm6oL0NsQ9wIobpik4QQH9CL7cHqks5u_OZBgaJpZM4ZmEUK>
.
|
@jbrockmendel were they addressed in the end? |
I'll double-check. Still have a browser tab open trying to keep track of which issues from 24024 are still open. Will see if one of the follow-up Issues can serve as a tracker. |
cc @jreback @TomAugspurger
changes _holder from DTI/TDI to DTA/TDA, pretty much all the diff from blocks.internals and core.dtypes
Not yet green locally, but I'm presenting this as a viable alternative to the all-at-once still in #24024