Make _freq/freq/tz/_tz/dtype/_dtype/offset/_offset all inherit reliably #24517

Merged: 7 commits merged on Jan 1, 2019

Conversation

jbrockmendel
Member

There was an issue in rebasing #24024 in which index._freq didn't match index._eadata._freq. This ensures that is never an issue by removing _freq/freq from the Index subclasses and making them properties aliasing the _eadata attrs.

To do this, we have to make _eadata not-a-property for the time being.

Also make the _tz --> _dtype transition from #24024, and fix a couple of places in DatetimeIndex that set _tz manually and would otherwise be missed.

Remove PeriodIndex.shift, since it can now use the DatetimeIndexOpsMixin version (also done in #24024).
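
A minimal sketch of the aliasing pattern described above, not the actual pandas code. `ArraySketch` and `IndexSketch` are hypothetical stand-ins for the array stored in `_eadata` and for an Index subclass; the point is only that the index keeps no freq of its own, so the two values cannot drift apart.

```python
class ArraySketch:
    """Hypothetical stand-in for the array stored as ``_eadata``."""

    def __init__(self, values, freq=None):
        self._data = list(values)
        self._freq = freq

    @property
    def freq(self):
        return self._freq


class IndexSketch:
    """Hypothetical stand-in for an Index subclass wrapping the array."""

    def __init__(self, values, freq=None):
        # _eadata is a plain attribute set once; the index stores no freq itself.
        self._eadata = ArraySketch(values, freq=freq)

    @property
    def _freq(self):
        # Alias: always read from the wrapped array.
        return self._eadata._freq

    @property
    def freq(self):
        return self._eadata.freq


idx = IndexSketch([1, 2, 3], freq="D")
idx._eadata._freq = "H"  # change the array's freq directly
print(idx.freq, idx._freq)  # both reflect the array's current value
```

Because `freq` and `_freq` are read through the array on every access, a mismatch like the one hit while rebasing has nowhere to live in this layout.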

codecov bot commented Dec 31, 2018

Codecov Report

Merging #24517 into master will increase coverage by <.01%.
The diff coverage is 58.06%.

@@            Coverage Diff             @@
##           master   #24517      +/-   ##
==========================================
+ Coverage   31.88%   31.89%   +<.01%     
==========================================
  Files         166      166              
  Lines       52434    52445      +11     
==========================================
+ Hits        16718    16726       +8     
- Misses      35716    35719       +3

Flag                                   Coverage Δ
#multiple                              30.29% <58.06%> (ø) ⬆️
#single                                31.89% <58.06%> (ø) ⬆️

Impacted Files                         Coverage Δ
pandas/core/arrays/datetimelike.py     38.23% <0%> (ø) ⬆️
pandas/core/indexes/period.py          31.49% <100%> (-0.02%) ⬇️
pandas/core/arrays/datetimes.py        62.24% <100%> (+0.26%) ⬆️
pandas/core/indexes/datetimelike.py    40.05% <25%> (-0.36%) ⬇️
pandas/core/indexes/datetimes.py       37.44% <46.66%> (+0.1%) ⬆️
pandas/core/indexes/timedeltas.py      43.28% <81.81%> (+0.43%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf9d41c...288255c.

codecov bot commented Dec 31, 2018

Codecov Report

Merging #24517 into master will increase coverage by <.01%.
The diff coverage is 72.72%.

@@            Coverage Diff             @@
##           master   #24517      +/-   ##
==========================================
+ Coverage   31.88%   31.88%   +<.01%     
==========================================
  Files         166      166              
  Lines       52434    52412      -22     
==========================================
- Hits        16718    16714       -4     
+ Misses      35716    35698      -18

Flag                                   Coverage Δ
#multiple                              30.29% <72.72%> (ø) ⬆️
#single                                31.88% <72.72%> (ø) ⬆️

Impacted Files                         Coverage Δ
pandas/core/arrays/datetimelike.py     38.36% <ø> (+0.13%) ⬆️
pandas/core/indexes/period.py          31.4% <100%> (-0.1%) ⬇️
pandas/core/arrays/datetimes.py        62.28% <100%> (+0.3%) ⬆️
pandas/core/indexes/timedeltas.py      42.57% <100%> (-0.29%) ⬇️
pandas/core/indexes/datetimes.py       36.95% <42.85%> (-0.39%) ⬇️
pandas/core/indexes/datetimelike.py    41.14% <83.33%> (+0.73%) ⬆️
pandas/core/generic.py                 31.41% <0%> (ø) ⬆️
pandas/util/testing.py                 37.59% <0%> (+0.21%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf9d41c...1456b6a.

@jbrockmendel jbrockmendel mentioned this pull request Dec 31, 2018
12 tasks
result.name = name
# For groupby perf. See note in indexes/base about _index_data
# TODO: make sure this is updated correctly if edited
Contributor

This TODO is for _index_data? In theory that shouldn't happen, since DatetimeIndex is immutable.

Member Author

In _libs.reduction there is a line:

object.__setattr__(cached_ityp, '_index_data', islider.buf)

which makes me wary. Is this just never relevant for DTI?

Contributor

That's the intent of all this, to directly mutate the buffer in place. The .reset method undoes all this stuff. Nobody else should be messing with it.
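
As a rough illustration of why a line like `object.__setattr__(cached_ityp, '_index_data', islider.buf)` can rebind state on an object that otherwise refuses assignment, here is a sketch with a hypothetical `FrozenSketch` class. It is an assumption for illustration only and does not reflect how pandas enforces Index immutability or how `_libs.reduction` is structured.

```python
class FrozenSketch:
    """Hypothetical holder that blocks ordinary attribute assignment."""

    def __init__(self, data):
        # Bypass our own __setattr__ to initialise the attribute.
        object.__setattr__(self, "_index_data", data)

    def __setattr__(self, name, value):
        raise AttributeError("FrozenSketch is immutable")


holder = FrozenSketch([1, 2, 3])

try:
    holder._index_data = [4, 5, 6]  # goes through FrozenSketch.__setattr__
    blocked = False
except AttributeError:
    blocked = True

# object.__setattr__ skips the class-level override and rebinds directly.
object.__setattr__(holder, "_index_data", [4, 5, 6])
print(blocked, holder._index_data)
```

Code that does this owns the responsibility for restoring the original state afterwards, which is the role the `.reset` method is described as playing above.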

Member Author

@TomAugspurger I think all your other comments have been addressed, not sure about this one. Should this TODO comment be removed? Some other action taken?

Contributor

Yeah this todo isn't necessary AFAICT.

Also, I really don't think we should have inverted the relationship between the eadata and data attributes.

Contributor

Just because that change isn't going to last through #24024, so I think it was unnecessary.

Member Author

Also, I really don't think we should have inverted the relationship between the eadata and data attributes. [...] so I think it was unnecessary.

We can't have both 1) _eadata being a property that depends on self.freq and 2) freq be a property that depends on _eadata.

Definitely agree that 24024 should return things to the old pattern.
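
The circular dependency can be reproduced in a few lines. This is a sketch with a hypothetical `CircularSketch` class, using a dict in place of the real array; it only shows that two properties defined in terms of each other never bottom out.

```python
class CircularSketch:
    """Hypothetical index where _eadata and freq depend on each other."""

    @property
    def _eadata(self):
        # Building the array needs freq ...
        return {"freq": self.freq}

    @property
    def freq(self):
        # ... while freq is read back from the array.
        return self._eadata["freq"]


try:
    CircularSketch().freq
    recursed = False
except RecursionError:
    recursed = True

print(recursed)
```

One of the two has to be stored state, which is why this PR stores `_eadata` and derives the rest from it.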

def _eadata(self):
    return DatetimeArray._simple_new(self._data,
                                     tz=self.tz, freq=self.freq)
def _data(self):
Contributor

I don't think that _data is a property anywhere else. Just an attribute that's set in _simple_new.

Actually... how does this even work? If you don't have a setter (which I don't see) then _simple_new should fail.

Contributor

Ah, I see you set _eadata.

It doesn't really matter since we're removing it soon anyway, but I'd prefer that _eadata always reference _data. That keeps all the index classes consistent in that ._data is the actual array, not a property.

Member Author

I don't think that _data is a property anywhere else. Just an attribute that's set in _simple_new.

That's right. In this PR we set _eadata inside _simple_new and make _data a property returning _eadata._data. The freq/tz passthrough doesn't work with _eadata as a property (as in master), so the only question is whether to also set _data in _simple_new. I chose to make it a property to prevent any shenanigans where they become untied.
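
A sketch of the arrangement described here, with hypothetical `ArraySketch` and `IndexSketch` classes rather than the real pandas ones. `_simple_new` stores only `_eadata`, and `_data` is a read-only view onto it, so an attempt to set `_data` separately fails instead of silently untying the two.

```python
class ArraySketch:
    """Hypothetical stand-in for the array held in ``_eadata``."""

    def __init__(self, values):
        self._data = values


class IndexSketch:
    """Hypothetical index: _eadata is stored, _data is derived."""

    @classmethod
    def _simple_new(cls, values):
        result = object.__new__(cls)
        result._eadata = ArraySketch(values)  # the only stored state
        return result

    @property
    def _data(self):
        # No setter is defined, so this stays tied to _eadata.
        return self._eadata._data


idx = IndexSketch._simple_new([1, 2, 3])

try:
    idx._data = [9, 9, 9]
    assignable = True
except AttributeError:
    assignable = False

print(idx._data, assignable)
```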

Contributor

I think https://github.com/pandas-dev/pandas/pull/24024/files#diff-26a6d2ca7adfca586aabbb1c9dd8bf36R74 is what we want for eadata & freq (and if we can do it here, instead of that PR, then that's best).

Contributor

The freq/tz passthrough doesn't work with _eadata as a property (as in master),

Why's that? I suppose I could see why setting doesn't work, since IIUC we create a new DatetimeArray on each invocation of _eadata.

Should we just hold off on these changes til #24024 then?
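
To make the "new array on each invocation" point concrete, here is a sketch with hypothetical `ArraySketch` and `IndexSketch` classes. It assumes only what is said above: the property builds a fresh wrapper every time it is read, so anything set on that wrapper is lost.

```python
class ArraySketch:
    """Hypothetical stand-in for the array returned by ``_eadata``."""

    def __init__(self, values, freq=None):
        self._data = values
        self.freq = freq


class IndexSketch:
    """Hypothetical index where _eadata is rebuilt on every access."""

    def __init__(self, values, freq=None):
        self._data = values
        self._freq = freq

    @property
    def _eadata(self):
        # A brand-new wrapper each time.
        return ArraySketch(self._data, freq=self._freq)


idx = IndexSketch([1, 2, 3], freq="D")
idx._eadata.freq = "H"  # mutates a temporary that is then discarded
same_object = idx._eadata is idx._eadata

print(idx._eadata.freq, same_object)
```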

Contributor

Ah, of course it won't work, since we call .freq when creating the _eadata instance

TomAugspurger added the Datetime (Datetime data dtype) and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Dec 31, 2018
TomAugspurger added this to the 0.24.0 milestone Dec 31, 2018
Contributor

TomAugspurger commented Dec 31, 2018

While you're at it, could you:

  1. Add DatetimeIndexOpsMixin._ndarray_values as

         @property
         def _ndarray_values(self):
             return self._eadata._ndarray_values

  2. Remove PeriodIndex._ndarray_values

@@ -231,8 +231,13 @@ def _simple_new(cls, values, freq=None, tz=None):
result = object.__new__(cls)
result._data = values
result._freq = freq
tz = timezones.maybe_get_tz(tz)
result._tz = timezones.tz_standardize(tz)
if tz is None:
Contributor

Is it cheap to detect when you need to call maybe_get_tz & tz_standardize? Ideally these could be called in the DatetimeTZDtype constructor. Alternatively, we could have a dedicated constructor, DatetimeTZDtype.construct_from_tztype, which would pave the way for a unified dtype as well.
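
A sketch of what moving tz normalization into a dtype-level constructor might look like. Everything here is hypothetical: `TZDtypeSketch` is not pandas' `DatetimeTZDtype`, `construct_from_tztype` is only the name floated above, and the normalization is a toy stand-in for `maybe_get_tz` / `tz_standardize` that handles nothing beyond the string "UTC".

```python
from datetime import timezone, tzinfo


class TZDtypeSketch:
    """Hypothetical dtype that owns tz normalization."""

    def __init__(self, tz):
        if not isinstance(tz, tzinfo):
            raise TypeError("tz must be a tzinfo instance")
        self.tz = tz

    @classmethod
    def construct_from_tztype(cls, tz):
        # Toy normalization: accept a tzinfo as-is, or the string "UTC".
        if isinstance(tz, str):
            if tz.upper() != "UTC":
                raise ValueError("this sketch only understands 'UTC'")
            tz = timezone.utc
        return cls(tz)


a = TZDtypeSketch.construct_from_tztype("utc")
b = TZDtypeSketch.construct_from_tztype(timezone.utc)
print(a.tz is b.tz)
```

With normalization done in one place, callers such as `_simple_new` would pass the raw tz through and read the standardized value back from the dtype.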

Contributor

jreback left a comment

just a small comment on future directions about dtypes.

Contributor

jreback commented Dec 31, 2018

ping on green.

jbrockmendel
Member Author

ping

jreback merged commit 50470d5 into pandas-dev:master Jan 1, 2019
Contributor

jreback commented Jan 1, 2019

thanks!

jbrockmendel deleted the eadata2 branch January 1, 2019 01:17
"""
  # GH 18595
- return self._tz
+ return getattr(self._dtype, "tz", None)
Contributor

Is there any reason this uses the private attribute instead of the dtype property?

Member Author

No, feel free to change
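
A small sketch of the `getattr(..., "tz", None)` pattern under discussion, written against the public `dtype` property as the review suggests. The dtype classes are hypothetical stand-ins: one has no `tz` attribute at all (as a plain datetime64 dtype would not), the other carries one.

```python
class NaiveDtypeSketch:
    """Hypothetical dtype with no tz attribute."""


class AwareDtypeSketch:
    """Hypothetical tz-aware dtype."""

    def __init__(self, tz):
        self.tz = tz


class ArraySketch:
    """Hypothetical array whose tz is derived from its dtype."""

    def __init__(self, dtype):
        self._dtype = dtype

    @property
    def dtype(self):
        return self._dtype

    @property
    def tz(self):
        # Falls back to None when the dtype carries no tz.
        return getattr(self.dtype, "tz", None)


naive = ArraySketch(NaiveDtypeSketch())
aware = ArraySketch(AwareDtypeSketch("UTC"))
print(naive.tz, aware.tz)
```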

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays.