Skip to content

REF: use asi8 instead of _data in PeriodArray #23416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

jbrockmendel
Copy link
Member

@TomAugspurger @jorisvandenbossche discussed on the call a couple weeks ago. Reference asi8 instead of _data or _ndarray_values is the most-explicit option available.

I would also be in favor of getting rid of _data directly and setting asi8 in __init__, but holding off on that pending consensus.

Also re-implemented some simplifications that got lost in rebasing during the PeriodArray marathon.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

elif not is_integer(periods):
msg = 'periods must be a number, got {periods}'
raise TypeError(msg.format(periods=periods))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done within PeriodArray._generate_range

@sinhrks sinhrks added Period Period data type Clean labels Oct 30, 2018
@codecov
Copy link

codecov bot commented Oct 30, 2018

Codecov Report

Merging #23416 into master will not change coverage.
The diff coverage is 92.3%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23416   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files         161      161           
  Lines       51173    51173           
=======================================
  Hits        47190    47190           
  Misses       3983     3983
Flag Coverage Δ
#multiple 90.65% <92.3%> (ø) ⬆️
#single 42.23% <15.38%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/period.py 97.48% <92.3%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de39bfc...a40509d. Read the comment docs.

@jorisvandenbossche
Copy link
Member

What I remember from the call is that we said that asi8 would be strange as the 'base' data attribute as we could not share that then with DatetimeArray (because for DatetimeArray that name would be wrong). Therefore I am slightly -1 on this.

For example, the nbytes property can be moved to a base / mixin class for DatetimeArray and PeriodArray, as both can have the same implementation (and the same will be true for some other properties/method)

@jreback
Copy link
Contributor

jreback commented Oct 30, 2018

I actually like _data as it very consistent across all array-like things. pls split out other changes to another PR if you would.

@jbrockmendel
Copy link
Member Author

What I remember from the call is that we said that asi8 would be strange as the 'base' data attribute as we could not share that then with DatetimeArray (because for DatetimeArray that name would be wrong)

The audio issues on the call must have been much worse than we realized, since I don't recall that at all.

It wouldn't make sense to use asi8 as the name used in DatetimeArray.__init__, but (recall we discussed the fact that) nearly all of the ops in all three datetimelike array classes operate on the i8 values.

For example, the nbytes property can be moved to a base / mixin class for DatetimeArray and PeriodArray, as both can have the same implementation (and the same will be true for some other properties/method)

This would, in fact, also work fine returning self.asi8.nbytes in such a base/mixin class.

I actually like _data as it very consistent across all array-like things. pls split out other changes to another PR if you would.

I can split out the other stuff, but am not ready to give up on the central point here:

The ongoing trouble with values/_values/_ndarray_values/_data/... is driven by ambiguity. When I see one of these used, it isn't immediately obvious what it means. This PR changes usages of the maximally-ambiguous name to the maximally-obvious name.

@jorisvandenbossche
Copy link
Member

It wouldn't make sense to use asi8 as the name used in DatetimeArray.init,

But I think it would still make sense to use the same attribute name for all datetimelike arrays, and in basic functionality (like __len__) it would be the most clear IMO to simply use the attribute set in the init.

I am fine with another name as _data, if there is a good alternative. asi8 is not common for all datetime-like arrays, and for _ndarray_values I gave some reasoning here why that is not a good choice IMO: #22862 (comment).
We can still do something like _np_data, or other variations, if we really want.

but (recall we discussed the fact that) nearly all of the ops in all three datetimelike array classes operate on the i8 values.

Yes, and for the ops in the PerioArray code, it uses asi8 and not _data, which is fine I think.

This would, in fact, also work fine returning self.asi8.nbytes in such a base/mixin class.

That would work, but I don't think it would be nicer or clearer design than using the 'base' data attribute.

@jbrockmendel
Copy link
Member Author

As you mentioned in the linked comment, _ndarray_values is already overloaded; my point is that _data is equally overloaded. Three options come to mind:

  1. Accept that we are not likely to find a good name for an attribute to set in __init__ that works for all our EA subclasses. Instead, try to use informative names on a case-by-case basis. One upside to this is it avoids creating "shadow interface" where we get used to EA classes having certain behavior despite not actually requiring it.

  2. Standardize on a name like _initvals or _attrvals that conveys no information other than "this is set in __init__ and is a plain attribute-lookup". This has the advantage of being very clear. (Also relevant: there has also been discussion elsewhere of wanting a name that guarantees fast/no-copy access.)

  3. This PR doesn't try to change the name set in __init__, so that part of the issue could be kicked down the road.

What this PR does do is replace usages of a highly-overloaded name with usages of an entirely-clear name. If later we need to tinker with it in order to share code with other Datetimelike EAs, I'm fine with that.

@jreback
Copy link
Contributor

jreback commented Nov 1, 2018

Standardize on a name like _initvals or _attrvals that conveys no information other than "this is set in init and is a plain attribute-lookup". This has the advantage of being very clear. (Also relevant: there has also been discussion elsewhere of wanting a name that guarantees fast/no-copy access.)

I am not sure how this is different from _data.

@jbrockmendel
Copy link
Member Author

Not a priority, closing.

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
@jbrockmendel jbrockmendel deleted the noshallow branch April 5, 2020 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants