-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: stricter checks in _simple_new, avoid shallow_copy in EAs #23426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for updating the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23426 +/- ##
==========================================
+ Coverage 92.19% 92.19% +<.01%
==========================================
Files 161 161
Lines 51192 51220 +28
==========================================
+ Hits 47197 47223 +26
- Misses 3995 3997 +2
Continue to review full report at Codecov.
|
@jbrockmendel : I see two different types of changes in a single PR (as your title also indicates). Is there a reason they have to be together instead of in separate PR's (for reviewing purposes)? |
@@ -1320,6 +1318,20 @@ def to_julian_date(self): | |||
|
|||
|
|||
def _generate_regular_range(cls, start, end, periods, freq): | |||
""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function summary / description?
start : Timestamp or None | ||
end : Timestamp or None | ||
periods : int | ||
freq : DateOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter descriptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
assert isinstance(values, np.ndarray), type(values) | ||
if values.dtype == 'i8': | ||
values = values.view('m8[ns]') | ||
assert values.dtype == 'm8[ns]', values.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two points:
- Generally not a big fan of bare
assert
like these, unless they're internal (in which case that might be fine). Are these user-facing in any way? - Even if they're internal are these
assert
statements tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely these got added for debugging and left them in since they behave like especially-emphatic comments. On the next pass I'll make sure that they only go in private methods.
@@ -209,6 +204,15 @@ def __new__(cls, values, freq=None, tz=None, dtype=None): | |||
# if dtype has an embedded tz, capture it | |||
tz = dtl.validate_tz_from_dtype(dtype, tz) | |||
|
|||
if isinstance(values, DatetimeArrayMixin): | |||
values = values.asi8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why getting the integers here and not M8 directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the way the constructors view i8 values is more consistent. For tz-naive, M8 vs i8 are equivalent. For tz-aware, i8 is interpreted as unix timestamps (i.e. UTC), whereas M8 are interpreted as the wall-time in the given timezone.
values = values.view('M8[ns]') | ||
|
||
assert isinstance(values, np.ndarray), type(values) | ||
assert is_datetime64_dtype(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above you check explicitly for 'M8[ns]'
, because here it can still be another resolution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
end, getattr(end, 'tz', None), end, freq, tz | ||
) | ||
start = _maybe_localize_point(start, getattr(start, 'tz', None), | ||
start, freq, tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, can you leave such style changes only to lines you actually change anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah and the former is actually more idiomatic, really prefer not to do partial line wrapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
|
||
# unwrap for case where e.g. _get_unique_index passes an instance | ||
# of own class instead of ndarray | ||
values = getattr(values, '_data', values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do this with an actual check that it is an index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. (Ideally I'd like to make _get_unique_index pass the "correct" thing, but that can wait for another day)
@@ -1134,6 +1136,8 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None): | |||
is_year_end = wrap_field_accessor(DatetimeArrayMixin.is_year_end) | |||
is_leap_year = wrap_field_accessor(DatetimeArrayMixin.is_leap_year) | |||
|
|||
tz_localize = wrap_array_method(DatetimeArrayMixin.tz_localize, True) | |||
tz_convert = wrap_array_method(DatetimeArrayMixin.tz_convert, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two methods previously used shallow_copy in the DatetimeArray class, so name
was inherited automatically. This PR avoids the use of shallow_copy in the DatetimeArray class, so we need the extra step to pin name
.
|
||
# `dtype` is not always passed, but if it is, it should always | ||
# be m8[ns] | ||
assert dtype == _TD_DTYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases is it passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In _shallow_copy:
if not len(values) and 'dtype' not in kwargs:
attributes['dtype'] = self.dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR doesn't clearly change the needle. you are adding some more usages of shallow_copy and removing others. where is the simplification here? maybe its just that you have lots of extraneous changes. pls do them separately.
@@ -177,16 +177,11 @@ def _simple_new(cls, values, freq=None, tz=None, **kwargs): | |||
we require the we have a dtype compat for the values | |||
if we are passed a non-dtype compat, then coerce using the constructor | |||
""" | |||
assert isinstance(values, np.ndarray), type(values) | |||
if values.dtype == 'i8': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here about what this is doing
end, getattr(end, 'tz', None), end, freq, tz | ||
) | ||
start = _maybe_localize_point(start, getattr(start, 'tz', None), | ||
start, freq, tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah and the former is actually more idiomatic, really prefer not to do partial line wrapping
ensure_int64(index.values), | ||
tz, ambiguous=ambiguous) | ||
if tz is not None and index.tz is None: | ||
arr = conversion.tz_localize_to_utc(ensure_int64(index.values), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrap on the ensure_int64
|
||
data = cls._simple_new(data.view(_NS_DTYPE), freq=freq, tz=tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its pretty arbitrary that you are viewing as M8[ns] rather than i8 here. let's be consistent (prob just i8 is fine),
though I think this IS i8 already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, there is a lot of casting back-and-forth. I'll try to cut down on it.
General thought is that the values passed to _simple_new
should already be in their correct forms (and master currently has some weird behavior, like passing a list in one case). Sharing code between Datetime/Timedelta/Period pretty much requires that an exception be made for i8.
else: | ||
values = ensure_int64(values).view(_TD_DTYPE) | ||
def _simple_new(cls, values, freq=None): | ||
assert isinstance(values, np.ndarray), type(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add doc-string to indicate possible things values can be (e.g. i8, M8[ns])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is object even possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstring: sure
object: yes. The point of this PR is to be more strict about what goes to _simple_new, since too much work is going on in some of them, causing some ambiguity.
As commented elsewhere, this PR does more things that it needs to, causing some confusion. I'll separate them out.
I'll double-check, but the intent was to only remove shallow_copy, particularly in the EA mixins.
main simplification is in being strict/clear about what goes in to _simple_new (though the "clear" part evidently was not so successful). Will update/split. |
git diff upstream/master -u -- "*.py" | flake8 --diff