Skip to content

REF: stricter checks in _simple_new, avoid shallow_copy in EAs #23426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for updating the PR.

@codecov
Copy link

codecov bot commented Oct 31, 2018

Codecov Report

Merging #23426 into master will increase coverage by <.01%.
The diff coverage is 95.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23426      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         161      161              
  Lines       51192    51220      +28     
==========================================
+ Hits        47197    47223      +26     
- Misses       3995     3997       +2
Flag Coverage Δ
#multiple 90.63% <95.58%> (ø) ⬆️
#single 42.24% <58.82%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 94.04% <100%> (+0.03%) ⬆️
pandas/core/indexes/datetimelike.py 98.05% <100%> (+0.03%) ⬆️
pandas/core/indexes/datetimes.py 96.3% <100%> (-0.14%) ⬇️
pandas/core/arrays/timedeltas.py 94.21% <87.5%> (-0.09%) ⬇️
pandas/core/indexes/timedeltas.py 90.71% <93.75%> (+0.09%) ⬆️
pandas/core/arrays/datetimes.py 98.62% <96.29%> (+0.71%) ⬆️
pandas/io/feather_format.py 77.14% <0%> (-8.58%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7191af9...777ddff. Read the comment docs.

@gfyoung gfyoung added API Design Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 31, 2018
@gfyoung
Copy link
Member

gfyoung commented Oct 31, 2018

@jbrockmendel : I see two different types of changes in a single PR (as your title also indicates). Is there a reason they have to be together instead of in separate PR's (for reviewing purposes)?

@@ -1320,6 +1318,20 @@ def to_julian_date(self):


def _generate_regular_range(cls, start, end, periods, freq):
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function summary / description?

start : Timestamp or None
end : Timestamp or None
periods : int
freq : DateOffset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter descriptions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

assert isinstance(values, np.ndarray), type(values)
if values.dtype == 'i8':
values = values.view('m8[ns]')
assert values.dtype == 'm8[ns]', values.dtype
Copy link
Member

@gfyoung gfyoung Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points:

  • Generally not a big fan of bare assert like these, unless they're internal (in which case that might be fine). Are these user-facing in any way?
  • Even if they're internal are these assert statements tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely these got added for debugging and left them in since they behave like especially-emphatic comments. On the next pass I'll make sure that they only go in private methods.

@@ -209,6 +204,15 @@ def __new__(cls, values, freq=None, tz=None, dtype=None):
# if dtype has an embedded tz, capture it
tz = dtl.validate_tz_from_dtype(dtype, tz)

if isinstance(values, DatetimeArrayMixin):
values = values.asi8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why getting the integers here and not M8 directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the way the constructors view i8 values is more consistent. For tz-naive, M8 vs i8 are equivalent. For tz-aware, i8 is interpreted as unix timestamps (i.e. UTC), whereas M8 are interpreted as the wall-time in the given timezone.

values = values.view('M8[ns]')

assert isinstance(values, np.ndarray), type(values)
assert is_datetime64_dtype(values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above you check explicitly for 'M8[ns]', because here it can still be another resolution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

end, getattr(end, 'tz', None), end, freq, tz
)
start = _maybe_localize_point(start, getattr(start, 'tz', None),
start, freq, tz)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, can you leave such style changes only to lines you actually change anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah and the former is actually more idiomatic, really prefer not to do partial line wrapping

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


# unwrap for case where e.g. _get_unique_index passes an instance
# of own class instead of ndarray
values = getattr(values, '_data', values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this with an actual check that it is an index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. (Ideally I'd like to make _get_unique_index pass the "correct" thing, but that can wait for another day)

@@ -1134,6 +1136,8 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
is_year_end = wrap_field_accessor(DatetimeArrayMixin.is_year_end)
is_leap_year = wrap_field_accessor(DatetimeArrayMixin.is_leap_year)

tz_localize = wrap_array_method(DatetimeArrayMixin.tz_localize, True)
tz_convert = wrap_array_method(DatetimeArrayMixin.tz_convert, True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods previously used shallow_copy in the DatetimeArray class, so name was inherited automatically. This PR avoids the use of shallow_copy in the DatetimeArray class, so we need the extra step to pin name.


# `dtype` is not always passed, but if it is, it should always
# be m8[ns]
assert dtype == _TD_DTYPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases is it passed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _shallow_copy:

        if not len(values) and 'dtype' not in kwargs:
            attributes['dtype'] = self.dtype

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR doesn't clearly change the needle. you are adding some more usages of shallow_copy and removing others. where is the simplification here? maybe its just that you have lots of extraneous changes. pls do them separately.

@@ -177,16 +177,11 @@ def _simple_new(cls, values, freq=None, tz=None, **kwargs):
we require the we have a dtype compat for the values
if we are passed a non-dtype compat, then coerce using the constructor
"""
assert isinstance(values, np.ndarray), type(values)
if values.dtype == 'i8':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here about what this is doing

end, getattr(end, 'tz', None), end, freq, tz
)
start = _maybe_localize_point(start, getattr(start, 'tz', None),
start, freq, tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah and the former is actually more idiomatic, really prefer not to do partial line wrapping

ensure_int64(index.values),
tz, ambiguous=ambiguous)
if tz is not None and index.tz is None:
arr = conversion.tz_localize_to_utc(ensure_int64(index.values),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap on the ensure_int64


data = cls._simple_new(data.view(_NS_DTYPE), freq=freq, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its pretty arbitrary that you are viewing as M8[ns] rather than i8 here. let's be consistent (prob just i8 is fine),
though I think this IS i8 already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, there is a lot of casting back-and-forth. I'll try to cut down on it.

General thought is that the values passed to _simple_new should already be in their correct forms (and master currently has some weird behavior, like passing a list in one case). Sharing code between Datetime/Timedelta/Period pretty much requires that an exception be made for i8.

else:
values = ensure_int64(values).view(_TD_DTYPE)
def _simple_new(cls, values, freq=None):
assert isinstance(values, np.ndarray), type(values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add doc-string to indicate possible things values can be (e.g. i8, M8[ns])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is object even possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring: sure

object: yes. The point of this PR is to be more strict about what goes to _simple_new, since too much work is going on in some of them, causing some ambiguity.

As commented elsewhere, this PR does more things that it needs to, causing some confusion. I'll separate them out.

@jbrockmendel
Copy link
Member Author

you are adding some more usages of shallow_copy and removing others

I'll double-check, but the intent was to only remove shallow_copy, particularly in the EA mixins.

where is the simplification here?

main simplification is in being strict/clear about what goes in to _simple_new (though the "clear" part evidently was not so successful).

Will update/split.

@jbrockmendel
Copy link
Member Author

Closing in favor of #23430, #23431, #23433.

@jbrockmendel jbrockmendel deleted the simple branch October 31, 2018 19:39
@gfyoung gfyoung added this to the No action milestone Oct 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants