REF: stricter checks in _simple_new, avoid shallow_copy in EAs #23426

jbrockmendel · 2018-10-31T02:07:57Z

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

…mple

pep8speaks · 2018-10-31T03:52:22Z

Hello @jbrockmendel! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/arrays/datetimelike.py !
There are no PEP8 issues in the file pandas/core/arrays/datetimes.py !
There are no PEP8 issues in the file pandas/core/arrays/timedeltas.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimelike.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimes.py !
There are no PEP8 issues in the file pandas/core/indexes/timedeltas.py !

codecov · 2018-10-31T04:34:32Z

Codecov Report

Merging #23426 into master will increase coverage by <.01%.
The diff coverage is 95.58%.

@@            Coverage Diff             @@
##           master   #23426      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         161      161              
  Lines       51192    51220      +28     
==========================================
+ Hits        47197    47223      +26     
- Misses       3995     3997       +2

Flag	Coverage Δ
#multiple	`90.63% <95.58%> (ø)`	⬆️
#single	`42.24% <58.82%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/arrays/datetimelike.py	`94.04% <100%> (+0.03%)`	⬆️
pandas/core/indexes/datetimelike.py	`98.05% <100%> (+0.03%)`	⬆️
pandas/core/indexes/datetimes.py	`96.3% <100%> (-0.14%)`	⬇️
pandas/core/arrays/timedeltas.py	`94.21% <87.5%> (-0.09%)`	⬇️
pandas/core/indexes/timedeltas.py	`90.71% <93.75%> (+0.09%)`	⬆️
pandas/core/arrays/datetimes.py	`98.62% <96.29%> (+0.71%)`	⬆️
pandas/io/feather_format.py	`77.14% <0%> (-8.58%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7191af9...777ddff. Read the comment docs.

gfyoung · 2018-10-31T04:50:18Z

@jbrockmendel : I see two different types of changes in a single PR (as your title also indicates). Is there a reason they have to be together instead of in separate PRs (for reviewing purposes)?

gfyoung · 2018-10-31T04:50:41Z

pandas/core/arrays/datetimes.py

@@ -1320,6 +1318,20 @@ def to_julian_date(self):


 def _generate_regular_range(cls, start, end, periods, freq):
+    """
+


Function summary / description?

gfyoung · 2018-10-31T04:50:54Z

pandas/core/arrays/datetimes.py

+    start : Timestamp or None
+    end : Timestamp or None
+    periods : int
+    freq : DateOffset


Parameter descriptions?

gfyoung · 2018-10-31T04:52:13Z

pandas/core/indexes/timedeltas.py

+        assert isinstance(values, np.ndarray), type(values)
+        if values.dtype == 'i8':
+            values = values.view('m8[ns]')
+        assert values.dtype == 'm8[ns]', values.dtype


Two points:

Generally not a big fan of bare assert like these, unless they're internal (in which case that might be fine). Are these user-facing in any way?

Even if they're internal are these assert statements tested?

Largely these got added for debugging, and I left them in since they behave like especially-emphatic comments. On the next pass I'll make sure that they only go in private methods.

jorisvandenbossche · 2018-10-31T08:15:05Z

pandas/core/arrays/datetimes.py

@@ -209,6 +204,15 @@ def __new__(cls, values, freq=None, tz=None, dtype=None):
        # if dtype has an embedded tz, capture it
        tz = dtl.validate_tz_from_dtype(dtype, tz)

+        if isinstance(values, DatetimeArrayMixin):
+            values = values.asi8


why get the integers here rather than M8 directly?

Because the way the constructors view i8 values is more consistent. For tz-naive, M8 vs i8 are equivalent. For tz-aware, i8 is interpreted as unix timestamps (i.e. UTC), whereas M8 are interpreted as the wall-time in the given timezone.
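
The i8-versus-M8 distinction described in this reply can be illustrated without pandas. What follows is only a minimal sketch using the standard library: the helper names `from_epoch_ns` and `from_wall_time` and the fixed-offset `EASTERN` zone are assumptions made for illustration, not pandas API.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)
# Fixed UTC-05:00 offset as a stand-in for a real timezone (assumption).
EASTERN = timezone(timedelta(hours=-5), "EST")


def from_epoch_ns(ns, tz):
    """Treat an integer as nanoseconds since the Unix epoch (UTC), shown in tz."""
    return (EPOCH + timedelta(microseconds=ns // 1000)).astimezone(tz)


def from_wall_time(naive, tz):
    """Treat a naive datetime as the wall-clock reading in tz."""
    return naive.replace(tzinfo=tz)


naive = datetime(2018, 10, 31)
# The same naive value expressed as integer nanoseconds since the epoch.
ns = (naive.replace(tzinfo=UTC) - EPOCH) // timedelta(microseconds=1) * 1000

as_utc_epoch = from_epoch_ns(ns, EASTERN)    # integer path: value is UTC
as_wall_time = from_wall_time(naive, EASTERN)  # datetime path: value is local

print(as_utc_epoch.isoformat())
print(as_wall_time.isoformat())
print(as_wall_time - as_utc_epoch)
```

With no timezone attached the two readings describe the same value; once a timezone is involved they differ by the UTC offset, which is why a constructor has to pick one interpretation and apply it consistently.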

jorisvandenbossche · 2018-10-31T08:16:57Z

pandas/core/arrays/datetimes.py

+            values = values.view('M8[ns]')
+
+        assert isinstance(values, np.ndarray), type(values)
+        assert is_datetime64_dtype(values)


above you check explicitly for 'M8[ns]'; is the looser check here because the values can still have another resolution?

jorisvandenbossche · 2018-10-31T08:21:25Z

pandas/core/arrays/datetimes.py

-                end, getattr(end, 'tz', None), end, freq, tz
-            )
+            start = _maybe_localize_point(start, getattr(start, 'tz', None),
+                                          start, freq, tz)


In general, can you leave such style changes only to lines you actually change anyway?

yeah and the former is actually more idiomatic, really prefer not to do partial line wrapping

jorisvandenbossche · 2018-10-31T08:24:58Z

pandas/core/indexes/datetimelike.py

+
+        # unwrap for case where e.g. _get_unique_index passes an instance
+        #  of own class instead of ndarray
+        values = getattr(values, '_data', values)


Can you do this with an actual check that it is an index?

Sure. (Ideally I'd like to make _get_unique_index pass the "correct" thing, but that can wait for another day)
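
The difference between the `getattr` unwrap and an explicit type check can be sketched as follows. `ToyIndex`, `Unrelated`, `unwrap_loose`, and `unwrap_strict` are hypothetical names used only for this illustration; none of them exist in pandas.

```python
class ToyIndex:
    """Hypothetical index wrapper that stores its raw values on `_data`."""

    def __init__(self, data, name=None):
        self._data = list(data)
        self.name = name


class Unrelated:
    """Some other object that merely happens to expose `_data`."""

    _data = "not index values"


def unwrap_loose(values):
    # Duck-typed: unwraps anything that has a `_data` attribute.
    return getattr(values, "_data", values)


def unwrap_strict(values):
    # Explicit: only unwraps instances of the index class itself.
    if isinstance(values, ToyIndex):
        values = values._data
    return values


idx = ToyIndex([1, 2, 3])
other = Unrelated()

print(unwrap_strict(idx))             # raw values from the index
print(unwrap_loose(other))            # silently unwraps an unrelated object
print(unwrap_strict(other) is other)  # strict version leaves it alone
```

The strict version makes the one expected case visible in the code and lets anything unexpected fall through to later checks instead of being silently unwrapped.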

jorisvandenbossche · 2018-10-31T08:26:37Z

pandas/core/indexes/datetimes.py

@@ -1134,6 +1136,8 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
    is_year_end = wrap_field_accessor(DatetimeArrayMixin.is_year_end)
    is_leap_year = wrap_field_accessor(DatetimeArrayMixin.is_leap_year)

+    tz_localize = wrap_array_method(DatetimeArrayMixin.tz_localize, True)
+    tz_convert = wrap_array_method(DatetimeArrayMixin.tz_convert, True)


Can you explain this change?

These two methods previously used shallow_copy in the DatetimeArray class, so name was inherited automatically. This PR avoids the use of shallow_copy in the DatetimeArray class, so we need the extra step to pin name.
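
A rough sketch of the name-pinning idea in this reply: when the array method returns a plain array, the index-level wrapper has to re-attach the name itself. `ToyArray`, `NamedIndex`, and `wrap_method_sketch` are hypothetical stand-ins; the actual `wrap_array_method` helper in pandas may be implemented differently.

```python
class ToyArray:
    """Hypothetical array class with no notion of a name."""

    def __init__(self, values):
        self.values = list(values)

    def shifted(self, delta):
        # Returns a brand-new array; nothing is inherited from any index.
        return ToyArray(v + delta for v in self.values)


def wrap_method_sketch(method, pin_name=False):
    """Build an index method that delegates to an array method."""

    def index_method(self, *args, **kwargs):
        result = method(self._data, *args, **kwargs)
        # The array result carries no name, so re-attach it here if asked.
        name = self.name if pin_name else None
        return type(self)(result, name=name)

    index_method.__name__ = method.__name__
    return index_method


class NamedIndex:
    """Hypothetical index: an array plus a name."""

    def __init__(self, data, name=None):
        self._data = data
        self.name = name

    shifted = wrap_method_sketch(ToyArray.shifted, pin_name=True)
    shifted_unnamed = wrap_method_sketch(ToyArray.shifted)


idx = NamedIndex(ToyArray([1, 2, 3]), name="ts")
print(idx.shifted(10).name)
print(idx.shifted_unnamed(10).name)
```

Keeping the name handling in the wrapper means the array class never needs to know about index-only attributes.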

jorisvandenbossche · 2018-10-31T08:29:01Z

pandas/core/indexes/timedeltas.py

+
+        # `dtype` is not always passed, but if it is, it should always
+        #  be m8[ns]
+        assert dtype == _TD_DTYPE


In what cases is it passed?

In _shallow_copy:

    if not len(values) and 'dtype' not in kwargs:
        attributes['dtype'] = self.dtype

jreback

this PR doesn't clearly move the needle. you are adding some more usages of shallow_copy and removing others. where is the simplification here? maybe it's just that you have lots of extraneous changes. please do them separately.

jreback · 2018-10-31T11:50:35Z

pandas/core/arrays/datetimes.py

@@ -177,16 +177,11 @@ def _simple_new(cls, values, freq=None, tz=None, **kwargs):
        we require the we have a dtype compat for the values
        if we are passed a non-dtype compat, then coerce using the constructor
        """
+        assert isinstance(values, np.ndarray), type(values)
+        if values.dtype == 'i8':


add a comment here about what this is doing

jreback · 2018-10-31T11:51:16Z

pandas/core/arrays/datetimes.py

-                end, getattr(end, 'tz', None), end, freq, tz
-            )
+            start = _maybe_localize_point(start, getattr(start, 'tz', None),
+                                          start, freq, tz)


yeah and the former is actually more idiomatic, really prefer not to do partial line wrapping

jreback · 2018-10-31T11:52:04Z

pandas/core/arrays/datetimes.py

-                    ensure_int64(index.values),
-                    tz, ambiguous=ambiguous)
+            if tz is not None and index.tz is None:
+                arr = conversion.tz_localize_to_utc(ensure_int64(index.values),


wrap on the ensure_int64

jreback · 2018-10-31T11:53:37Z

pandas/core/arrays/datetimes.py


+    data = cls._simple_new(data.view(_NS_DTYPE), freq=freq, tz=tz)


it's pretty arbitrary that you are viewing as M8[ns] rather than i8 here. let's be consistent (prob just i8 is fine), though I think this IS i8 already?

Yah, there is a lot of casting back-and-forth. I'll try to cut down on it.

General thought is that the values passed to _simple_new should already be in their correct forms (and master currently has some weird behavior, like passing a list in one case). Sharing code between Datetime/Timedelta/Period pretty much requires that an exception be made for i8.
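
The split described here, a strict private constructor fed by a lenient public one, might look roughly like the sketch below. It uses the standard-library `array` type as a stand-in for an int64 ndarray; `NanosArray` and `from_sequence` are hypothetical names, not pandas classes or methods.

```python
from array import array


class NanosArray:
    """Hypothetical container backed by signed 64-bit integers (typecode 'q')."""

    @classmethod
    def _simple_new(cls, values):
        # Private fast path: no coercion, only sanity checks on the input.
        assert isinstance(values, array), type(values)
        assert values.typecode == "q", values.typecode
        obj = object.__new__(cls)
        obj._data = values
        return obj

    @classmethod
    def from_sequence(cls, values):
        # Public path: all coercion happens here, before _simple_new.
        return cls._simple_new(array("q", (int(v) for v in values)))


ok = NanosArray.from_sequence([1, 2.0, 3])

rejected = None
try:
    NanosArray._simple_new([1, 2, 3])  # a list is not in its final form
except AssertionError as err:
    rejected = err.args[0]  # the offending type, carried by the assert message

print(ok._data)
print(rejected)
```

Note that `assert` statements are skipped when Python runs with `-O`, so checks like these only suit internal invariants and should not be relied on to validate user input.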

jreback · 2018-10-31T11:54:00Z

pandas/core/arrays/timedeltas.py

-            else:
-                values = ensure_int64(values).view(_TD_DTYPE)
+    def _simple_new(cls, values, freq=None):
+        assert isinstance(values, np.ndarray), type(values)


can you add doc-string to indicate possible things values can be (e.g. i8, M8[ns])

is object even possible?

docstring: sure

object: yes. The point of this PR is to be more strict about what goes to _simple_new, since too much work is going on in some of them, causing some ambiguity.

As commented elsewhere, this PR does more things than it needs to, causing some confusion. I'll separate them out.

jbrockmendel · 2018-10-31T15:44:14Z

you are adding some more usages of shallow_copy and removing others

I'll double-check, but the intent was to only remove shallow_copy, particularly in the EA mixins.

where is the simplification here?

main simplification is in being strict/clear about what goes in to _simple_new (though the "clear" part evidently was not so successful).

Will update/split.

jbrockmendel · 2018-10-31T19:38:58Z

Closing in favor of #23430, #23431, #23433.

jbrockmendel added 7 commits October 30, 2018 17:17

make simple_new stricter, avoid use of shallow_copy

10a923f

Avoid use of shallow_copy

9060f1a

make dtype explicit

d5b0bfd

cosmetics

233367a

docstring and simplication for generate_regular_range

f37ace3

Merge branch 'master' of https://github.com/pandas-dev/pandas into si…

1860ea0

…mple

flake8 fixups

777ddff

gfyoung added the API Design, Datetime, and ExtensionArray labels Oct 31, 2018

gfyoung reviewed Oct 31, 2018

jorisvandenbossche reviewed Oct 31, 2018

jreback requested changes Oct 31, 2018

This was referenced Oct 31, 2018

REF: Remove DatetimelikeArrayMixin._shallow_copy #23430

Merged

REF: strictness/simplification in DatetimeArray/Index _simple_new #23431

Merged

REF: strictness and checks for Timedelta _simple_new #23433

Merged

jbrockmendel closed this Oct 31, 2018

jbrockmendel deleted the simple branch October 31, 2018 19:39

gfyoung added this to the No action milestone Oct 31, 2018
