CLN refactor core indexes #37582

MarcoGorelli · 2020-11-02T11:38:39Z

Some refactorings found by Sourcery https://sourcery.ai/

I've removed the ones of the kind

- if param:
-     var = a
- else:
-     var = b
+ var = a if param else b

jreback

lgtm one comment.

pandas/core/indexes/datetimes.py

pandas/core/indexes/base.py

pandas/core/indexes/datetimelike.py

ivanovmg

One comment, looks good to me.

pandas/core/indexes/base.py

jreback · 2020-11-02T22:35:15Z

pandas/core/indexes/datetimelike.py


+        # quick check
+        if len(self) and self.is_monotonic and i8[0] != iNaT:


can we put both of these outside the try/except (I think this is ok to do), see L298
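
For readers following along, here is a rough sketch of why the quick check works. It is not the pandas implementation: `quick_min` and `INAT` are hypothetical names, and it assumes monotonicity is judged on the underlying integer values, where the NaT sentinel is the smallest int64 and would therefore sort first.

```python
from typing import List, Optional

INAT = -(2**63)  # assumed stand-in for pandas' iNaT sentinel (the minimum int64)


def quick_min(i8: List[int]) -> Optional[int]:
    """Return the minimum via the fast path, or None when it does not apply."""
    # pandas caches is_monotonic; this sketch recomputes it for illustration only.
    is_monotonic = all(a <= b for a, b in zip(i8, i8[1:]))
    if len(i8) and is_monotonic and i8[0] != INAT:
        # Sorted ascending with no leading NaT: the first element is the minimum.
        return i8[0]
    return None
```

When the function returns None, the caller would fall through to the slower path that handles missing values.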

jreback · 2020-11-04T23:46:43Z

can you rebase this again to make sure failures are not systematic

MarcoGorelli · 2020-11-05T07:42:43Z

The problem is that

pytest pandas/tests/reductions/test_reductions.py::TestIndexReductions::test_minmax_period

gets to

        if self.hasnans:
            if skipna:
                min_stamp = self[~self._isnan].asi8.min()
            else:
                return self._na_value
        else:
            min_stamp = i8.min()
        try:
            return self._data._box_func(min_stamp)
        except ValueError:
            return self._na_value

and throws a ValueError on min_stamp = self[~self._isnan].asi8.min()

latest commit addresses that
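
A plain-Python analogy of why the reduction itself has to stay inside the try/except. This is only a sketch: `min_skipna` and `INAT` are hypothetical names, and it assumes the failure comes from reducing over an empty selection (for example when every entry is missing), which the thread does not state explicitly.

```python
from typing import List, Optional

INAT = -(2**63)  # assumed stand-in for pandas' iNaT sentinel


def min_skipna(i8: List[int], na_value: Optional[int] = None) -> Optional[int]:
    """Minimum of the non-missing values, or na_value if none remain."""
    try:
        # min() over an empty iterable raises ValueError, so the
        # reduction is kept inside the try block.
        return min(v for v in i8 if v != INAT)
    except ValueError:
        return na_value
```

If the `min(...)` call were moved above the `try`, the all-missing case would propagate the ValueError instead of returning the NA value.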

pandas/core/indexes/datetimelike.py

jbrockmendel · 2020-11-08T17:49:15Z

pandas/core/indexes/datetimelike.py

-            return self._data._box_func(max_stamp)
-        except ValueError:
+
+        # quick check


we could just return self._data.max(...). Downside is that doesn't take advantage of caching

jreback · 2020-11-18T13:54:37Z

can you merge master and ping on green

jbrockmendel · 2020-11-23T19:06:21Z

pandas/core/indexes/base.py

@@ -5847,7 +5834,7 @@ def trim_front(strings: List[str]) -> List[str]:
    Trims zeros and decimal points.
    """
    trimmed = strings
-    while len(strings) > 0 and all(x[0] == " " for x in trimmed):
+    while trimmed and all(x.startswith(" ") for x in trimmed):


this looks like it changes the logic by checking trimmed instead of strings?

perf difference between startswith vs x[0] == " "?

True, but AFAICT this condition is only needed when trimmed is an empty list (because all([]) is always True), so if we do an early return then we can remove it.

Regarding perf differences:

In [15]: %timeit 'foobarfdsfsdfsdafsdafhdlsafhgsdlafhsdlafhsda'.startswith('f')
110 ns ± 3.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [16]: %timeit 'foobarfdsfsdfsdafsdafhdlsafhgsdlafhsdlafhsda'[0] == 'f'
24.1 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

so I'll go back to [0] == ' ' (cc @ivanovmg )
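
A small sketch of the two points discussed here, using only the standard library. The helper name `time_both` and its default arguments are made up for illustration, and actual timings will vary by machine and Python version.

```python
import timeit

# all() over an empty iterable is vacuously True, which is why an empty
# list needs either a guard in the loop condition or an early return.
EMPTY_IS_TRUE = all(x[0] == " " for x in [])

# The two spellings also differ on empty strings: startswith returns
# False, whereas indexing with [0] would raise IndexError.
EMPTY_STARTSWITH = "".startswith(" ")


def time_both(sample: str = "foobar" * 8, number: int = 100_000):
    """Return (startswith_seconds, index_seconds) for `number` calls each."""
    t_startswith = timeit.timeit(lambda: sample.startswith("f"), number=number)
    t_index = timeit.timeit(lambda: sample[0] == "f", number=number)
    return t_startswith, t_index
```

Calling `time_both()` gives a rough comparison similar to the `%timeit` runs above, though the lambda call overhead is included in both measurements.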

pandas/core/indexes/datetimelike.py

jbrockmendel · 2020-11-23T19:08:46Z

pandas/core/indexes/datetimes.py

+                and (other.tz is None)
+                or (self.tz is None)
+                and (other.tz is not None)
+            ):


(self.tz is None) ^ (other.tz is None)

(self.tz is None ^ other.tz is not None)? AFAICT the check is to see whether one of them is None but the other isn't. I've tried regrouping the parens to make it clearer anyway, thanks

was this not viable?

as in,

if (self.tz is None) or (other.tz is None):
    raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex")

?

I don't think that would work because we don't want to raise if both self.tz and other.tz are None, just if one is but the other isn't

Not (self.tz is None) or (other.tz is None), but (self.tz is None) ^ (other.tz is None)

ah sorry, I didn't actually know that was a Python command - that should work then, thanks so much!
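
To make the suggestion concrete, a minimal standalone sketch of the XOR check. The function names `tz_mismatch` and `check_join` are hypothetical; only the expression and the error message come from the thread.

```python
from typing import Any


def tz_mismatch(self_tz: Any, other_tz: Any) -> bool:
    """True when exactly one of the two timezones is None."""
    # ^ on two bools is a logical XOR and returns a bool.
    return (self_tz is None) ^ (other_tz is None)


def check_join(self_tz: Any, other_tz: Any) -> None:
    """Raise if one side is tz-naive and the other is tz-aware."""
    if tz_mismatch(self_tz, other_tz):
        raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex")
```

The parentheses matter: `^` binds more tightly than `is`, so without them the expression would be parsed as a chained comparison around `None ^ other.tz`, which raises a TypeError of its own.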

jreback · 2020-11-24T13:46:23Z

@MarcoGorelli if you rebase and fix this up we can get into 1.2

MarcoGorelli · 2020-11-24T16:13:15Z

@MarcoGorelli if you rebase and fix this up we can get into 1.2

sure, have fixed conflicts and responded to review comments

jbrockmendel · 2020-12-08T17:13:13Z

pandas/core/indexes/base.py

@@ -3403,9 +3399,7 @@ def _convert_listlike_indexer(self, keyarr):
        keyarr : numpy.ndarray
            Return tuple-safe keys.
        """
-        if isinstance(keyarr, Index):
-            pass
-        else:


I think we did it this way to make coverage obvious

jbrockmendel · 2020-12-08T17:15:06Z

pandas/core/indexes/base.py

-    return trimmed
+    if not strings:
+        return strings
+    while all(x[0] == " " for x in strings):


is this going to break when one of the strings becomes empty?

Yes, you're absolutely right, thanks! It fails on master too, but still, worth fixing while we're modifying these lines

In [1]: from pandas.core.indexes.base import trim_front

In [2]: trim_front([' ', ' a'])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-1d23255f182c> in <module>
----> 1 trim_front([' ', ' a'])

~/pandas-dev/pandas/core/indexes/base.py in trim_front(strings)
   5848     """
   5849     trimmed = strings
-> 5850     while len(strings) > 0 and all(x[0] == " " for x in trimmed):
   5851         trimmed = [x[1:] for x in trimmed]
   5852     return trimmed

~/pandas-dev/pandas/core/indexes/base.py in <genexpr>(.0)
   5848     """
   5849     trimmed = strings
-> 5850     while len(strings) > 0 and all(x[0] == " " for x in trimmed):
   5851         trimmed = [x[1:] for x in trimmed]
   5852     return trimmed

IndexError: string index out of range

Looking at this again, I don't think it's an issue, because trim_front is only ever called with a list of strings which are all of the same length.

It's only ever called from pd.Index._format_with_header:

result = trim_front(format_array(values, None, justify="left"))

and format_array from pandas/io/formats/format.py returns

fmt_obj.get_result()

which in turn returns

_make_fixed_width(fmt_values, self.justify)

Nonetheless I can make the condition

while all(strings) and all(x[0] == " " for x in strings):

and add a tiny test for which that'd be necessary
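
Putting the early return and the extra guard together, the function would look roughly like the sketch below. It is a standalone approximation named `trim_front_sketch` so it is not mistaken for the merged code, which may differ.

```python
from typing import List


def trim_front_sketch(strings: List[str]) -> List[str]:
    """Strip leading spaces shared by every string in the list."""
    if not strings:
        # Early return: all() over an empty list would otherwise be True.
        return strings
    # all(strings) stops the loop as soon as any string becomes empty,
    # so x[0] can never raise IndexError.
    while all(strings) and all(x[0] == " " for x in strings):
        strings = [x[1:] for x in strings]
    return strings
```

With the guard in place, the input from the traceback above, `[' ', ' a']`, is trimmed once and then returned instead of raising.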

jreback · 2020-12-22T14:14:25Z

can you rebase

jreback · 2020-12-22T21:06:24Z

thanks @MarcoGorelli

refactor core indexes

5685cb1

MarcoGorelli changed the title ~~refactor core indexes~~ CLN refactor core indexes Nov 2, 2020

jreback requested changes Nov 2, 2020

jreback added this to the 1.2 milestone Nov 2, 2020

jbrockmendel reviewed Nov 2, 2020

jbrockmendel reviewed Nov 2, 2020

jbrockmendel reviewed Nov 2, 2020

MarcoGorelli added 2 commits November 2, 2020 14:58

parens

7695e66

reversions

d5d4530

ivanovmg reviewed Nov 2, 2020

MarcoGorelli added 2 commits November 2, 2020 19:31

make pythonic

23be6a5

try moving outside try

b3d1dec

jreback requested changes Nov 2, 2020

jreback added the Index and Code Style labels Nov 2, 2020

MarcoGorelli added 2 commits November 4, 2020 18:30

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

1b0d92c

…exes

move out of try-except

e0d36eb

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

fb2bea0

…exes

MarcoGorelli added 2 commits November 5, 2020 07:53

keep asi8.min() inside try-except

26c419e

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

d076431

…exes

jreback requested changes Nov 8, 2020

MarcoGorelli added 3 commits November 8, 2020 08:03

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

bb0fc86

…exes

empty check

445f2e9

similar simplification in max

e8f117c

jbrockmendel reviewed Nov 8, 2020

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

f1f1deb

…exes

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

e860c45

…exes

MarcoGorelli marked this pull request as draft November 23, 2020 18:52

fix merge error

a58de56

jbrockmendel reviewed Nov 23, 2020

jbrockmendel reviewed Nov 23, 2020

jreback removed this from the 1.2 milestone Nov 24, 2020

MarcoGorelli added 5 commits November 24, 2020 14:02

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

98672b1

…exes

wip

be80fb7

early return

dc0b84e

parens

4074a88

🎨

00d6be0

MarcoGorelli marked this pull request as ready for review November 24, 2020 16:00

MarcoGorelli added 2 commits November 29, 2020 15:16

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

d073da9

…exes

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

fdafd36

…exes

jbrockmendel reviewed Dec 8, 2020

MarcoGorelli added 3 commits December 9, 2020 18:20

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

01c4a06

…exes

coverage

1b254ae

🔀 Merge remote-tracking branch 'upstream/master' into refactor-core-i…

dffc440

…ndexes

Merge remote-tracking branch 'upstream/master' into refactor-core-ind…

63a8575

…exes

jreback added this to the 1.3 milestone Dec 22, 2020

jreback approved these changes Dec 22, 2020

jreback merged commit 75d02c7 into pandas-dev:master Dec 22, 2020

MarcoGorelli deleted the refactor-core-indexes branch December 23, 2020 08:06

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

CLN refactor core indexes (pandas-dev#37582)

b72976e
