REF: implement _should_compare #38105

jbrockmendel · 2020-11-27T04:11:58Z

Index._is_comparable_dtype is limited because it misses a few cases: object dtype, and sub-dtypes within categorical. This implements Index._should_compare, which handles those correctly. It then implements _get_indexer_non_comparable for cases in which we short-circuit non-comparable dtypes.

As a proof of concept, this then uses _should_compare and _get_indexer_non_comparable for PeriodIndex.get_indexer.

The behavior change, which this tests, is one that IIUC fixes a bug. That is, when we do get_indexer with a method and non-comparable dtypes, we should raise instead of returning all minus-ones.

If implemented, we'll be able to use _should_compare to simplify all of the get_indexer, get_indexer_non_unique, and set op methods.
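
The dtype check described above can be sketched roughly as follows. This is an illustration built only on public pandas API, not the code in this PR: `unpack_nested_dtype_sketch` and `should_compare_sketch` are hypothetical names, and the final rule (both numeric, or equal dtypes) is a simplifying assumption standing in for `_is_comparable_dtype`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype


def unpack_nested_dtype_sketch(index: pd.Index) -> pd.Index:
    # Hypothetical helper: for a categorical index, what matters is the
    # dtype of its categories, not the "category" wrapper itself.
    if isinstance(index.dtype, pd.CategoricalDtype):
        return index.dtype.categories
    return index


def should_compare_sketch(left: pd.Index, right: pd.Index) -> bool:
    right = unpack_nested_dtype_sketch(right)
    # object dtype can hold anything, so matches cannot be ruled out up front.
    if is_object_dtype(left.dtype) or is_object_dtype(right.dtype):
        return True
    # Assumption: a simplified stand-in for _is_comparable_dtype.
    if is_numeric_dtype(left.dtype) and is_numeric_dtype(right.dtype):
        return True
    return left.dtype == right.dtype


pi_daily = pd.period_range("2016-01-01", periods=3, freq="D")
pi_weekly = pd.period_range("2016-01-01", periods=3, freq="W")
obj_index = pd.Index(["a", "b"], dtype=object)

daily_vs_weekly = should_compare_sketch(pi_daily, pi_weekly)
daily_vs_cat_weekly = should_compare_sketch(pi_daily, pd.CategoricalIndex(pi_weekly))
daily_vs_cat_daily = should_compare_sketch(pi_daily, pd.CategoricalIndex(pi_daily))
daily_vs_object = should_compare_sketch(pi_daily, obj_index)
```

In this sketch a categorical index is judged by its categories, so a categorical of weekly periods is treated as non-comparable with a daily PeriodIndex, while object dtype always falls through to an actual comparison.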

jbrockmendel · 2020-11-27T04:16:18Z

pandas/core/indexes/base.py

+            return no_matches
+        else:
+            # This is for get_indexer_non_unique
+            return no_matches, no_matches


@jreback there's some ambiguity as to what we're supposed to be returning here. ATM in Index.get_indexer_non_unique we are returning two ndarrays of minus-ones, but in IntervalIndex the second one is an np.arange(len(target))

The other inconsistency is that ATM PeriodIndex.get_indexer defines no_matches = -1 * np.ones(self.shape, dtype=np.intp) whereas i expected it to use target.shape

Can you clear this up for me?

looking at this more, i increasingly think the second one should be np.arange(len(target), dtype=np.intp)
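
A minimal sketch of the return shapes under discussion, assuming the proposal in this thread (arrays sized by the target, a second array of target positions for the non-unique case, and a TypeError when a fill method is requested). The function name `indexer_for_non_comparable` and its signature are hypothetical and are not the pandas implementation.

```python
import numpy as np


def indexer_for_non_comparable(target_len: int, method=None, unique: bool = True):
    # Hypothetical stand-in for the short-circuit path: nothing in the
    # target can match, so no element-wise comparison is attempted.
    if method is not None:
        # Proposed behavior: a fill method needs ordering comparisons,
        # which are undefined between non-comparable dtypes.
        raise TypeError("Cannot compare non-comparable dtypes with a fill method")

    # One -1 per *target* element (not per element of self).
    no_matches = np.full(target_len, -1, dtype=np.intp)
    if unique:
        # get_indexer-style result
        return no_matches

    # get_indexer_non_unique-style result: every target position is missing.
    missing = np.arange(target_len, dtype=np.intp)
    return no_matches, missing


indexer = indexer_for_non_comparable(4)
indexer_nu, missing_nu = indexer_for_non_comparable(3, unique=False)
```

Sizing both arrays by the target keeps the result usable for reindexing regardless of the length of the index being searched.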

jreback

i don't think this method should raise.

this is a pretty big change from status quo

jbrockmendel · 2020-11-27T04:37:14Z

I stumbled onto this in part because of this test:

def test_reindex_datetimeindexes_tz_naive_and_aware():
    # GH 8306
    idx = date_range("20131101", tz="America/Chicago", periods=7)
    newidx = date_range("20131103", periods=10, freq="H")
    s = Series(range(7), index=idx)
    msg = "Cannot compare tz-naive and tz-aware timestamps"
    with pytest.raises(TypeError, match=msg):
        s.reindex(newidx, method="ffill")

and i figured that the way we treat periods with mismatched freq is almost identical to how we treat tzawareness-compat. Or is dt64tz special in this case?

I'd be OK with not-raising in _get_indexer_non_comparable, just want to get this behavior consistent, bc ATM we're all over the place.

jbrockmendel · 2020-11-27T04:39:34Z

pandas/tests/indexes/period/test_indexing.py

+                continue
+            # For object dtype we are liable to get a different exception message
+            with pytest.raises(TypeError):
+                pi.get_indexer(other2, method=method)


@jreback notice in these cases we are currently raising in master bc the scalar comparisons raise

is this new?

get_indexer is not super public but i believe it will never raise

I think you're right that get_indexer with method=None should never raise (maybe with tzawareness corner cases), but with method="ffill" the following raises on master:

dti = pd.date_range("2016-01-01", periods=3)
rng = pd.Index(range(5))

>>> dti.get_indexer(rng, method="ffill")
TypeError: '<' not supported between instances of 'int' and 'Timestamp'
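
The difference between the two code paths can be illustrated without pandas. The sketch below is an analogy using only the standard library, not how pandas implements get_indexer: exact matching goes through a hash lookup, where a value of the wrong type is simply a miss, while forward-filling needs `<` between the target and the stored values. `pad_indexer` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import datetime

values = [datetime(2016, 1, 1), datetime(2016, 1, 2), datetime(2016, 1, 3)]
targets = [0, 1, 2, 3, 4]

# Exact matching (method=None analogue): hash lookup, so an int target
# never equals a datetime and is reported as -1 without raising.
positions = {value: i for i, value in enumerate(values)}
exact = [positions.get(target, -1) for target in targets]


def pad_indexer(sorted_values, targets):
    # Forward-fill analogue: position of the last value <= target, or -1.
    # bisect_right has to evaluate `target < value`, which is undefined
    # between int and datetime and therefore raises TypeError.
    return [bisect_right(sorted_values, target) - 1 for target in targets]


try:
    pad_indexer(values, targets)
except TypeError as err:
    pad_error = err
```

With comparable targets, for example `pad_indexer(values, [datetime(2016, 1, 2, 12)])`, the same helper returns positions normally, so the error comes from the dtype mismatch rather than from the fill logic.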

jbrockmendel · 2020-11-27T19:01:54Z

xref #36320

jbrockmendel · 2020-11-28T21:19:14Z

So I'm increasingly convinced that the current PeriodIndex.get_indexer behavior with method != None is wrong:

dti = pd.date_range("2016-01-01", periods=3)
pi = dti.to_period("D")
pi2 = dti.to_period("W")

ser = pd.Series(range(3), pi)

>>> ser.reindex(pi2, method="ffill")
2015-12-28/2016-01-03   NaN
2015-12-28/2016-01-03   NaN
2015-12-28/2016-01-03   NaN
Freq: W-SUN, dtype: float64

But pi and pi2 are non-comparable, just like pi and dti, so we should raise the same way:

>>> ser.reindex(dti, method="ffill")
[...]
  File "pandas/_libs/algos.pyx", line 450, in pandas._libs.algos.pad
    if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
TypeError: '<' not supported between instances of 'Timestamp' and 'Period'

jreback · 2020-11-29T18:21:39Z

pandas/tests/indexes/period/test_indexing.py

+        pd.IntervalIndex.from_breaks(dti4),
+    ]
+)
+def non_comparable(request):


non_comparable_idx

jreback · 2020-11-29T18:22:52Z

pandas/core/indexes/base.py

@@ -4973,6 +4993,22 @@ def _maybe_promote(self, other: "Index"):

        return self, other

+    def _get_other_deep(self, other: "Index") -> "Index":


this should just be a function, no? it's also a funny name. I think we have other similar things (e.g. this looks like what .to_dense() does but for dtypes and not values).

updated to _unpack_nested_dtypes function

jbrockmendel · 2020-12-02T01:56:28Z

alright! get ready to see a bunch of de-duplication and fastpaths coming in.

jreback · 2020-12-02T02:35:02Z

hit me!

REF: implement _should_compare

59c244d

jbrockmendel commented Nov 27, 2020

jreback requested changes Nov 27, 2020

jbrockmendel commented Nov 27, 2020

REF: implement _should_compare

701e7fe

jbrockmendel mentioned this pull request Nov 27, 2020

REF: implement _should_compare #38114

Closed

jbrockmendel added 3 commits November 27, 2020 11:50

Fixup no_matches->missing

4559d44

Merge branch 'ref-maybe_promote-2' of github.com:jbrockmendel/pandas …

fbb4c6c

…into ref-maybe_promote-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

653ff96

…f-maybe_promote-2

jreback requested changes Nov 29, 2020

jreback added the Index Related to the Index class or subclasses label Nov 29, 2020

jbrockmendel added 3 commits November 29, 2020 11:04

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

7bb5929

…f-maybe_promote-2

docstring, get_other_deep->unpack_nested_dtype

dbfc820

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

df84891

…f-maybe_promote-2

jbrockmendel mentioned this pull request Dec 1, 2020

REF: IntervalIndex.intersection match pattern in other intersection methods #38190

Merged

jreback added this to the 1.2 milestone Dec 2, 2020

jreback approved these changes Dec 2, 2020

jreback merged commit 044df8c into pandas-dev:master Dec 2, 2020

jbrockmendel deleted the ref-maybe_promote-2 branch December 2, 2020 01:57

jbrockmendel mentioned this pull request May 28, 2021

API/BUG: get_indexer_non_unique(object_dtype)? #36320

Closed
