fix inconsistent index naming with union/intersect #35847 #36413

iamlemec · 2020-09-17T03:12:58Z

closes BUG: inconsistent naming when combining indices of various types #35847
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This takes care of some inconsistency in how names are handled by Index functions union and intersection, as discussed in #35847. I believe this covers all index types, either through the base class or in the subclass when necessary.

What I've implemented here actually uses the unanimous convention, wherein all input names must match to get assigned to the output. Originally, I was thinking consensus would be better (assign if there is only one non-None input name), but looking through the existing tests, it seems that unanimous was usually expected. I also had some worries about whether index names would become too "contagious" with consensus. Anyway, it's easy to change between the two if people have strong opinions on this.
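To make the two conventions concrete, here is a minimal plain-Python sketch (no pandas) of how each would pick a result name from a list of input names. `unanimous_name` and `consensus_name` are hypothetical helpers that only illustrate the rules described above. "Consensus" is read here as "exactly one distinct non-None name", which is an assumption about the intended rule.

```python
from typing import Hashable, Optional, Sequence

Name = Optional[Hashable]


def unanimous_name(names: Sequence[Name]) -> Name:
    """Keep a name only if every input carries that same name (hypothetical helper)."""
    distinct = set(names)
    return distinct.pop() if len(distinct) == 1 else None


def consensus_name(names: Sequence[Name]) -> Name:
    """Keep a name if exactly one distinct non-None name appears (hypothetical helper)."""
    distinct = {n for n in names if n is not None}
    return distinct.pop() if len(distinct) == 1 else None


# The conventions agree when names match or clash, and differ when one input is unnamed.
print(unanimous_name(["a", "a"]), consensus_name(["a", "a"]))  # a a
print(unanimous_name(["a", "b"]), consensus_name(["a", "b"]))  # None None
print(unanimous_name(["a", None]), consensus_name(["a", None]))  # None a
```

The last line is the "contagious" case: under consensus, a single named input would pass its name on to the result.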

jbrockmendel · 2020-09-17T03:34:43Z

pandas/core/indexes/base.py


    def _wrap_setop_result(self, other, result):
        name = get_op_result_name(self, other)
-        return self._shallow_copy(result, name=name)
+        if isinstance(result, ABCIndexClass):


this check will include Index objects but exclude subclasses. is that intentional? if so, can you add a comment about why

I'll admit I don't actually understand this whole ABC business 100%, but I'm getting:

    isinstance(pd.MultiIndex, pd.Index) -> False
    isinstance(pd.MultiIndex, ABCIndexClass) -> True
    issubclass(pd.MultiIndex, pd.Index) -> True

Should I be using issubclass instead? Wasn't trying to do anything too fancy here.

yah, it's not intuitive. basically isinstance(obj, ABCIndexClass) is equivalent to type(obj) is Index, so what you probably want is isinstance(obj, Index), which behaves like normal

Hmm, but wouldn't we get isinstance(multiindex, ABCIndexClass) == True and type(multiindex) != pd.Index here? Either way, agreed that isinstance will get us what we want.

my bad, i confused ABCIndexClass with ABCIndex, never mind.
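The surprising results above come from passing the class `pd.MultiIndex`, rather than an instance, to `isinstance`. My understanding is that pandas' ABC types of this era were built with a metaclass whose `__instancecheck__` only looked up a `_typ` attribute, and a class object exposes its own `_typ` class attribute. Treat that as an assumption. The sketch below reproduces the pattern with hypothetical stand-in classes (not pandas itself), so the class names and `_typ` strings are illustrative only.

```python
class _TypCheckMeta(type):
    """Metaclass whose isinstance() check only inspects a ``_typ`` attribute."""

    def __instancecheck__(cls, inst) -> bool:
        return getattr(inst, "_typ", None) in cls._comp


class ABCIndexClass(metaclass=_TypCheckMeta):
    # hypothetical subset of the "_typ" tags carried by index classes
    _comp = frozenset({"index", "multiindex", "datetimeindex"})


class Index:
    _typ = "index"


class MultiIndex(Index):
    _typ = "multiindex"


# Passing the class object itself: it is not an Index *instance* ...
print(isinstance(MultiIndex, Index))  # False
# ... but it does carry the class attribute _typ == "multiindex".
print(isinstance(MultiIndex, ABCIndexClass))  # True
print(issubclass(MultiIndex, Index))  # True
# With an actual instance, both checks agree.
print(isinstance(MultiIndex(), Index), isinstance(MultiIndex(), ABCIndexClass))  # True True
```

On a real result object, plain `isinstance(result, Index)` is therefore the straightforward check for the `_wrap_setop_result` branch.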

jbrockmendel · 2020-09-17T03:36:32Z

pandas/core/indexes/base.py

@@ -5926,3 +5931,22 @@ def _maybe_asobject(dtype, klass, data, copy: bool, name: Label, **kwargs):
        return index.astype(object)

    return klass(data, dtype=dtype, copy=copy, name=name, **kwargs)
+
+
+def get_unanimous_names(*indexes):


do we ever get more than two here?

Would indexes instead of *indexes work?

yeah, we use this in union_indexes which operates on a list of indices. but most uses are explicitly two, so I figured this would make the typical syntax easier.

can you type the input and output

jbrockmendel · 2020-09-17T03:38:18Z

pandas/core/indexes/multi.py

@@ -3415,7 +3420,7 @@ def union(self, other, sort=None):
        other, result_names = self._convert_can_do_setop(other)

        if len(other) == 0 or self.equals(other):
-            return self
+            return self._shallow_copy(names=result_names)


self.rename?

Yup, makes sense. Certainly more legible, and it seems equivalent.

jbrockmendel · 2020-09-17T03:38:49Z

pandas/core/indexes/numeric.py

@@ -180,7 +180,8 @@ def _union(self, other, sort):
        if needs_cast:
            first = self.astype("float")
            second = other.astype("float")
-            return first._union(second, sort)
+            result = first._union(second, sort)
+            return Float64Index(result)


this should already be a Float64Index. are there cases where it isn't?

Ah, you're right. At some point I was allowing _union to return just an array, so I ended up needing that. But now there's a _shallow_copy at return, so it's redundant. Will revert.

jbrockmendel · 2020-09-17T03:39:16Z

pandas/tests/indexes/multi/test_setops.py


    the_union = idx.union(idx[:0], sort=sort)
-    assert the_union is idx
+    assert tm.equalContents(the_union, idx)


can you use tm.assert_index_equal
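The choice of assertion matters for this PR. `is` checks object identity, which fails as soon as `union` returns a renamed copy, while a set-based contents check ignores names entirely and so cannot catch a naming regression. The sketch below models this with a hypothetical `SimpleIndex` stand-in. `equal_contents` mirrors what `tm.equalContents` is understood to do (compare the sets of elements), and `strict_equal` approximates the stricter checks `tm.assert_index_equal` performs (values in order, plus names). Neither is the pandas implementation.

```python
from dataclasses import dataclass, replace
from typing import Hashable, Optional, Tuple


@dataclass(frozen=True)
class SimpleIndex:
    """Hypothetical stand-in for an Index: ordered values plus a name."""

    values: Tuple[Hashable, ...]
    name: Optional[Hashable] = None

    def rename(self, name: Optional[Hashable]) -> "SimpleIndex":
        # returns a new object, as Index.rename does by default
        return replace(self, name=name)


def equal_contents(a: SimpleIndex, b: SimpleIndex) -> bool:
    """Set-based comparison: ignores order, duplicates and names."""
    return set(a.values) == set(b.values)


def strict_equal(a: SimpleIndex, b: SimpleIndex) -> bool:
    """Also compares element order and the name."""
    return a.values == b.values and a.name == b.name


idx = SimpleIndex((1, 2, 3), name="a")
result = idx.rename(None)  # e.g. a set-op result whose name differs from idx

print(result is idx)  # False: identity breaks once a copy is returned
print(equal_contents(result, idx))  # True: the name change goes unnoticed
print(strict_equal(result, idx))  # False: the name difference is caught
```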

jreback · 2020-09-17T16:13:41Z

pandas/core/indexes/base.py

@@ -5926,3 +5931,22 @@ def _maybe_asobject(dtype, klass, data, copy: bool, name: Label, **kwargs):
        return index.astype(object)

    return klass(data, dtype=dtype, copy=copy, name=name, **kwargs)
+
+
+def get_unanimous_names(*indexes):


can you type the input and output

jreback

looks really good @iamlemec

only small concern is that we are likely not fully validating whether we are returning a copy for the 0 len cases or where indexes are equal.

but certainly can do as a followup.

jreback · 2020-09-19T01:00:33Z

doc/source/whatsnew/v1.2.0.rst

@@ -293,6 +293,7 @@ Indexing

 - Bug in :meth:`PeriodIndex.get_loc` incorrectly raising ``ValueError`` on non-datelike strings instead of ``KeyError``, causing similar errors in :meth:`Series.__getitem__`, :meth:`Series.__contains__`, and :meth:`Series.loc.__getitem__` (:issue:`34240`)
 - Bug in :meth:`Index.sort_values` where, when empty values were passed, the method would break by trying to compare missing values instead of pushing them to the end of the sort order. (:issue:`35584`)
+- Harmonize resulting index names from :meth:`Index.union` and :meth:`Index.intersection` across various index types (:issue:`35847`)


if there are significant changes that a user would want to know about, can you add that detail

also might warrant a sub-section if this is something easier to see via code examples

yup, added in a bit about name preservation

right add the issue number above and remove this note

jreback · 2020-09-19T01:04:40Z

pandas/tests/indexes/test_common.py

@@ -124,6 +124,65 @@ def test_corner_union(self, index, fname, sname, expected_name):
        expected = index.drop(index).set_names(expected_name)
        tm.assert_index_equal(union, expected)

+        # test copy.union(subset) - need sort for unicode and string


can you make this another test

iamlemec · 2020-09-19T08:02:51Z

Ok, added in a code example to whatsnew showing the new/consistent naming behavior when aggregating. Also split off those new tests.

Regarding copying in the various empty/equality cases, copy here means a deep copy including values, right? Seems like some of the tests are using "is" which breaks when you add in copying. I can look into that and the nan issues in a subsequent PR if that works.

jreback

doc comment, ping on green.

jreback · 2020-09-19T20:12:35Z

doc/source/whatsnew/v1.2.0.rst

+Index/column name preservation when aggregating
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When aggregating using :meth:`concat` or the :class:`DataFrame` constructor, Pandas


can you add the issue reference here.

Also I would add the same text here in the docs, maybe in the reshaping section, in a note

yep, i ended up adding it as a note in the "merge, join, concatenate, and compare" section.

jreback · 2020-09-19T20:12:53Z

doc/source/whatsnew/v1.2.0.rst

@@ -293,6 +293,7 @@ Indexing

 - Bug in :meth:`PeriodIndex.get_loc` incorrectly raising ``ValueError`` on non-datelike strings instead of ``KeyError``, causing similar errors in :meth:`Series.__getitem__`, :meth:`Series.__contains__`, and :meth:`Series.loc.__getitem__` (:issue:`34240`)
 - Bug in :meth:`Index.sort_values` where, when empty values were passed, the method would break by trying to compare missing values instead of pushing them to the end of the sort order. (:issue:`35584`)
+- Harmonize resulting index names from :meth:`Index.union` and :meth:`Index.intersection` across various index types (:issue:`35847`)


right add the issue number above and remove this note

jbrockmendel · 2020-09-19T20:19:18Z

pandas/core/indexes/datetimes.py

+
+        res_name = get_unanimous_names(self, *others)[0]
+        if this.name != res_name:
+            return this._shallow_copy(name=res_name)


rename instead of shallow_copy?

yup, indeed. i actually went through and made the same change in base.py and api.py. seems like it works fine all around.

iamlemec · 2020-09-20T17:13:46Z

@jreback fixed the docs indentation issue, that's green now. i'm having trouble figuring out what went wrong with some of the other tests. is it some kind of internal CI error?

jbrockmendel · 2020-09-21T17:11:37Z

CI failures were unrelated, should be fixed if you merge master

jbrockmendel · 2020-09-24T16:07:58Z

pandas/core/indexes/base.py

@@ -5935,3 +5941,22 @@ def _maybe_asobject(dtype, klass, data, copy: bool, name: Label, **kwargs):
        return index.astype(object)

    return klass(data, dtype=dtype, copy=copy, name=name, **kwargs)
+
+
+def get_unanimous_names(*indexes: Type[Index]) -> Tuple[Any, ...]:


Type[Index] seems really weird here. Shouldn't it be Tuple[Index]? And i think the return type should use Label instead of Any

Ah, I was wondering if there was a Label notion! And now I see there's a _typing.py, will keep that handy in the future.

Yeah, I don't know how that extraneous Type ended up there. But actually with an *args situation, you just give the type of the individual elements, so I think it should just be Index.
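For reference, here is a sketch of the helper with the annotations discussed. With `*indexes`, the annotation names the type of each positional argument (inside the function, `indexes` is a tuple of them), and the result holds one name per level, `None` where the inputs disagree. This is a self-contained approximation, not the code from this PR. The `HasNames` protocol stands in for `Index`, `Optional[Hashable]` stands in for pandas' `Label` alias, and the level-by-level comparison via `itertools.zip_longest` is an assumption about how MultiIndex names are handled.

```python
from itertools import zip_longest
from types import SimpleNamespace
from typing import Hashable, Optional, Protocol, Sequence, Tuple

Label = Optional[Hashable]  # stand-in for pandas' Label alias


class HasNames(Protocol):
    """Anything exposing per-level ``names``, as Index and MultiIndex do."""

    names: Sequence[Label]


def get_unanimous_names(*indexes: HasNames) -> Tuple[Label, ...]:
    """Return, level by level, the name shared by all inputs, else None."""
    name_tups = [tuple(idx.names) for idx in indexes]
    # zip_longest pads shorter name tuples with None, so a level that is
    # missing from some input always ends up named None.
    name_sets = [set(level) for level in zip_longest(*name_tups)]
    return tuple(ns.pop() if len(ns) == 1 else None for ns in name_sets)


a = SimpleNamespace(names=["x"])
b = SimpleNamespace(names=["x"])
c = SimpleNamespace(names=["y"])
print(get_unanimous_names(a, b))  # ('x',)
print(get_unanimous_names(a, b, c))  # (None,)
print(get_unanimous_names(*[a, b]))  # ('x',): a list of indexes is unpacked with *
```

The last call shows why `*indexes` keeps the common two-argument call short while still serving list-based callers such as `union_indexes`.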

jbrockmendel · 2020-09-24T16:08:35Z

pandas/core/indexes/datetimelike.py


        if self.equals(other):
            return self._get_reconciled_name_object(other)

        if len(self) == 0:
-            return self.copy()
+            return self.copy()._get_reconciled_name_object(other)


in the base class you got rid of the _get_reconciled_name_object usage. why going the other direction here?

Still keeping this around for the rare case where you want to return the same values but with a possible name change. In base.py it only shows up in the one intersection return clause (and not in union because of the union/_union split). It is interesting though that the version of intersection in datetimelike.py has two more quick return clauses than the base class.
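The "same values, possibly a different name" shortcut can be sketched as below. The class and method names are hypothetical stand-ins that mimic, but are not, pandas' `_get_reconciled_name_object`. The two-input name rule (`self.name` if both names agree, else `None`) is assumed to approximate `get_op_result_name` under the unanimous convention used in this PR.

```python
from dataclasses import dataclass, replace
from typing import Hashable, Optional, Tuple


@dataclass(frozen=True)
class NamedValues:
    """Hypothetical stand-in for an Index: ordered values plus a name."""

    values: Tuple[Hashable, ...]
    name: Optional[Hashable] = None

    def rename(self, name: Optional[Hashable]) -> "NamedValues":
        return replace(self, name=name)

    def reconciled_name_object(self, other: "NamedValues") -> "NamedValues":
        """Keep self's values; swap the name only if the shared name differs."""
        res_name = self.name if self.name == other.name else None
        if self.name != res_name:
            return self.rename(res_name)
        return self  # names already agree, so no new object is created

    def intersection(self, other: "NamedValues") -> "NamedValues":
        # Quick returns: when the values are already the answer (equal inputs,
        # or an empty self), only the name may need to change.
        if self.values == other.values or len(self.values) == 0:
            return self.reconciled_name_object(other)
        other_set = set(other.values)
        kept = tuple(v for v in self.values if v in other_set)
        res_name = self.name if self.name == other.name else None
        return NamedValues(kept, res_name)


x = NamedValues((1, 2), "a")
print(x.intersection(NamedValues((1, 2), "a")) is x)  # True
print(x.intersection(NamedValues((1, 2), "b")))  # NamedValues(values=(1, 2), name=None)
print(x.intersection(NamedValues((2, 3), "a")))  # NamedValues(values=(2,), name='a')
```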

iamlemec · 2020-09-27T02:48:05Z

I think we're mostly good here, but the travis-ci build for ARM just kinda bailed without comment.

jbrockmendel · 2020-09-28T15:47:58Z

pandas/core/reshape/concat.py

    get_objs_combined_axis,
 )
 import pandas.core.indexes.base as ibase
+from pandas.core.indexes.base import get_unanimous_names


get_unanimous_names is in indexes.api, should get it from there

jbrockmendel · 2020-09-28T15:49:06Z

pandas/tests/indexes/datetimes/test_setops.py

@@ -471,7 +471,7 @@ def test_intersection_bug(self):
    def test_intersection_list(self):
        # GH#35876
        values = [pd.Timestamp("2020-01-01"), pd.Timestamp("2020-02-01")]
-        idx = pd.DatetimeIndex(values, name="a")
+        idx = pd.DatetimeIndex(values)
        res = idx.intersection(values)
        tm.assert_index_equal(res, idx)


could keep the name in idx and test for tm.assert_index_equal(res, idx.rename(None))?

jbrockmendel · 2020-09-28T15:49:21Z

pandas/tests/indexes/multi/test_join.py

@@ -46,7 +46,7 @@ def test_join_level_corner_case(idx):

 def test_join_self(idx, join_type):
    joined = idx.join(idx, how=join_type)
-    assert idx is joined
+    assert tm.equalContents(joined, idx)


does assert_index_equal not work here?

jbrockmendel · 2020-09-28T15:49:33Z

pandas/tests/indexes/multi/test_setops.py

@@ -278,7 +278,7 @@ def test_intersection(idx, sort):

    # corner case, pass self
    the_int = idx.intersection(idx, sort=sort)
-    assert the_int is idx
+    assert tm.equalContents(the_int, idx)


assert_index_equal?

iamlemec · 2020-09-28T18:09:14Z

Thanks @jbrockmendel! Pushed all your suggested changes.

jbrockmendel · 2020-10-07T01:00:18Z

@iamlemec can you merge master; IIRC this was pretty close to ready

iamlemec · 2020-10-07T07:37:08Z

@jbrockmendel sure thing, just pushed a rebase onto master

jreback · 2020-10-07T11:33:44Z

thanks @iamlemec very nice!

iamlemec · 2020-10-07T18:46:01Z

thanks for all the help @jreback and @jbrockmendel!


jbrockmendel reviewed Sep 17, 2020

pandas/core/indexes/base.py (resolved)

jbrockmendel reviewed Sep 17, 2020

pandas/core/indexes/datetimelike.py (resolved)

jbrockmendel reviewed Sep 17, 2020

jreback added the Index Related to the Index class or subclasses label Sep 17, 2020

jreback requested changes Sep 17, 2020

jreback requested changes Sep 19, 2020

jreback added this to the 1.2 milestone Sep 19, 2020

jreback requested changes Sep 19, 2020

jbrockmendel reviewed Sep 19, 2020

iamlemec force-pushed the union_unequal branch from 4ebc5a6 to 47d3ba1 September 19, 2020 22:40

iamlemec force-pushed the union_unequal branch from 0ddb3f2 to 4206057 September 21, 2020 17:30

jbrockmendel mentioned this pull request Sep 24, 2020

ENH: return RangeIndex from difference, symmetric_difference #36564

Merged

5 tasks

jbrockmendel reviewed Sep 24, 2020

jbrockmendel reviewed Sep 28, 2020

fix inconsistent index naming with union/intersect GH35847

17254c4

iamlemec force-pushed the union_unequal branch from cedd79c to 17254c4 October 7, 2020 05:08

jreback approved these changes Oct 7, 2020

jreback merged commit 1f67100 into pandas-dev:master Oct 7, 2020

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

fix inconsistent index naming with union/intersect GH35847 (pandas-dev#36413)

332d4c7

iamlemec commented Sep 17, 2020 (edited)

jreback left a comment

iamlemec commented Sep 19, 2020

jreback left a comment

iamlemec commented Sep 20, 2020

jbrockmendel commented Sep 21, 2020

iamlemec commented Sep 27, 2020

iamlemec commented Sep 28, 2020

jbrockmendel commented Oct 7, 2020

iamlemec commented Oct 7, 2020

jreback commented Oct 7, 2020

iamlemec commented Oct 7, 2020
