FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965

hsperr · 2015-04-22T00:32:36Z

had some time this morning and started working

for now I just uncommented the explicit setting to None of the index names.
let me know if I should remove it completely.

sinhrks · 2015-04-22T14:16:05Z

Related to #9926. From the issue, expected behavior is:

In case of Index (ops/method) Index, name is preserved if left and right index have the same name. Otherwise, name is reset to None.
In case of Index (ops/method) scalar or scalar (ops/method) Index, name of the index is preserved.

So can you confirm our understandings are the same, and describe what is being changed?

hsperr · 2015-04-22T23:24:27Z

Hey sinhrks,

thanks for you kind reply. I read the task you linked and had to find that given the other two issues my understanding of the correct behavior was wrong.

I will change the behavior to be:

idx = pd.Index([1,2,3], name='a')
idx2 = pd.Index([4,5,6], name='b')
idx3 = pd.Index([1,2,3], name='a')
L = [1]

idx.union(idx2).name
#None
idx.union(idx3).name
#a
idx.union(L)
#a

idx.intersection(idx2).name
#None
idx.intersection(idx3).name
#a
idx.intersection(L).name
#a

Right now index.union(array) is crashing, I fixed that while changing the naming issue. Please let me know if that is intended or not.

I am rerunning the tests, will add above mentioned test cases and reissue a pull request later.

jreback · 2015-04-22T23:56:06Z

@hsperr this should mirror .append (which IIRC you fixed?) (and put the tests next to it)

hsperr · 2015-04-23T02:03:46Z

I am sorry that my initial PR was so vague and unclear, I will improve on that.

I just pushed the new changes.
I started out writing an extensive naming test for .append to check if intersection and append have the same naming behavior

please review:
test_append_naming_behavior()

it tests several combination of index append index index append array index append datetimeindex with same names, different names or no name.
if you have an idea for adding a multiindex assert.

I added the same tests for intersection and union to make sure the have the same behavior.
I am sorry for the many pushes but I made a few mistakes on my way to this solution.

jreback · 2015-04-28T00:44:26Z

pandas/core/index.py

@@ -1219,14 +1219,17 @@ def union(self, other):
        if len(other) == 0 or self.equals(other):
            return self

+        other = _ensure_index(other)
+


don't change something unrelated (and this is in a fast path for other things, so it causes a perf drag)

The reason why i changed this is, that a union between an index and a numpy/python array right now crashes on AttributeErrors

index = pd.Index([1,2,3])
index.union([4,5,6])

-> 1238 if self.dtype != other.dtype:
1239 this = self.astype('O')
1240 other = other.astype('O')

AttributeError: 'list' object has no attribute 'dtype'

so I think the _ensure_index(other) has to be done regardless of the if len(self) == 0:

I also think it is implemented like that in intersection

sinhrks · 2015-04-28T13:42:17Z

Can you add tests for short pathes for each dtypes, maybe fail with current impl? I agree moving tests to Base class is better.

self.assertEqual(pd.Index([], name='a').union(pd.Index([], name='a')).name, 'a')
self.assertEqual(pd.Index(['x'], name='a').union(pd.Index([], name='a')).name, 'a')
self.assertEqual(pd.Index([], name='a').union(pd.Index(['x'], name='a')).name, 'a')
self.assertEqual(pd.Index([], name='a').union(pd.Index([], name='b')).name, None)
self.assertEqual(pd.Index(['x'], name='a').union(pd.Index([], name='b')).name, None)
self.assertEqual(pd.Index([], name='a').union(pd.Index(['x'], name='b')).name, None)

hsperr · 2015-04-30T22:00:35Z

ok I have been looking into the base test and I think I get the general architecture now.
I suppose the right way to do it then is to use the self.create_index() method since that gets overwritten by every subclass and returns the right index. I wonder if its okay to give that create index two optional parameters namely 'content' and 'name' or something or whether I should first create and index and then give it a name.

Just to make sure I understand could you explain more on what a short path for each dtype means? is it related to the code you posted?

sinhrks · 2015-05-01T10:33:17Z

@hsperr I meant these lines which return values without calling _wrap_union_result.

Because ``DatetimeIndex` has separate logic and similar short path, I think it is better to test each dtypes.

https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py#L970

hsperr · 2015-05-13T22:31:37Z

Ok sorry for the long break, I was out of town for a while but I am still active on this issue.

@sinhrks I understand the short paths now and build a testcase that helped me find quite a few of them. I am having a bit of trouble moving the testcase to Base class though since some indexes (Categorical) require special setup (same categories etc.) But I will figure it out!

Another question I have is in this case
https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L1375
it seems to me if we union an empty index with a non empty one we just return the non empty one, in this case setting the correct name would be changeing the name of the other index I guess.

I was also wondering if it is too dirty to make something like a decorator function that determines what the name should be before we enter and makes sure the name is correct when it returns, no matter which of the returns we take, might make up for less/cleaner code

sinhrks · 2015-05-14T15:04:07Z

it seems to me if we union an empty index with a non empty one we just return the non empty one, in this case setting the correct name would be changeing the name of the other index I guess.

I think this should be shallow-copied like #10033, @jreback and @shoyer ?

I was also wondering if it is too dirty to make something like a decorator function

You may be able to use common._maybe_match_name. But the logic looks not to cover objects which doesn't have name attribute. Changing the func to cover them makes impl easier?

https://github.com/pydata/pandas/blob/master/pandas/core/common.py#L3324

sinhrks · 2015-05-15T19:42:58Z

pandas/tests/test_index.py

+        numpy_array = np.array([1,2,3])
+
+        #index union index naming behavior
+        self.assertEqual(idx_name_a.union(idx_name_b).name, None)


Can you change None comparisons to self.assertTrue(... is None)? Sorry my test examples are incorrect.
Or you can use check_index_equal to check names (see #9972).

…as-dev#9943 partly pandas-dev#9862

hsperr · 2015-05-17T04:13:02Z

I pushed the code that I have so far, to discuss a few items and whether the direction is okay or not.

the current test cases are here:
https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L91

I wanted to make them as generic as possible os every index type just has to instantiate a few indexes and can use these methods.

Problems I found:

test_union
https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L795
right now fails because I changed
https://github.com/hsperr/pandas/blob/index_names/pandas/core/index.py#L1372

return _ensure_index(other)

to

result = _ensure_index(other, copy=True)
result.name = result_name
return result

If Index([]).union(someotherindex) is supposed to return the other index, I will add that behavior to the documentation and change the logic back

I moved ensure_index out of the if block because otherwise this line crashes with when we try to union with python arrays
https://github.com/hsperr/pandas/blob/index_names/pandas/core/index.py#L1379

https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L4718
right now crashes because df.loc seems to call append on indexes somewhere inbetween and that resets the name (because it seems to build new indexes and the names don't match). I am still debugging this.

EDIT: I also just saw that rebasing with master mangled up the whats new file, I will change that

jreback · 2015-06-26T23:38:11Z

pls rebase on master

max-sixty · 2015-07-27T01:43:33Z

@hsperr - I was planning to have a look at this, as it intersects (no pun intended...) a fix I just worked on. Can you let us know if you're planning on pushing this over the line? Otherwise I'm happy to help out and take it there.

hsperr · 2015-07-27T01:55:33Z

@MaximilianR recently quite a few things came up in my private life and I didn't have too much time to get back to this task, recently I tried getting back into it but I also realize that it involves changes in many places and I do not know the pandas setup well enough to make good test cases and be sure that all the branches are covered properly. So if you want you can take it from here, and I am sorry for causing any unnecessary work. If you need some info on the task please let me know!

jreback · 2015-08-05T21:39:50Z

cc @MaximilianR

IIRC you have a PR to replace this, yes?

max-sixty · 2015-08-06T01:20:41Z

Nothing yet @jreback, but it's next on my list. TBH it's unlikely to be before next weekend...

jreback · 2015-08-06T01:21:48Z

thanks

hsperr force-pushed the index_names branch 2 times, most recently from 4e4d6f5 to c1db455 Compare April 22, 2015 00:38

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 22, 2015

jreback added this to the 0.16.1 milestone Apr 22, 2015

hsperr force-pushed the index_names branch 4 times, most recently from 991a6b6 to 725ffe5 Compare April 23, 2015 02:02

hsperr force-pushed the index_names branch from 725ffe5 to a641434 Compare April 23, 2015 04:03

jreback reviewed Apr 28, 2015
View reviewed changes

jreback modified the milestones: 0.17.0, 0.16.1 Apr 28, 2015

sinhrks reviewed May 15, 2015
View reviewed changes

sinhrks mentioned this pull request May 16, 2015

BUG: Index.name is lost during timedelta ops #10158

Merged

hsperr added 2 commits May 17, 2015 10:01

FIX: interesction and union correct name chaning behavior. fixes pand…

d494287

…as-dev#9943 partly pandas-dev#9862

FIX: Reworked the tests, removed the equal None

3e515ce

hsperr force-pushed the index_names branch 2 times, most recently from b9b4ad9 to 833f1be Compare May 17, 2015 04:12

sinhrks mentioned this pull request May 31, 2015

BUG: Index.union cannot handle array-likes #10157

Merged

FIX: more shortpaths, refactor tests

4dc1a4d

hsperr force-pushed the index_names branch from 833f1be to 4dc1a4d Compare July 18, 2015 01:04

jreback mentioned this pull request Jul 24, 2015

Drop & insert on subtypes of index return their subtypes #10620

Closed

jreback closed this Aug 5, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965

FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965

hsperr commented Apr 22, 2015

sinhrks commented Apr 22, 2015

hsperr commented Apr 22, 2015

jreback commented Apr 22, 2015

hsperr commented Apr 23, 2015

jreback Apr 28, 2015

hsperr Apr 28, 2015

sinhrks commented Apr 28, 2015

hsperr commented Apr 30, 2015

sinhrks commented May 1, 2015

hsperr commented May 13, 2015

sinhrks commented May 14, 2015

sinhrks May 15, 2015

hsperr commented May 17, 2015

jreback commented Jun 26, 2015

max-sixty commented Jul 27, 2015

hsperr commented Jul 27, 2015

jreback commented Aug 5, 2015

max-sixty commented Aug 6, 2015

jreback commented Aug 6, 2015

FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965

FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965

Conversation

hsperr commented Apr 22, 2015

sinhrks commented Apr 22, 2015

hsperr commented Apr 22, 2015

jreback commented Apr 22, 2015

hsperr commented Apr 23, 2015

jreback Apr 28, 2015

Choose a reason for hiding this comment

hsperr Apr 28, 2015

Choose a reason for hiding this comment

sinhrks commented Apr 28, 2015

hsperr commented Apr 30, 2015

sinhrks commented May 1, 2015

hsperr commented May 13, 2015

sinhrks commented May 14, 2015

sinhrks May 15, 2015

Choose a reason for hiding this comment

hsperr commented May 17, 2015

jreback commented Jun 26, 2015

max-sixty commented Jul 27, 2015

hsperr commented Jul 27, 2015

jreback commented Aug 5, 2015

max-sixty commented Aug 6, 2015

jreback commented Aug 6, 2015