Skip to content

FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

hsperr
Copy link
Contributor

@hsperr hsperr commented Apr 22, 2015

had some time this morning and started working

closes #9943

for now I just uncommented the explicit setting to None of the index names.
let me know if I should remove it completely.

@hsperr hsperr force-pushed the index_names branch 2 times, most recently from 4e4d6f5 to c1db455 Compare April 22, 2015 00:38
@sinhrks
Copy link
Member

sinhrks commented Apr 22, 2015

Related to #9926. From the issue, expected behavior is:

  • In case of Index (ops/method) Index, name is preserved if left and right index have the same name. Otherwise, name is reset to None.
  • In case of Index (ops/method) scalar or scalar (ops/method) Index, name of the index is preserved.

So can you confirm our understandings are the same, and describe what is being changed?

@hsperr
Copy link
Contributor Author

hsperr commented Apr 22, 2015

Hey sinhrks,

thanks for you kind reply. I read the task you linked and had to find that given the other two issues my understanding of the correct behavior was wrong.

I will change the behavior to be:

idx = pd.Index([1,2,3], name='a')
idx2 = pd.Index([4,5,6], name='b')
idx3 = pd.Index([1,2,3], name='a')
L = [1]

idx.union(idx2).name
#None
idx.union(idx3).name
#a
idx.union(L)
#a

idx.intersection(idx2).name
#None
idx.intersection(idx3).name
#a
idx.intersection(L).name
#a

Right now index.union(array) is crashing, I fixed that while changing the naming issue. Please let me know if that is intended or not.

I am rerunning the tests, will add above mentioned test cases and reissue a pull request later.

@jreback
Copy link
Contributor

jreback commented Apr 22, 2015

@hsperr this should mirror .append (which IIRC you fixed?) (and put the tests next to it)

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 22, 2015
@jreback jreback added this to the 0.16.1 milestone Apr 22, 2015
@hsperr hsperr force-pushed the index_names branch 4 times, most recently from 991a6b6 to 725ffe5 Compare April 23, 2015 02:02
@hsperr
Copy link
Contributor Author

hsperr commented Apr 23, 2015

I am sorry that my initial PR was so vague and unclear, I will improve on that.

I just pushed the new changes.
I started out writing an extensive naming test for .append to check if intersection and append have the same naming behavior

please review:
test_append_naming_behavior()

it tests several combination of index append index index append array index append datetimeindex with same names, different names or no name.
if you have an idea for adding a multiindex assert.

I added the same tests for intersection and union to make sure the have the same behavior.
I am sorry for the many pushes but I made a few mistakes on my way to this solution.

@@ -1219,14 +1219,17 @@ def union(self, other):
if len(other) == 0 or self.equals(other):
return self

other = _ensure_index(other)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change something unrelated (and this is in a fast path for other things, so it causes a perf drag)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why i changed this is, that a union between an index and a numpy/python array right now crashes on AttributeErrors

index = pd.Index([1,2,3])
index.union([4,5,6])

-> 1238 if self.dtype != other.dtype:
1239 this = self.astype('O')
1240 other = other.astype('O')

AttributeError: 'list' object has no attribute 'dtype'

so I think the _ensure_index(other) has to be done regardless of the if len(self) == 0:

I also think it is implemented like that in intersection

@jreback jreback modified the milestones: 0.17.0, 0.16.1 Apr 28, 2015
@sinhrks
Copy link
Member

sinhrks commented Apr 28, 2015

Can you add tests for short pathes for each dtypes, maybe fail with current impl? I agree moving tests to Base class is better.

self.assertEqual(pd.Index([], name='a').union(pd.Index([], name='a')).name, 'a')
self.assertEqual(pd.Index(['x'], name='a').union(pd.Index([], name='a')).name, 'a')
self.assertEqual(pd.Index([], name='a').union(pd.Index(['x'], name='a')).name, 'a')
self.assertEqual(pd.Index([], name='a').union(pd.Index([], name='b')).name, None)
self.assertEqual(pd.Index(['x'], name='a').union(pd.Index([], name='b')).name, None)
self.assertEqual(pd.Index([], name='a').union(pd.Index(['x'], name='b')).name, None)

@hsperr
Copy link
Contributor Author

hsperr commented Apr 30, 2015

ok I have been looking into the base test and I think I get the general architecture now.
I suppose the right way to do it then is to use the self.create_index() method since that gets overwritten by every subclass and returns the right index. I wonder if its okay to give that create index two optional parameters namely 'content' and 'name' or something or whether I should first create and index and then give it a name.

Just to make sure I understand could you explain more on what a short path for each dtype means? is it related to the code you posted?

@sinhrks
Copy link
Member

sinhrks commented May 1, 2015

@hsperr I meant these lines which return values without calling _wrap_union_result.

Because ``DatetimeIndex` has separate logic and similar short path, I think it is better to test each dtypes.

@hsperr
Copy link
Contributor Author

hsperr commented May 13, 2015

Ok sorry for the long break, I was out of town for a while but I am still active on this issue.

@sinhrks I understand the short paths now and build a testcase that helped me find quite a few of them. I am having a bit of trouble moving the testcase to Base class though since some indexes (Categorical) require special setup (same categories etc.) But I will figure it out!

Another question I have is in this case
https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L1375
it seems to me if we union an empty index with a non empty one we just return the non empty one, in this case setting the correct name would be changeing the name of the other index I guess.

I was also wondering if it is too dirty to make something like a decorator function that determines what the name should be before we enter and makes sure the name is correct when it returns, no matter which of the returns we take, might make up for less/cleaner code

@sinhrks
Copy link
Member

sinhrks commented May 14, 2015

it seems to me if we union an empty index with a non empty one we just return the non empty one, in this case setting the correct name would be changeing the name of the other index I guess.

I think this should be shallow-copied like #10033, @jreback and @shoyer ?

I was also wondering if it is too dirty to make something like a decorator function

You may be able to use common._maybe_match_name. But the logic looks not to cover objects which doesn't have name attribute. Changing the func to cover them makes impl easier?

https://github.com/pydata/pandas/blob/master/pandas/core/common.py#L3324

numpy_array = np.array([1,2,3])

#index union index naming behavior
self.assertEqual(idx_name_a.union(idx_name_b).name, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change None comparisons to self.assertTrue(... is None)? Sorry my test examples are incorrect.
Or you can use check_index_equal to check names (see #9972).

@hsperr hsperr force-pushed the index_names branch 2 times, most recently from b9b4ad9 to 833f1be Compare May 17, 2015 04:12
@hsperr
Copy link
Contributor Author

hsperr commented May 17, 2015

I pushed the code that I have so far, to discuss a few items and whether the direction is okay or not.

the current test cases are here:
https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L91

I wanted to make them as generic as possible os every index type just has to instantiate a few indexes and can use these methods.

Problems I found:

test_union
https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L795
right now fails because I changed
https://github.com/hsperr/pandas/blob/index_names/pandas/core/index.py#L1372

return _ensure_index(other) 

to

result = _ensure_index(other, copy=True)
result.name = result_name
return result

If Index([]).union(someotherindex) is supposed to return the other index, I will add that behavior to the documentation and change the logic back

I moved ensure_index out of the if block because otherwise this line crashes with when we try to union with python arrays
https://github.com/hsperr/pandas/blob/index_names/pandas/core/index.py#L1379

https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L4718
right now crashes because df.loc seems to call append on indexes somewhere inbetween and that resets the name (because it seems to build new indexes and the names don't match). I am still debugging this.

EDIT: I also just saw that rebasing with master mangled up the whats new file, I will change that

@jreback
Copy link
Contributor

jreback commented Jun 26, 2015

pls rebase on master

@max-sixty
Copy link
Contributor

@hsperr - I was planning to have a look at this, as it intersects (no pun intended...) a fix I just worked on. Can you let us know if you're planning on pushing this over the line? Otherwise I'm happy to help out and take it there.

@hsperr
Copy link
Contributor Author

hsperr commented Jul 27, 2015

@MaximilianR recently quite a few things came up in my private life and I didn't have too much time to get back to this task, recently I tried getting back into it but I also realize that it involves changes in many places and I do not know the pandas setup well enough to make good test cases and be sure that all the branches are covered properly. So if you want you can take it from here, and I am sorry for causing any unnecessary work. If you need some info on the task please let me know!

@jreback
Copy link
Contributor

jreback commented Aug 5, 2015

cc @MaximilianR

IIRC you have a PR to replace this, yes?

@jreback jreback closed this Aug 5, 2015
@max-sixty
Copy link
Contributor

Nothing yet @jreback, but it's next on my list. TBH it's unlikely to be before next weekend...

@jreback
Copy link
Contributor

jreback commented Aug 6, 2015

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: index name lost when indexing with another index
4 participants