-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
FIX: interesction and union changed index names. fixes #9943 partly #9862 #9965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4e4d6f5
to
c1db455
Compare
Related to #9926. From the issue, expected behavior is:
So can you confirm our understandings are the same, and describe what is being changed? |
Hey sinhrks, thanks for you kind reply. I read the task you linked and had to find that given the other two issues my understanding of the correct behavior was wrong. I will change the behavior to be:
Right now index.union(array) is crashing, I fixed that while changing the naming issue. Please let me know if that is intended or not. I am rerunning the tests, will add above mentioned test cases and reissue a pull request later. |
@hsperr this should mirror |
991a6b6
to
725ffe5
Compare
I am sorry that my initial PR was so vague and unclear, I will improve on that. I just pushed the new changes. please review: it tests several combination of index append index index append array index append datetimeindex with same names, different names or no name. I added the same tests for intersection and union to make sure the have the same behavior. |
@@ -1219,14 +1219,17 @@ def union(self, other): | |||
if len(other) == 0 or self.equals(other): | |||
return self | |||
|
|||
other = _ensure_index(other) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change something unrelated (and this is in a fast path for other things, so it causes a perf drag)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why i changed this is, that a union between an index and a numpy/python array right now crashes on AttributeErrors
index = pd.Index([1,2,3])
index.union([4,5,6])
-> 1238 if self.dtype != other.dtype:
1239 this = self.astype('O')
1240 other = other.astype('O')
AttributeError: 'list' object has no attribute 'dtype'
so I think the _ensure_index(other) has to be done regardless of the if len(self) == 0:
I also think it is implemented like that in intersection
Can you add tests for short pathes for each dtypes, maybe fail with current impl? I agree moving tests to
|
ok I have been looking into the base test and I think I get the general architecture now. Just to make sure I understand could you explain more on what a short path for each dtype means? is it related to the code you posted? |
@hsperr I meant these lines which return values without calling
Because ``DatetimeIndex` has separate logic and similar short path, I think it is better to test each dtypes. |
Ok sorry for the long break, I was out of town for a while but I am still active on this issue. @sinhrks I understand the short paths now and build a testcase that helped me find quite a few of them. I am having a bit of trouble moving the testcase to Base class though since some indexes (Categorical) require special setup (same categories etc.) But I will figure it out! Another question I have is in this case I was also wondering if it is too dirty to make something like a decorator function that determines what the name should be before we enter and makes sure the name is correct when it returns, no matter which of the returns we take, might make up for less/cleaner code |
I think this should be shallow-copied like #10033, @jreback and @shoyer ?
You may be able to use https://github.com/pydata/pandas/blob/master/pandas/core/common.py#L3324 |
numpy_array = np.array([1,2,3]) | ||
|
||
#index union index naming behavior | ||
self.assertEqual(idx_name_a.union(idx_name_b).name, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change None
comparisons to self.assertTrue(... is None)
? Sorry my test examples are incorrect.
Or you can use check_index_equal
to check names (see #9972).
b9b4ad9
to
833f1be
Compare
I pushed the code that I have so far, to discuss a few items and whether the direction is okay or not. the current test cases are here: I wanted to make them as generic as possible os every index type just has to instantiate a few indexes and can use these methods. Problems I found: test_union
to
If Index([]).union(someotherindex) is supposed to return the other index, I will add that behavior to the documentation and change the logic back I moved ensure_index out of the if block because otherwise this line crashes with when we try to union with python arrays https://github.com/hsperr/pandas/blob/index_names/pandas/tests/test_index.py#L4718 EDIT: I also just saw that rebasing with master mangled up the whats new file, I will change that |
pls rebase on master |
@hsperr - I was planning to have a look at this, as it intersects (no pun intended...) a fix I just worked on. Can you let us know if you're planning on pushing this over the line? Otherwise I'm happy to help out and take it there. |
@MaximilianR recently quite a few things came up in my private life and I didn't have too much time to get back to this task, recently I tried getting back into it but I also realize that it involves changes in many places and I do not know the pandas setup well enough to make good test cases and be sure that all the branches are covered properly. So if you want you can take it from here, and I am sorry for causing any unnecessary work. If you need some info on the task please let me know! |
cc @MaximilianR IIRC you have a PR to replace this, yes? |
Nothing yet @jreback, but it's next on my list. TBH it's unlikely to be before next weekend... |
thanks |
had some time this morning and started working
closes #9943
for now I just uncommented the explicit setting to None of the index names.
let me know if I should remove it completely.