-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Clean multiindex keys #15615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean multiindex keys #15615
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15615 +/- ##
==========================================
+ Coverage 90.99% 91.01% +0.01%
==========================================
Files 161 143 -18
Lines 49303 49387 +84
==========================================
+ Hits 44863 44949 +86
+ Misses 4440 4438 -2
Continue to review full report at Codecov.
|
pandas/tests/series/test_indexing.py
Outdated
|
||
# boolean generator (fails) | ||
# s2 = self.series.copy() | ||
# s2[iter(bool_idx)] = values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these future tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes (first of the points I mentioned for future issues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than commenting simply xfail them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! I had missed the move to pytest
2fb090f
to
b250e91
Compare
(green) |
pandas/tests/series/test_indexing.py
Outdated
s2.loc[iter(idces)] = values | ||
assert_series_equal(s2, expected) | ||
|
||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imports go at the top. make 2 separate (non-class based) functions, the 2nd one which will xfail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not sure I understood what you meant by "non-class based")
b250e91
to
b4a1c5b
Compare
@@ -198,6 +202,27 @@ def test_loc_getitem_array(self): | |||
result = x.loc[scalar] | |||
tm.assert_series_equal(result, expected) | |||
|
|||
def test_loc_generator(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there does not appear to be an equivalent test on Index with a generator, or even an iterator. Sure they should work. But is this fixed in this PR? or just adding testing to something already works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR removes some broken code which prevented __getitem__(generator)
to work, so yes, strictly speaking it fixes it, but this was a totally unintended consequence. I actually made sure to keep the tests for generators in two independent commits, which you can drop if you don't think they are useful. But there was already at least one test on indexing a Series
with generators, I just expanded a bit. No idea about indexing Index
es.
not sure why this didn't fail. This is a change in behavior (which we should prob do, but needs discussion).
|
Which commit are you on?! In [3]: idx = pd.MultiIndex.from_product([['A', 'B', 'C'], ['foo', 'bar', 'baz']],names=['one', 'two'])
In [4]: s = pd.Series(np.arange(9, dtype='int64'), index=idx).sort_index()
In [5]: s.loc[['A', 'D']]
Out[5]:
one two
A bar 1
baz 2
foo 0
dtype: int64
In [6]: pd.util.print_versions.get_sys_info()[0]
Out[6]: ('commit', 'b4a1c5b168961324d2260657424df621bbd0a645') |
I rebased the PR. so rebase and push again. |
This is related to 05d70f4. Before: In [2]: s = pd.Series(range(9), index=pd.MultiIndex.from_product([['A', 'B', 'C'], ['foo', 'bar', 'baz']],names=['one', 'two']))
In [3]: s.loc._getitem_iterable(['A', 'E'])
Out[3]:
one two
A foo 0
bar 1
baz 2
dtype: int64
In [4]: pd.util.print_versions.get_sys_info()[0]
Out[4]: ('commit', '998c801f76256990b98d3f0d2ad885ae27c955a1') After: In [2]: s = pd.Series(range(9), index=pd.MultiIndex.from_product([['A', 'B', 'C'], ['foo', 'bar', 'baz']],names=['one', 'two']))
In [3]: s.loc._getitem_iterable(['A', 'E'])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-3-62e87e14f902> in <module>()
----> 1 s.loc._getitem_iterable(['A', 'E'])
/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
1090 # if it cannot handle
1091 indexer, keyarr = labels._convert_listlike_indexer(
-> 1092 key, kind=self.name)
1093 if indexer is not None:
1094 return self.obj.take(indexer, axis=axis)
/home/nobackup/repo/pandas/pandas/indexes/multi.py in _convert_listlike_indexer(self, keyarr, kind)
1598 mask = check == -1
1599 if mask.any():
-> 1600 raise KeyError('%s not in index' % keyarr[mask])
1601
1602 return indexer, keyarr
KeyError: "['E'] not in index"
In [4]: pd.util.print_versions.get_sys_info()[0]
Out[4]: ('commit', '05d70f4e617a274813bdb02db69143b5554aa106') My patch is good as before, but your new code is not coherent with our "desired" behavior for indexers. Now, I assumed that proceeding step by step was the best way to move forward, but my assumption was clearly wrong if while I patiently wait for my trivial patch to get merged, you do find the time to completely rewrite related indexing methods (deleting a line I had explicitly mentioned as important when submitting my PR - all this in a commit which appears as "DOC: ..." in the logs, just to make it easier to spot it). So, considering that the changes you made ironically go in a direction I like, we'll probably rather have the "famous" discussion first and then come back on this. I'm opening an issue for this in a minute. |
your patch fails on master. |
and how is this not step-by-step? any indexing patch is hardly trivial. |
... if you don't understand what you are doing |
@toobaz these comments are hardly necessary. You can put up patches all you like. But they need review and I will get to them when I have time. |
Patches are a total waste of time if you change the underlying code, and even more if you introduce new bugs there. You have time for your huge patch but not for my three-liner?! (because the other 10 lines were trivial refactoring) |
how so, we have 75 open PR's of which I have 10, some of which have sat here for weeks waiting for review / comment by others. This is how things work. I looked at your patch and it didn't work. So I asked you to rebase? I am not sure of the problem. |
How is this even true? It completely passed the test suite. Yours does not. |
I am, but my time here is currently wasted. See my code above, it's not complicated, you just changed the behavior of a (non-public) method. No idea how to reproduce with user code, currently.
It did before you introduced yours, so this is obviously a non-argument. |
@toobaz I have asked you to rebase. If you can great. PR's are merged when they fully pass the test suite. patches are certainly require rebase a lot sometimes. To be honest I am working on someother PR's which DID require some cleanup in the indexing code. You should be more appreciative of others's efforts. |
Ah ah, you telling me this. When you'll want to be appreciative of my efforts, feel free to reopen this (which I will be happy to rebase).
Doesn't make sense as of now (you tried yourself, as you said) |
again not sure of what you mean.
also not sure what you mean. Sure I appreciate your efforts @toobaz but at the same time I ask you to appreciate that others are contributing. It doesn't matter that a patch was put in before yours, this PR still needs rebasing. |
and to clarify I just ran this PR and saw the failing tests, I didn't attempt to fix it. |
needs a rebase and moved whatsnew. if its easier to close and re-submit that is fine too. |
@toobaz : I know it's been some time, but according to GitHub, the only thing you absolutely need to update is the |
Yes in order to solve conflicts, but then unfortunately tests will fail (see my previous comment). So I see it as pointless, but if instead there's some utility, I can do it easily.
Well, in principle yes, but no problem for me if you want to close, I can open another PR in the future. |
@toobaz : How about this: if you could patch the merge conflict (it's pretty simple to do as you said), and we can let Travis run the tests again just to see where things stand now. If they fail, I can close this for you. However, if that happens, if you could add a comment regarding How does that sound? |
b4a1c5b
to
03cec75
Compare
Sure, no problem! Done.
Done: #16920 |
Alright: Two test failures due to the fact that you're using outdated testing functions ( I will close this PR given your previous comments. As you said previously, #16920 will need to be patched before fixing #15452. Feel free to take a stab at the former if you have the time. |
git diff master | flake8 --diff
Most of the change to code is mere refactoring (the small change in
pandas/core/frame.py
can be undone once we decide about #15438), but I did suppress three obsolete (the same job is already done in the right way in_getitem_iterable
) and buggy lines which actually not only caused #15452, but were the real explanation for #15424.As a side consequence, generators can now be used to index a
MultiIndex
. This is not documented, as far as I know, but it was already possible with a flatIndex
, and it was actually also tested, so I added tests forMultiIndex
too and to setitem (except that not all of them currently work). Once this PR goes through, I plan to open three separate issues fordf.loc[['missing', 'keys'], :]
, which still erroneously returns an emptyDataFrame
rather than raising (asdf.loc[['missing', 'keys']]
does)Feel free to tell me if any of this is not worth discussing.