Clean multiindex keys #15615

toobaz · 2017-03-08T14:35:27Z

closes Incoherence between Index and MultiIndex when labels in list are not found #15452
tests added / passed
passes git diff master | flake8 --diff
whatsnew entry

Most of the change to code is mere refactoring (the small change in pandas/core/frame.py can be undone once we decide about #15438), but I did suppress three obsolete (the same job is already done in the right way in _getitem_iterable) and buggy lines which actually not only caused #15452, but were the real explanation for #15424.

As a side consequence, generators can now be used to index a MultiIndex. This is not documented, as far as I know, but it was already possible with a flat Index, and it was actually also tested, so I added tests for MultiIndex too and to setitem (except that not all of them currently work). Once this PR goes through, I plan to open three separate issues for

the failing tests with setitem and generators (of boolean values)
df.loc[['missing', 'keys'], :], which still erroneously returns an empty DataFrame rather than raising (as df.loc[['missing', 'keys']] does)
my proposal about a change in behaviour for missing keys

Feel free to tell me if any of this is not worth discussing.

codecov-io · 2017-03-08T18:41:27Z

Codecov Report

Merging #15615 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15615      +/-   ##
==========================================
+ Coverage   90.99%   91.01%   +0.01%     
==========================================
  Files         161      143      -18     
  Lines       49303    49387      +84     
==========================================
+ Hits        44863    44949      +86     
+ Misses       4440     4438       -2

Flag	Coverage Δ
#multiple	`?`
#single	`?`

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.87% <100%> (+0.06%)`	⬆️
pandas/core/indexing.py	`94.1% <100%> (+0.17%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/util/testing.py	`80.92% <0%> (-19.08%)`	⬇️
pandas/tools/plotting.py	`71.79% <0%> (-10.03%)`	⬇️
pandas/io/json/json.py	`90.27% <0%> (-9.73%)`	⬇️
pandas/util/_tester.py	`35.29% <0%> (-3.6%)`	⬇️
pandas/conftest.py	`94.11% <0%> (-2.32%)`	⬇️
pandas/types/concat.py	`98.06% <0%> (-1.94%)`	⬇️
pandas/io/pickle.py	`79.54% <0%> (-1.71%)`	⬇️
... and 194 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ca9fcd...03cec75. Read the comment docs.

jreback · 2017-03-10T14:44:05Z

pandas/tests/series/test_indexing.py

+
+        # boolean generator (fails)
+#        s2 = self.series.copy()
+#        s2[iter(bool_idx)] = values


are these future tests?

yes (first of the points I mentioned for future issues)

rather than commenting simply xfail them

Aha! I had missed the move to pytest

toobaz · 2017-03-10T20:49:45Z

(green)

jreback · 2017-03-10T21:31:17Z

pandas/tests/series/test_indexing.py

+        s2.loc[iter(idces)] = values
+        assert_series_equal(s2, expected)
+
+        import pytest


imports go at the top. make 2 separate (non-class based) functions, the 2nd one which will xfail.

(not sure I understood what you meant by "non-class based")

jreback · 2017-03-11T17:42:57Z

pandas/tests/indexing/test_multiindex.py

@@ -198,6 +202,27 @@ def test_loc_getitem_array(self):
        result = x.loc[scalar]
        tm.assert_series_equal(result, expected)

+    def test_loc_generator(self):


there does not appear to be an equivalent test on Index with a generator, or even an iterator. Sure they should work. But is this fixed in this PR? or just adding testing to something already works?

This PR removes some broken code which prevented __getitem__(generator) to work, so yes, strictly speaking it fixes it, but this was a totally unintended consequence. I actually made sure to keep the tests for generators in two independent commits, which you can drop if you don't think they are useful. But there was already at least one test on indexing a Series with generators, I just expanded a bit. No idea about indexing Indexes.

jreback · 2017-03-20T13:37:47Z

not sure why this didn't fail. This is a change in behavior (which we should prob do, but needs discussion).

In [1]:         idx = pd.MultiIndex.from_product([['A', 'B', 'C'],
   ...:                                           ['foo', 'bar', 'baz']],
   ...:                                          names=['one', 'two'])
   ...:         s = pd.Series(np.arange(9, dtype='int64'), index=idx).sort_index()
   ...:     
   ...: 

In [2]: s
Out[2]: 
one  two
A    bar    1
     baz    2
     foo    0
B    bar    4
     baz    5
     foo    3
C    bar    7
     baz    8
     foo    6
dtype: int64
In [3]: s.loc[['A', 'D']]
KeyError: "['D'] not in index"

toobaz · 2017-03-20T14:08:20Z

not sure why this didn't fail.

Which commit are you on?!

In [3]: idx = pd.MultiIndex.from_product([['A', 'B', 'C'], ['foo', 'bar', 'baz']],names=['one', 'two'])

In [4]: s = pd.Series(np.arange(9, dtype='int64'), index=idx).sort_index()

In [5]: s.loc[['A', 'D']]
Out[5]: 
one  two
A    bar    1
     baz    2
     foo    0
dtype: int64

In [6]: pd.util.print_versions.get_sys_info()[0]
Out[6]: ('commit', 'b4a1c5b168961324d2260657424df621bbd0a645')

jreback · 2017-03-20T14:24:58Z

I rebased the PR. so rebase and push again.

toobaz · 2017-03-20T17:03:48Z

This is related to 05d70f4. Before:

In [2]: s = pd.Series(range(9), index=pd.MultiIndex.from_product([['A', 'B', 'C'], ['foo', 'bar', 'baz']],names=['one', 'two']))

In [3]: s.loc._getitem_iterable(['A', 'E'])
Out[3]: 
one  two
A    foo    0
     bar    1
     baz    2
dtype: int64

In [4]: pd.util.print_versions.get_sys_info()[0]
Out[4]: ('commit', '998c801f76256990b98d3f0d2ad885ae27c955a1')

After:

In [2]: s = pd.Series(range(9), index=pd.MultiIndex.from_product([['A', 'B', 'C'], ['foo', 'bar', 'baz']],names=['one', 'two']))

In [3]: s.loc._getitem_iterable(['A', 'E'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-62e87e14f902> in <module>()
----> 1 s.loc._getitem_iterable(['A', 'E'])

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1090             # if it cannot handle
   1091             indexer, keyarr = labels._convert_listlike_indexer(
-> 1092                 key, kind=self.name)
   1093             if indexer is not None:
   1094                 return self.obj.take(indexer, axis=axis)

/home/nobackup/repo/pandas/pandas/indexes/multi.py in _convert_listlike_indexer(self, keyarr, kind)
   1598             mask = check == -1
   1599             if mask.any():
-> 1600                 raise KeyError('%s not in index' % keyarr[mask])
   1601 
   1602         return indexer, keyarr

KeyError: "['E'] not in index"

In [4]: pd.util.print_versions.get_sys_info()[0]
Out[4]: ('commit', '05d70f4e617a274813bdb02db69143b5554aa106')

My patch is good as before, but your new code is not coherent with our "desired" behavior for indexers.

Now, I assumed that proceeding step by step was the best way to move forward, but my assumption was clearly wrong if while I patiently wait for my trivial patch to get merged, you do find the time to completely rewrite related indexing methods (deleting a line I had explicitly mentioned as important when submitting my PR - all this in a commit which appears as "DOC: ..." in the logs, just to make it easier to spot it).

So, considering that the changes you made ironically go in a direction I like, we'll probably rather have the "famous" discussion first and then come back on this. I'm opening an issue for this in a minute.

jreback · 2017-03-20T17:12:41Z

My patch is good as before, but your new code is not coherent with our "desired" behavior for

your patch fails on master.

jreback · 2017-03-20T17:13:24Z

Now, I assumed that proceeding step by step was the best way to move forward, but my assumption was clearly wrong if while I patiently wait for my trivial patch to get merged, you do find the time to

and how is this not step-by-step?

any indexing patch is hardly trivial.

toobaz · 2017-03-20T17:22:01Z

any indexing patch is hardly trivial.

... if you don't understand what you are doing

jreback · 2017-03-20T17:26:01Z

... if you don't understand what you are doing

@toobaz these comments are hardly necessary.

You can put up patches all you like. But they need review and I will get to them when I have time.

toobaz · 2017-03-20T17:27:45Z

You can put up patches all you like. But they need review and I will get to them when I have time.

Patches are a total waste of time if you change the underlying code, and even more if you introduce new bugs there. You have time for your huge patch but not for my three-liner?! (because the other 10 lines were trivial refactoring)

jreback · 2017-03-20T17:29:26Z

Patches are a total waste of time if you change the underlying code, and even more if you introduce new bugs there. You have time for your huge patch but not for my three-liner?! (because the other 10 lines were trivial refactoring)

how so, we have 75 open PR's of which I have 10, some of which have sat here for weeks waiting for review / comment by others. This is how things work. I looked at your patch and it didn't work.

So I asked you to rebase? I am not sure of the problem.

jreback · 2017-03-20T17:30:18Z

even more if you introduce new bugs there

How is this even true? It completely passed the test suite. Yours does not.

toobaz · 2017-03-20T17:32:25Z

I am not sure of the problem.

I am, but my time here is currently wasted. See my code above, it's not complicated, you just changed the behavior of a (non-public) method. No idea how to reproduce with user code, currently.

It completely passed the test suite. Yours does not.

It did before you introduced yours, so this is obviously a non-argument.

jreback · 2017-03-20T17:35:37Z

@toobaz I have asked you to rebase. If you can great.

PR's are merged when they fully pass the test suite.

patches are certainly require rebase a lot sometimes. To be honest I am working on someother PR's which DID require some cleanup in the indexing code.

You should be more appreciative of others's efforts.

toobaz · 2017-03-20T17:43:59Z

You should be more appreciative of others's efforts.

Ah ah, you telling me this. When you'll want to be appreciative of my efforts, feel free to reopen this (which I will be happy to rebase).

@toobaz I have asked you to rebase. If you can great.

Doesn't make sense as of now (you tried yourself, as you said)

jreback · 2017-03-20T17:49:48Z

Ah ah, you telling me this. When you'll want to be appreciative of my efforts, feel free to reopen this (which I will be happy to rebase).

again not sure of what you mean.

Doesn't make sense as of now (you tried yourself, as you said)

also not sure what you mean.

Sure I appreciate your efforts @toobaz but at the same time I ask you to appreciate that others are contributing. It doesn't matter that a patch was put in before yours, this PR still needs rebasing.

jreback · 2017-03-20T17:50:32Z

and to clarify I just ran this PR and saw the failing tests, I didn't attempt to fix it.

jreback · 2017-05-13T21:35:58Z

needs a rebase and moved whatsnew. if its easier to close and re-submit that is fine too.

toobaz · 2017-05-13T22:05:54Z

Rebasing is fine, but to apply this, s.loc._getitem_iterable(.) must be brought back to its pre- 05d70f4 behavior (on my TODO list but not terribly high), unless we decide for a change of API in #15747.

gfyoung · 2017-07-13T19:30:18Z

@toobaz : I know it's been some time, but according to GitHub, the only thing you absolutely need to update is the whatsnew. Do you still want to get this merged?

toobaz · 2017-07-13T21:15:57Z

@toobaz : I know it's been some time, but according to GitHub, the only thing you absolutely need to update is the whatsnew.

Yes in order to solve conflicts, but then unfortunately tests will fail (see my previous comment). So I see it as pointless, but if instead there's some utility, I can do it easily.

Do you still want to get this merged?

Well, in principle yes, but no problem for me if you want to close, I can open another PR in the future. s.loc._getitem_iterable(.) must be fixed first, and I fear it will take more time than it took me to write this patch.

gfyoung · 2017-07-13T21:20:57Z

@toobaz : How about this: if you could patch the merge conflict (it's pretty simple to do as you said), and we can let Travis run the tests again just to see where things stand now.

If they fail, I can close this for you. However, if that happens, if you could add a comment regarding s.loc._getitem_iterable(...) in the issue that you posted (i.e. #15452) that this needs to be patched first, that would be useful for future reference. Or, if you think it should standalone as a separate issue, feel free to file a separate that references your initial issue.

How does that sound?

closes #GH15452

toobaz · 2017-07-14T07:16:06Z

: How about this: if you could patch the merge conflict (it's pretty simple to do as you said), and we can let Travis run the tests again just to see where things stand now.

Sure, no problem! Done.

if you could add a comment regarding s.loc._getitem_iterable(...) in the issue that you posted (i.e. #15452) that this needs to be patched first, that would be useful for future reference. Or, if you think it should standalone as a separate issue, feel free to file a separate that references your initial issue.

Done: #16920

gfyoung · 2017-07-14T07:22:17Z

Alright: Two test failures due to the fact that you're using outdated testing functions (pytest.raises instead of tm.assertRaises). However, you do have one test failure as you predicted because of that issue re: the _get_item issue.

I will close this PR given your previous comments. As you said previously, #16920 will need to be patched before fixing #15452. Feel free to take a stab at the former if you have the time.

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Mar 8, 2017

jreback reviewed Mar 10, 2017

View reviewed changes

toobaz force-pushed the clean_multiindex_keys branch from 2fb090f to b250e91 Compare March 10, 2017 17:54

jreback reviewed Mar 10, 2017

View reviewed changes

toobaz force-pushed the clean_multiindex_keys branch from b250e91 to b4a1c5b Compare March 11, 2017 01:36

jreback reviewed Mar 11, 2017

View reviewed changes

jreback added this to the 0.20.0 milestone Mar 20, 2017

toobaz mentioned this pull request Mar 20, 2017

Proposal to change behaviour with .loc and missing keys #15747

Closed

jreback removed this from the 0.20.0 milestone Mar 23, 2017

toobaz added 4 commits July 14, 2017 08:52

CLN: clean up type checking for MultiIndex indexing

95d1c6b

closes #GH15452

TST: tests for indexing MultiIndex with generator

d2e767b

TST: tests for setitem with generator indexer

0213881

TST: Fix/add tests with MultiIndex and no labels found

03cec75

toobaz force-pushed the clean_multiindex_keys branch from b4a1c5b to 03cec75 Compare July 14, 2017 06:54

toobaz mentioned this pull request Jul 14, 2017

Regression in s.loc._getitem_iterable(...) #16920

Closed

gfyoung closed this Jul 14, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean multiindex keys #15615

Clean multiindex keys #15615

toobaz commented Mar 8, 2017

codecov-io commented Mar 8, 2017 •

edited by codecov bot

Loading

jreback Mar 10, 2017

toobaz Mar 10, 2017

jreback Mar 10, 2017

toobaz Mar 10, 2017

toobaz commented Mar 10, 2017

jreback Mar 10, 2017

toobaz Mar 11, 2017

jreback Mar 11, 2017

toobaz Mar 11, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented May 13, 2017

toobaz commented May 13, 2017

gfyoung commented Jul 13, 2017 •

edited

Loading

toobaz commented Jul 13, 2017

gfyoung commented Jul 13, 2017 •

edited

Loading

toobaz commented Jul 14, 2017

gfyoung commented Jul 14, 2017

Clean multiindex keys #15615

Clean multiindex keys #15615

Conversation

toobaz commented Mar 8, 2017

codecov-io commented Mar 8, 2017 • edited by codecov bot Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

toobaz commented Mar 10, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

toobaz commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented Mar 20, 2017

jreback commented May 13, 2017

toobaz commented May 13, 2017

gfyoung commented Jul 13, 2017 • edited Loading

toobaz commented Jul 13, 2017

gfyoung commented Jul 13, 2017 • edited Loading

toobaz commented Jul 14, 2017

gfyoung commented Jul 14, 2017

codecov-io commented Mar 8, 2017 •

edited by codecov bot

Loading

gfyoung commented Jul 13, 2017 •

edited

Loading

gfyoung commented Jul 13, 2017 •

edited

Loading