Skip to content

BUG: Multiindexed series .at fix #32520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 25, 2020

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Mar 7, 2020

@rhshadrach rhshadrach changed the title [BUG] Multiindex .at fix [BUG] Multiindexed series .at fix Mar 7, 2020
@rhshadrach

This comment has been minimized.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Bug MultiIndex labels Mar 11, 2020
def test_multiindex_series_loc_set():
series = pd.Series([1, 2], index=[[1, 2], [3, 4]])
series.loc[1, 3] = 3
assert series.loc[1, 3] == 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think the loc tests are needed, unless that behavior is being fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @jreback requested loc to be added because the rest of the tests found in test_scalar test both at and loc.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests comments, ping on green.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note in 1.1, bugfixes for indexing

@jreback jreback added this to the 1.1 milestone Mar 14, 2020
@rhshadrach
Copy link
Member Author

@jreback ping

@rhshadrach

This comment has been minimized.

@rhshadrach

This comment has been minimized.

@jbrockmendel
Copy link
Member

@rhshadrach can you rebase and ill take another look

@pep8speaks
Copy link

pep8speaks commented Mar 18, 2020

Hello @rhshadrach! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-18 20:00:11 UTC

@rhshadrach
Copy link
Member Author

@jbrockmendel I've rebased and all checks pass.

@jreback
Copy link
Contributor

jreback commented Mar 19, 2020

lgtm. can you merge master. cc @jbrockmendel merge when happy.

@jreback
Copy link
Contributor

jreback commented Mar 26, 2020

lgtm. cc @jbrockmendel merge when ready.

@jbrockmendel
Copy link
Member

A few comments, generally looks very good

@rhshadrach
Copy link
Member Author

@jbrockmendel Thanks for the comments. I've addressed all of them. I believe failing checks are unrelated, but I'm having a hard time understanding the error messages.

@jbrockmendel
Copy link
Member

LGTM.

Yes, the azure failure is unrelated. To be on the safe side we usually try to sort these out before merging anything new.

@jorisvandenbossche jorisvandenbossche changed the title [BUG] Multiindexed series .at fix BUG: Multiindexed series .at fix Mar 30, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just opened #33153 for a general API question that also impacts this PR.

So the fix to allow .at indexing for a Series with MultiIndex is certainly a welcome fix, but I only question that it should also work if not all levels of the MultiIndex are specified (a behaviour that this PR also enables).

assert series.index.nlevels == 2
assert series.at[1, 3] == 1
assert series.loc[1, 3] == 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for not providing all MultiIndex levels? (so eg series.at[1]).
And also a test with a duplicated label?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe also an equivalent test for a DataFrame (it seems to work, but would be good to ensure we also have an explicit test for that)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - I have it mostly done, but ran into a question. For the series

series = pd.Series([1, 2, 3], index=[[1, 2, 2], [3, 4, 4]])

i.e.

1  3    1
2  4    2
   4    3
dtype: int64

what should be the output of series.at[1, 3]? I was expecting to just get the scalar 1, but currently I'm getting

1  3    1
dtype: int64

After thinking about this - perhaps what we have currently is the desired output. My reason for this is that the shape doesn't change depending on the input - (1, 3) vs (2, 4).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, that should be a scalar (but that is also the reason that I asked to test it, to be sure this is covered ànd to discuss the semantics we want)

@jorisvandenbossche
Copy link
Member

Seems to me #26989 does not address using .at when not all levels are specified. Perhaps you meant another issue?

It was not described in that issue, but this PR is still enabling it:

In [5]: s = pd.Series(range(4), index=pd.MultiIndex.from_product([['a', 'b'], [1, 2]]))    

In [6]: s  
Out[6]: 
a  1    0
   2    1
b  1    2
   2    3
dtype: int64

In [7]: s.at['a']    
Out[7]: 
1    0
2    1
dtype: int64

This fails on 0.25.3 and 1.0.3

@jorisvandenbossche
Copy link
Member

Ah, but it seems this also already works on master, so it has been "accidentally" enabled due to some other refactoring, I suppose. So OK, indeed not directly related to your PR.

@rhshadrach
Copy link
Member Author

Ah, but it seems this also already works on master, so it has been "accidentally" enabled due to some other refactoring, I suppose. So OK, indeed not directly related to your PR.

With this, I think we should leave as-is in regards to non-scalar results. These can be fixed or documented and then tested for after discussion in #33153. I added the tests for DataFrame though. Note that DataFrame.loc fails with an Index of tuples, so I did not add this test.

@rhshadrach
Copy link
Member Author

@jorisvandenbossche changes made. Let me know if you agree with my assessment above.

@jreback
Copy link
Contributor

jreback commented Apr 6, 2020

@rhshadrach can you merge master. @jorisvandenbossche ok here?

@rhshadrach
Copy link
Member Author

@jorisvandenbossche Let me know if there are any more changes you'd like.

@jbrockmendel
Copy link
Member

@rhshadrach can you rebase

I think this may make #32855 unnecessary

@rhshadrach rhshadrach mentioned this pull request Apr 22, 2020
5 tasks
@rhshadrach
Copy link
Member Author

@jbrockmendel - done. Seems to me the only intersection with the PR you mentioned is the one test dealing with Series there. This PR only deals with Series, where as most of that is dealing with DataFrames.

@rhshadrach
Copy link
Member Author

rhshadrach commented May 4, 2020

@jorisvandenbossche @jreback I think this PR is all set.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. a doc issue, ping when green.

@rhshadrach
Copy link
Member Author

@jreback Note added back, green

series = df["a"]
assert series.index.nlevels == 2
series.at[1, 3] = 5
assert series.at[1, 3] == 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the behavior now on master? IOW is this an api change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current behavior on master is to raise ValueError for any of the at lines, behavior with the loc lines is the same as master. I would describe this as a bugfix and not an api change.

@jreback jreback merged commit 55e8891 into pandas-dev:master May 25, 2020
@jreback
Copy link
Contributor

jreback commented May 25, 2020

thanks @rhshadrach

@rhshadrach rhshadrach deleted the multiindex_at_fix branch July 11, 2020 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series with MultiIndex "at" function raises "TypeError"
5 participants