Skip to content

BUG: DataFrame.at must fail when adding more than one value #48362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

johannes-mueller
Copy link
Contributor

@johannes-mueller johannes-mueller commented Sep 2, 2022

Greetings from EuroScipy 2022 and thanks to @jorisvandenbossche for support.

The problem occurred because when adding a new column to a DataFrame using
DataFrame.at, DataFrame._set_value falls back to DataFrame.loc before the
InvalidIndexError is raised.

The proposed solution makes sure that both, the column index and the row index
are determined so that both have a chance to raise.

@johannes-mueller johannes-mueller force-pushed the 48224-frame-at-new-column-index-slice branch from 73123f9 to e8b61fc Compare September 2, 2022 10:55
@jorisvandenbossche jorisvandenbossche changed the title Fix 48224: DataFrame.at must fail when adding more than one value BUG: DataFrame.at must fail when adding more than one value Sep 2, 2022
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm in general. Could you check if this impacts #48323?

@jorisvandenbossche
Copy link
Member

if we consider this as a not-that-notable bug fix (although it did work "correctly" in the past, it was just not failing and doing too much for what we want at to do)

On a second thought, I was just checking if this was also working in the past, and it seems the test example here has "correctly" worked basically forever (I tested down to 0.18). So we should maybe consider deprecating this first?

The problem occurred because when adding a new column to a DataFrame using
DataFrame.at, DataFrame._set_value falls back to DataFrame.loc before the
InvalidIndexError is raised.

The proposed solution makes sure that both, the column index and the row index
are determined so that both have a chance to raise.
@johannes-mueller johannes-mueller force-pushed the 48224-frame-at-new-column-index-slice branch from e8b61fc to f7dad6d Compare September 2, 2022 19:08
@johannes-mueller
Copy link
Contributor Author

if we consider this as a not-that-notable bug fix (although it did work "correctly" in the past, it was just not failing and doing too much for what we want at to do)

On a second thought, I was just checking if this was also working in the past, and it seems the test example here has "correctly" worked basically forever (I tested down to 0.18). So we should maybe consider deprecating this first?

Since 0.23 there is the sentence Use at if you only need to get or set a single value in a DataFrame or Series. in the docs. So we could consider it deprecated since then.

But I wouldn't mind deprecating "officially".

@phofl
Copy link
Member

phofl commented Sep 2, 2022

I would disallow immediately. Using the same indexer again raises, so we would fix an inconsistency on top of that.

@johannes-mueller
Copy link
Contributor Author

lgtm in general. Could you check if this impacts #48323?

If we really decide that .at should never add new columns or rows to a DataFrame we probably should not merge this. In that case we must fix the user guide, though as @jorisvandenbossche showed, that it allows the enlargement by .at.

If we decide to merge this, I would suggest also fixing the docstring of .at. AFAICS it now never raises a KeyError because these are all caught in ._set_value which then falls back to .loc.

So decision needed.

@gfyoung gfyoung added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 2, 2022
@srotondo
Copy link
Contributor

@johannes-mueller In pandas/tests/indexing/test_at.py, there is a test named test_at_setitem_multiindex. During this test, the line df.at[0, "a"] = 10 successfully modifies two columns at once. I believe your code change here should prevent this from succeeding, right? I can tell from the code checks here and modified files that this test still succeeds and that you didn't change this test at all. Also, changing this line to x = df.at[0, "a"] and trying to only get this value will raise an error, so I believe causing this setting action to fail should be the correct course of action.

@srotondo
Copy link
Contributor

If we really decide that .at should never add new columns or rows to a DataFrame we probably should not merge this. In that case we must fix the user guide, though as @jorisvandenbossche showed, that it allows the enlargement by .at.

I am working on a PR that will do this exact thing, including modifying the user guide and documentation. I'll link it here once I'm done with it so we can decide what to do to fix this bug.

@srotondo
Copy link
Contributor

I've finally submitted my PR. You can find it at #48542.

@johannes-mueller
Copy link
Contributor Author

Superseded by #48542

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: using at indexer reassign existing field error
5 participants