Skip to content

BUG: DataFrame.at allows adding new rows to a DataFrame #48323

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rhshadrach opened this issue Aug 31, 2022 · 8 comments
Open

BUG: DataFrame.at allows adding new rows to a DataFrame #48323

rhshadrach opened this issue Aug 31, 2022 · 8 comments
Assignees
Labels
Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@rhshadrach
Copy link
Member

df = pd.DataFrame({'a': [1, 2]})
df.at[5, 'a'] = 6
print(df)

#      a
# 0  1.0
# 1  2.0
# 5  6.0

The docstring of DataFrame.at says this should raise a KeyError. This method shouldn't allow the addition of rows because it is ill-performant and makes .at both an indexing and reshaping method.

This may be related to #48224, but seems to me to be a separate issue (at least, from a user perspective).

@rhshadrach rhshadrach added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Aug 31, 2022
@rhshadrach rhshadrach added this to the Contributions Welcome milestone Aug 31, 2022
@srotondo
Copy link
Contributor

take

@srotondo
Copy link
Contributor

srotondo commented Sep 2, 2022

@rhshadrach DataFrame.loc also seems to exhibit the same behavior here, allowing the addition of a row. Unless I am very much mistaken, that's probably another related bug as well, right? I think they are probably caused by the same thing, so fixing one should fix the other, but I want to be sure the behavior in .loc is in fact a bug before I start making changes.

@jorisvandenbossche
Copy link
Member

The docstring of DataFrame.at says this should raise a KeyError.

On the other hand, the user guide contradicts this .. (https://pandas.pydata.org/docs/user_guide/indexing.html#fast-scalar-value-getting-and-setting):

at may enlarge the object in-place as above if the indexer is missing.

In addition, it also seems we basically have been allowing this for a long time (or forever?)


Personally, I would prefer limiting at to just accessing (or setting) existing scalar values.

@jorisvandenbossche
Copy link
Member

The KeyError mention in the docstring was added in #20290, but there was no question about that at the time.
That PR does lead me to a similar issue, though: #46722, where it seems was decided that the docstring is incorrect (it also mentions it not tested, though, not sure that is still the case)

@rhshadrach
Copy link
Member Author

Thanks @jorisvandenbossche - from your comments I think this should be deprecated rather than considered a bugfix.

@srotondo - I do think the same should apply to .loc (i.e. setting a value that doesn't exist should raise). However, .loc is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.

@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas and removed Bug labels Sep 7, 2022
@srotondo
Copy link
Contributor

srotondo commented Sep 7, 2022

@srotondo - I do think the same should apply to .loc (i.e. setting a value that doesn't exist should raise). However, .loc is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.

@rhshadrach After reading over the code some more, I see that it is very well established that .loc should be able to add new locations in a DataFrame, so I don't think this is an issue.

I'm currently working on a PR that will enforce this behavior in .at, but I have to fix some tests that are failing due to changed behavior first.

@jorisvandenbossche
Copy link
Member

I do think the same should apply to .loc (i.e. setting a value that doesn't exist should raise). However, .loc is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.

Yes, we certainly have intentionally allowed .loc to "enlarge" while setting, so if we want to change this, that would require a larger discussion.

(personally, I think it would be nice to separate the concepts of setting existing values and enlarging / adding labels, but adding yet another indexer that would allow that is also not very compelling)

@Faraz-Tab
Copy link

There is no activity here for the last 2 months, let's close this issue for now and re-open if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants