BUG: Scalar assignment to empty dataframe with loc. Closes GH41891 #41921

klieret
Contributor

@klieret klieret commented Jun 10, 2021

This weakens a check introduced in 7bbeb79 in PR #5227 (addressing issue #5226, "New appending behavior doesn't work on an empty DataFrame"), which caused a ValueError for df.loc[:, 1] = 1 if df is an empty DataFrame. With this commit, no error is raised and the column is added to the DataFrame (which remains empty).

See #41891 for why this makes sense.

Contributor

@jreback jreback left a comment

what is the actual problem you are trying to solve?

adding a column with a value which disappears (not to mention not setting a dtype) is nonsensical and harmful not helpful

the issue you pointed is completely unrelated as there IS a non empty index

@klieret
Contributor Author

klieret commented Jun 10, 2021

Thanks for taking a look at this so quickly, @jreback.

> adding a column with a value which disappears (not to mention not setting a dtype) is nonsensical and harmful not helpful

My expectations for df.loc[:, 'a'] = 1 are that the following statements are true:

  1. If it doesn't already exist, a column a is added to the frame
  2. Column a has the value 1 for all rows

Thus, if df is empty, statement 2 becomes trivially true and statement 1 should still hold. Maybe I'm missing something, but this feels perfectly natural to me.

This is also the behaviour of df['a'] = 1 (this syntax however will show the SettingWithCopyWarning if df is a copy/view).
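The df['a'] = 1 behaviour mentioned above can be sketched as follows. This is a minimal illustration assuming a reasonably recent pandas release; the frame names and contents are made up for the example.

```python
import pandas as pd

# Scalar assignment through __setitem__ on an empty frame: the column is
# created, and there are simply no rows for the scalar to be broadcast to.
empty = pd.DataFrame()
empty["a"] = 1

# The same statement on a frame that has rows broadcasts the scalar to each row.
filled = pd.DataFrame({"x": [10, 20, 30]})
filled["a"] = 1

print(list(empty.columns), len(empty))
print(filled["a"].tolist())
```

In both cases the column exists afterwards; only the number of rows it covers differs.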

> not to mention not setting a dtype

In my branch, the dtype is set properly to int64:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame()
In [3]: df.loc[:, "a"] = 1
In [4]: df["a"]
Out[4]: Series([], Name: a, dtype: int64)

> what is the actual problem you are trying to solve?

Given a copy, I want to be able to set a scalar value without explicitly having to check for emptiness every time. So basically I'm looking for something that replicates the behavior of df['a'] = 1 without complaining about copies (also see #41891).
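For readers with the same need, one possible workaround (not proposed in this thread, and only a sketch) is DataFrame.assign, which returns a new frame instead of writing into the existing one. The helper name with_scalar_column and the sample data are hypothetical.

```python
import pandas as pd


def with_scalar_column(df: pd.DataFrame, name: str, value) -> pd.DataFrame:
    """Hypothetical helper: return a new frame whose column `name` equals `value`.

    DataFrame.assign builds the result on a copy of `df`, so the caller's
    frame (which may itself be a slice of another frame) is left untouched.
    """
    return df.assign(**{name: value})


source = pd.DataFrame({"x": [1, 2, 3]})

# One selection keeps some rows, the other happens to keep none.
some_rows = with_scalar_column(source[source["x"] > 1], "a", 1)
no_rows = with_scalar_column(source[source["x"] > 5], "a", 1)

print(some_rows["a"].tolist())
print(list(no_rows.columns), len(no_rows))
```

Because the assignment happens on a fresh copy, no emptiness check is needed at the call site. Whether this is an acceptable substitute for in-place df.loc assignment depends on the use case.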

As written above, from my perspective it also feels very unnatural and unnecessary to have a ValueError raised here. But perhaps I lack some context or there is a different way to look at this?

@phofl
Member

phofl commented Jun 12, 2021

I think this is quite confusing.

import pandas as pd

df = pd.DataFrame()

df.loc[:, 'b'] = [1]

results in

   b
0  1

while your example would result in an empty frame. Assigning values to an empty DataFrame and still ending up with an empty DataFrame is not ideal, imo.

@klieret
Contributor Author

klieret commented Jun 14, 2021

Your example is using the enlargement syntax, which is triggered whenever there's something list-like on the right-hand side (which I agree isn't entirely intuitive, especially because there's no index given for the new entry in your example).

But I think it's even more confusing that df.loc[:, 'a'] = 1 behaves differently from df['a'] = 1. And as argued above, it is perfectly logical for the outcome to be a DataFrame without rows (and what's more, this view is much more practical than raising a ValueError).
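The difference between the scalar and the list-like right-hand side can be checked with a small probe like the one below. The function name try_loc_assignment is hypothetical, and the outcome for the scalar case depends on the installed pandas version, so the sketch reports what happens rather than assuming a result.

```python
import pandas as pd


def try_loc_assignment(value) -> str:
    """Run `df.loc[:, "a"] = value` on an empty frame and describe the outcome."""
    df = pd.DataFrame()
    try:
        df.loc[:, "a"] = value
    except ValueError as exc:
        # Releases containing the check discussed in this PR raise here
        # when `value` is a scalar.
        return f"ValueError: {exc}"
    return f"ok, shape={df.shape}"


print("scalar   :", try_loc_assignment(1))
print("list-like:", try_loc_assignment([1]))
```

Running both lines side by side makes it easy to see which of the two behaviours a given pandas installation exhibits.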

The new behavior proposed here would be fairly backwards compatible and I'm still convinced that it would be an improvement. Sorry for being stubborn.

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 15, 2021
@mroeschke
Member

Thanks for the PR, but it appears from the review that the current approach does not have much support from other members of the core team. I think it's probably best if this is discussed further in the original issue first before proceeding with a PR. Closing.

@mroeschke mroeschke closed this Aug 17, 2021
4 participants