REGR: FutureWarning issued and empty DataFrame returned where no numeric types to aggregate #43501
Comments
@simonjayhawkins It's not clear to me how you're suggesting this should be changed. Are you saying that the |
If it was raising, I think that changing the exception type may not be an API change; changing the behavior is. So for 1.3.x we could either restore the DataError (or maybe change to TypeError) |
there's now a IIRC this was the best option available for ensuring internal consistency until the deprecation is enforced. e.g. we want |
Here's a table contrasting the behaviors using the examples @jbrockmendel listed between 1.2.5 and 1.3.2, and master:
With the fix in #43154, we now have restored the 1.2.x behavior in the case of
Results above produced with the following program:
import pandas as pd
import numpy as np
frame = pd.DataFrame({"a": np.random.randint(0, 5, 50), "b": ["foo", "bar"] * 25})
print("Using mean")
print("calling mean directly")
try:
    print(frame[["b"]].groupby(frame["a"]).mean())
except Exception as e:
    print("exception raised ", str(e))
print("calling agg('mean')")
try:
    print(frame[["b"]].groupby(frame["a"]).agg("mean"))
except Exception as e:
    print("exception raised ", str(e))
print("calling apply(np.mean)")
try:
    print(frame[["b"]].groupby(frame["a"]).apply(np.mean))
except Exception as e:
    print("exception raised ", str(e))
print("calling sum directly")
try:
    print(frame[["b"]].groupby(frame["a"]).sum())
except Exception as e:
    print("exception raised ", str(e))
print("calling agg('sum')")
try:
    print(frame[["b"]].groupby(frame["a"]).agg("sum"))
except Exception as e:
    print("exception raised ", str(e))
print("calling apply(np.sum)")
try:
    print(frame[["b"]].groupby(frame["a"]).apply(np.sum))
except Exception as e:
    print("exception raised ", str(e))
The apply-that-is-an-agg part of this seems like something to ask @rhshadrach about. |
The relevant parts of the algorithm work in two steps:
To separate these, consider changing the example in the question to:
This raises on 1.2.x, but not on master. I believe raising in this case is a bug - it is inconsistent with other reductions (e.g. sum, min, max, prod). Making it so this op doesn't raise from 1.2 -> 1.3 seems appropriate to me. Now consider the example:
If we agree that not raising on an empty frame is correct, then it seems reasonable to me to have the same behavior when numeric_only subsets the frame so that it becomes empty. In other words, I think this shouldn't raise either. Finally, back to the original example, which is the same as above but with numeric_only left unspecified: it raised in 1.2.x, and it no longer raises but will raise a TypeError in the future. This is one case where I could see reverting to raising instead, but I also think it would be okay to leave this behavior as is.
I think we currently treat |
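The three situations discussed above (an empty selection, numeric_only=True leaving nothing to aggregate, and numeric_only left unspecified) can be compared with a small probe. This is only a sketch: the exact examples from the comment did not survive, so the three cases below are assumed stand-ins built from the frame used earlier in the thread, and the helper name outcome is made up for illustration. It reports what the installed pandas does rather than asserting any particular behavior, since the result depends on the version.

```python
import warnings

import numpy as np
import pandas as pd


def outcome(func):
    """Run func and summarize what happened, without assuming a pandas version."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            result = func()
        except Exception as exc:
            return f"raised {type(exc).__name__}"
    warned = sorted({w.category.__name__ for w in caught})
    return f"returned shape {result.shape}, warnings: {warned}"


# Same layout as the frame used earlier in the thread, but deterministic.
frame = pd.DataFrame({"a": np.arange(50) % 5, "b": ["foo", "bar"] * 25})
grouped = frame[["b"]].groupby(frame["a"])

# Assumed stand-ins for the three cases; not the original examples.
cases = {
    "empty selection": lambda: frame[[]].groupby(frame["a"]).mean(),
    "numeric_only=True": lambda: grouped.mean(numeric_only=True),
    "numeric_only unspecified": lambda: grouped.mean(),
}

results = {name: outcome(func) for name, func in cases.items()}
for name, summary in results.items():
    print(f"{name}: {summary}")
```

Running this under 1.2.x, 1.3.x, and a later release side by side is one way to reproduce the kind of comparison table discussed in this thread.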
@rhshadrach so I've taken the table I made above and added a "Proposal" column, which I think is what you are suggesting above.
That would replicate the 1.2.5 behavior except where we were raising the |
@Dr-Irv - Is "Proposal" what we consider to be 2.0 behavior? In such a case, the top two lines of Proposal should raise a TypeError. For the third line, it should be whatever error numpy raises I think (assuming the first sentence of the last paragraph in #43501 (comment) is correct). |
@rhshadrach I think I was proposing 1.3.4 (or 1.4) behavior, based on when you wrote this:
|
@Dr-Irv - the paragraph you quoted was considering the example
For the top two lines, these will raise a TypeError in pandas 2.0, and so need to have the FutureWarning in 1.3.x/1.4. |
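For readers who want to find out ahead of time whether their code relies on behavior that is only warned about today, one option is to escalate FutureWarning to an error while testing. The sketch below uses only the standard library; mean_dropping_bad_columns is a hypothetical stand-in for a deprecated fallback, not pandas code, and is_forward_compatible is likewise an illustrative name.

```python
import warnings


def mean_dropping_bad_columns(columns):
    """Hypothetical stand-in for a reduction with a deprecated fallback."""
    numeric = {
        name: values
        for name, values in columns.items()
        if all(isinstance(x, (int, float)) for x in values)
    }
    if len(numeric) < len(columns):
        # Today: warn and silently drop. Later: this would become a TypeError.
        warnings.warn(
            "Dropping non-numeric columns is deprecated; "
            "a future version will raise TypeError.",
            FutureWarning,
            stacklevel=2,
        )
    return {name: sum(values) / len(values) for name, values in numeric.items()}


def is_forward_compatible(func, *args):
    """Return False if func(*args) triggers a FutureWarning."""
    with warnings.catch_warnings():
        # Turn FutureWarning into an exception for the duration of the call.
        warnings.simplefilter("error", FutureWarning)
        try:
            func(*args)
        except FutureWarning:
            return False
    return True


print(is_forward_compatible(mean_dropping_bad_columns, {"x": [1, 2, 3]}))
print(is_forward_compatible(mean_dropping_bad_columns, {"b": ["foo", "bar"]}))
```

The same effect is available for a whole test run with Python's -W error::FutureWarning command-line option.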
changing milestone to 1.3.5 |
There is also the issue that the FutureWarning added in #43154, which was released in 1.3.3, violates the version policy: https://pandas.pydata.org/pandas-docs/dev/development/policies.html
If we leave as is, the issue probably becomes irrelevant after 1.3.5 and can be closed as no action. If we want to revert to raising instead, it is only appropriate if done before 1.3.5? (assuming no further releases on 1.3.x) |
The FutureWarning was shown because of a 1.2.x regression. -1 on any further changes in 1.3.x |
sure. closing as no action. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
the code sample was raising
DataError: No numeric types to aggregate
in 1.2.5, but in 1.3.x this now issues a warning that it will raise in the future (it also appears that the stacklevel is incorrect).
on master
The change to an empty DataFrame was in #41706, with the warning being added in #43154
cc @jbrockmendel @Dr-Irv
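As background for the stacklevel remark: the stacklevel argument of warnings.warn controls which frame the warning is attributed to, so a wrong value makes the warning point at library internals instead of the caller's line. A minimal sketch with the standard library only (the function names are hypothetical, and this is not the pandas code path in question):

```python
import warnings


def library_function(stacklevel):
    warnings.warn("hypothetical deprecation", FutureWarning, stacklevel=stacklevel)


def user_code(stacklevel):
    library_function(stacklevel)


def reported_lineno(stacklevel):
    """Return the line number the warning is attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        user_code(stacklevel)
    return caught[0].lineno


# stacklevel=1 points at the warnings.warn call inside library_function;
# stacklevel=2 points at the call site inside user_code.
print("stacklevel=1 ->", reported_lineno(1))
print("stacklevel=2 ->", reported_lineno(2))
```

With more layers between the public method and the warnings.warn call, a correspondingly higher stacklevel is needed for the warning to land on the user's line.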
Expected Behavior
same as 1.2.5
Installed Versions
pd.show_versions()