Raising SettingWithCopyWarning in cross_validate #16191
Comments
This is indeed the same underlying issue as was discussed in #8723.
I would propose to set the attribute
It's setting it to I reopened pandas-dev/pandas#19102 to have a potential discussion for a public way to do this.
Actually, now that I think about it, there is a better way: you can use See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.take.html. The explanation of the
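A minimal sketch of the idea in the comment above, assuming the method meant is `DataFrame.take`, as the linked documentation suggests. The small frame and its column names are made up for illustration. `take` selects rows by position and returns a new object, so adding a column to the result leaves the original frame untouched:

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"age": [22.0, None, 35.0], "fare": [7.25, 71.28, 8.05]})

# Select rows 0 and 2 by position; the result is a new DataFrame.
subset = df.take([0, 2])

# Adding a column to the result does not touch the original frame.
subset["new_age"] = subset["age"].fillna(0).astype(int)
```

Whether this is the right fix inside scikit-learn's own indexing helpers is a separate question that the thread leaves open.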
Hello, I can take this issue.
@Henley13 try commenting "take" to assign yourself the issue.
I believe the canonical pandas way to deal with this is to make a copy:

def processing(X):
    return X.assign(new_age=X["age"].fillna(0).astype(int))

I see copying as making
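To make the copy semantics concrete, here is a small sketch that runs the `processing` function from the comment above on a made-up frame (the data is an assumption, not the titanic dataset used elsewhere in the thread). `DataFrame.assign` returns a new DataFrame, so the caller's input keeps its original columns:

```python
import pandas as pd


def processing(X):
    # assign returns a new DataFrame with the extra column; X is left as-is.
    return X.assign(new_age=X["age"].fillna(0).astype(int))


# Hypothetical toy input with one missing age.
X = pd.DataFrame({"age": [22.0, None, 35.0]})
X_trans = processing(X)
```

The trade-off raised in the next comment still applies: every call builds a new frame, which may add up to many copies inside a cross-validation loop.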
The canonical pandas way leads to many unnecessary copies, as I noted in the issue @jorisvandenbossche linked: pandas-dev/pandas#18799
For this specific example, I am thinking if

import numpy as np
from sklearn.preprocessing import FunctionTransformer
from numpy.testing import assert_array_equal

transformer = FunctionTransformer(np.log1p)
X = np.array([[0, 1], [2, 3]])
X_trans = transformer.transform(X)
assert_array_equal(X, [[0, 1], [2, 3]])

But for the pandas example we have above:

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import FunctionTransformer

X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)

def processing(X):
    X.loc[:, "new_age"] = X["age"].fillna(0).astype(int)
    return X

X_trans = FunctionTransformer(processing).transform(X)
assert "new_age" in X.columns

It changes the input. Are we okay with this type of usage?
We have other estimators that can modify their input in place if a copy parameter is set to False. Is that what you're talking about?
I am thinking about how we document the I see two options to resolve this issue:
The first thing to understand is that SettingWithCopyWarning is a warning, not an error. The real problem behind the warning is that it is generally difficult to predict whether an indexing operation returns a view or a copy. The warning was created to flag "chained assignment" operations, and in most cases it is raised because two indexing operations have been chained together. This is easiest to spot when [] (square brackets) is used twice, but the same is true for other access methods such as .loc[], .iloc[] and so on. You can also change the behaviour of the warning with pd.options.mode.chained_assignment, which accepts three options: None, "raise" and "warn".
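As a rough illustration of the chained-assignment point, the sketch below uses a made-up frame (the column names and values are assumptions). It shows the two usual ways to avoid the ambiguity: write through a single `.loc` call when the original should change, or take an explicit `.copy()` of the selection when it should not:

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame(
    {
        "age": [22.0, None, 35.0, 4.0],
        "sex": ["male", "female", "female", "male"],
    }
)

# Chained form that the warning is meant to flag (left commented out):
# df[df["sex"] == "female"]["age"] = 0.0

# One indexing operation: the assignment targets df itself.
df.loc[df["sex"] == "female", "age"] = 0.0

# Explicit copy: later changes to `females` cannot reach df.
females = df[df["sex"] == "female"].copy()
females["new_age"] = females["age"].astype(int)
```

Both forms state the intent explicitly, so nothing depends on whether pandas happened to return a view or a copy.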
Describe the bug
Using a FunctionTransformer which adds a column to a dataframe will raise a SettingWithCopyWarning when used in cross_validate.
This is highly linked to #8723 but I think that in this case, we should do something about it since:
FunctionTransformer is a stateless transformer
Steps/Code to Reproduce
Expected Results
Getting some SettingWithCopyWarning