
BUG: inplace parameter will not perform the operation in place on multi column-sliced DataFrame #34359

Closed
fluid-gun opened this issue May 25, 2020 · 5 comments

fluid-gun commented May 25, 2020

Environment: Python 3.7.7 / NumPy 1.18.4 / Pandas 1.0.3

I have been testing the inplace=True parameter of a couple of methods, including DataFrame.fillna() and DataFrame.clip().
When I use inplace=True on a multi-column selection of a DataFrame, the original DataFrame is not altered.

For instance, using the simple DataFrame below,

import pandas as pd

df = pd.DataFrame({'A':[1,None,5],'B':[3,4,-1]})
print(df)
Out: 
     A  B
0  1.0  3
1  NaN  4
2  5.0 -1

the code

df.loc[:, ['A','B']].fillna(0, inplace=True)
# or df.loc[:, 'A':'B'].fillna(0, inplace=True)
# or df.iloc[:,[0,1]].fillna(0, inplace=True)
# or df.iloc[:,0:2].fillna(0, inplace=True)
print(df)

will produce

Out: 
     A  B
0  1.0  3
1  NaN  4
2  5.0 -1

as opposed to

Out: 
     A  B
0  1.0  3
1  0.0  4
2  5.0 -1

Strangely enough, the parameter works perfectly when I'm slicing out just a single column:

df.loc[:, 'A'].fillna(0, inplace=True)
print(df)
Out:
     A  B
0  1.0  3
1  0.0  4
2  5.0 -1

or when I'm using the position-based indexer .iloc with a column slice that has no explicit start:

df.iloc[:,:2].fillna(0, inplace=True)
print(df)
Out:
     A  B
0  1.0  3
1  0.0  4
2  5.0 -1

Let me know if this is the accepted/expected behavior or if there is a workaround.
Thanks!

@fluid-gun fluid-gun changed the title BUG: inplace parameter will not perform the operation in place on column-wise label-sliced DataFrame BUG: inplace parameter will not perform the operation in place on multi column-sliced DataFrame May 25, 2020
@vampypandya
Contributor

Please provide all the version details, as described in the GitHub bug report guidelines.

@jorisvandenbossche
Member

@dgun-y Currently, selecting multiple columns of a DataFrame returns a copy, not a view. Hence inplace=True operates on that copy and does not fill the original DataFrame.

The basic reason is that selecting multiple columns with a list is "fancy" or "advanced" indexing in numpy's terms, which makes a copy (whereas "slicing" gives a view). Now, pandas has some additional complexity due to the BlockManager layout (https://uwekorn.com/2020/05/24/the-one-pandas-internal.html), which means that even those numpy rules don't always apply in pandas.
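The numpy distinction mentioned above can be checked directly. The following is a minimal sketch on a plain numpy array, not on a DataFrame; the variable names are chosen for illustration, and the result says nothing about how pandas lays out its blocks internally.

```python
import numpy as np

arr = np.array([[1.0, 3.0], [np.nan, 4.0], [5.0, -1.0]])

sliced = arr[:, 0:2]     # basic slicing: a view onto arr's buffer
fancy = arr[:, [0, 1]]   # advanced ("fancy") indexing: a new array

sliced_shares = np.shares_memory(arr, sliced)
fancy_shares = np.shares_memory(arr, fancy)

# Writing into the fancy-indexed copy leaves arr untouched.
fancy[1, 0] = 0.0
still_nan = bool(np.isnan(arr[1, 0]))

# Writing into the sliced view modifies arr itself.
sliced[1, 0] = 0.0
value_after_view_write = float(arr[1, 0])
```

Here `sliced_shares` is true and `fancy_shares` is false, which is the numpy analogue of an in-place operation silently acting on a temporary copy.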

Related issues: #33780, #16529

@fluid-gun
Author

I suspected as much, thank you for the response.

> Now, pandas has some additional complexity due to the BlockManager layout (https://uwekorn.com/2020/05/24/the-one-pandas-internal.html), which makes that even those numpy rules don't always apply for pandas.

Given this, should I assume the worst going forward, i.e. that even single-column slices may exhibit similar behavior?

@jorisvandenbossche
Member

No, slicing a single column will always be a view, AFAIK
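Since the outcome depends on whether a given selection happens to be a view or a copy, one workaround is to avoid relying on either and assign the result back explicitly. The sketch below is illustrative rather than an official recommendation from this thread; `cols`, `subset`, and `df2` are names introduced here for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 5], 'B': [3, 4, -1]})
cols = ['A', 'B']

# A list-based selection is a copy, so filling it in place leaves df untouched.
subset = df.loc[:, cols]
subset.fillna(0, inplace=True)
nan_left_in_df = int(df['A'].isna().sum())

# Option 1: compute the filled columns and assign them back.
df[cols] = df[cols].fillna(0)

# Option 2: call fillna on the whole frame with a per-column mapping.
df2 = pd.DataFrame({'A': [1, None, 5], 'B': [3, 4, -1]})
df2 = df2.fillna({'A': 0})
```

Both options go through the original DataFrame itself, so they do not depend on the view-versus-copy behavior of the indexer.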

@fluid-gun
Author

> No, slicing a single column will always be a view, AFAIK

Got it, thanks!
