Method dropna does not work on SparseDataFrames #21172
Comments
Can you simplify your example? Ideally focus on one copy / paste-able example that outlines the issue and expected result. Be sure to check out the contributing guide for proper bug reporting: https://pandas.pydata.org/pandas-docs/stable/contributing.html#bug-reports-and-enhancement-requests
The most prominent sample is this one:

    import pandas as pd
    print(pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1]}).dropna(axis=1, inplace=False, how='all'))

outputs

That means that after dropping fields having NaNs a field having all
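For comparison, here is a minimal sketch of the same call on a dense DataFrame, which is the behaviour the report treats as the reference. It uses only the plain `pd.DataFrame` constructor and `dropna`; the variable names `df` and `result` are illustrative and do not come from the thread.

```python
import pandas as pd

# Dense counterpart of the SparseDataFrame from the report (same data).
df = pd.DataFrame({"F1": [float("nan"), float("nan")], "F2": [0, 1]})

# how="all" drops only the columns in which every value is NaN,
# so F1 is removed and F2 is kept.
result = df.dropna(axis=1, how="all")

print(list(result.columns))  # ['F2']
```

On the dense frame only the all-NaN column `F1` is removed. The report's claim is that the sparse frame should give the same result.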
Is this new in 0.23?
Exactly. I tried it in a new virtualenv: pandas 0.23.0 has this issue and 0.22.0 does not.
Thanks for the example / report. Investigation / PRs are always welcome!
To conclude, I've isolated this issue to the code below. A mask, a SparseArray of the fields that should be preserved, is created. The mask is called

    import pandas as pd
    sa = pd.SparseArray([float('nan'), float('nan'), 1, 0, 0, 2, 0, 0, 0, 3, 0, 0])
    sa = sa.nonzero()
    print(sa)

outputs

To resolve this, one could use
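As a dense baseline for the isolated example, the sketch below calls `nonzero()` on a plain NumPy array holding the same values. This is an illustration only: whether the sparse result is supposed to match it exactly is an assumption, and the names `values` and `positions` are not from the thread.

```python
import numpy as np

# Same values as in the SparseArray example, stored densely.
values = np.array([float("nan"), float("nan"), 1, 0, 0, 2, 0, 0, 0, 3, 0, 0])

# nonzero() returns a tuple with one index array per dimension.
# NaN is not equal to zero, so the NaN positions are included.
positions = values.nonzero()[0]

print(positions.tolist())  # [0, 1, 2, 5, 9]
```

The returned indices are positions in the full array. A sparse implementation that reported positions relative to its stored values only would disagree with this baseline.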
Function dropna may return wrong result on SparseDataFrame. The following code outputs
Problem description
The dropna method behaves differently for SparseDataFrames and dense ones. It may also fail to drop NaN columns at all (see the last examples in the first batch). The correct behaviour is shown in the second batch of commands.
Expected Output
Output of pd.show_versions()