Mode() not compatible with fillna() #9750
I have found that if I want to fill NaN with the mode, I need to do this: |
In [16]: df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3]})
In [17]: df.mode()
Out[17]:
A
0 1
1 2 |
Hmm. If I were designing |
Thanks for looking into this and also for the explanation. As a user I would like a parameter that controls this behavior, where the default is to return a series (i.e. choose the first mode if many). Whatever you decide, may I suggest that at least the clarification/example given by TomAugspurger is added to the documentation (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.mode.html)? I did read that page before creating this issue, and the reason why a dataframe is returned was not clear to me... |
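To illustrate why a DataFrame comes back, here is a minimal sketch. The second column `B` is a hypothetical addition (it is not part of the example in this thread), included only to show what happens when columns have different numbers of modes. Taking the first row with `.iloc[0]` is one way to approximate the "choose the first mode" behavior requested above; it is not an existing `mode()` parameter.

```python
import pandas as pd

# Column A is bimodal (1 and 2 each appear three times).
# Column B is a hypothetical extra column with a single mode (5).
df = pd.DataFrame({"A": [1, 2, 1, 2, 1, 2, 3],
                   "B": [5, 5, 5, 6, 7, 8, 9]})

# mode() returns one row per mode; columns with fewer modes are padded with NaN.
modes = df.mode()

# Taking the first row yields a Series indexed by column name,
# which has the same shape as the result of df.mean() or df.median().
first_mode = modes.iloc[0]
```

Here `modes` has two rows because `A` has two modes, and the second row of `B` is NaN.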
@alfonsomhc If you'd like to put together a PR with a documentation patch, it would be gratefully accepted. |
I see that the page I referred to is generated by the documentation in file pandas/core/frame.py |
I didn't really know how to do the pull request. Hopefully I didn't break anything! |
And now suddenly the issue is closed? Hopefully somebody can verify what I did. In case it wasn't clear enough, this is the first time I have contributed to an open source project... |
Closing as it looks like the proper documentation was added. |
I made a toy dataframe:
df = pandas.DataFrame([[1, 1, 1],[2, 1, 1],[2, 1, 1],[numpy.nan, numpy.nan, numpy.nan]], columns=["a","b","c"])
I tried different methods to fill the missing values. These work as expected:
df.fillna(df.mean())
df.fillna(df.median())
But this doesn't work:
df.fillna(df.mode())
Inspecting the output of df.mode(), I see that it has a different format than the output of df.mean() and df.median(). As a user I would expect these functions to behave the same way, so that I can fill missing values as described.
Using Pandas 0.15.2
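A possible workaround, sketched with the toy dataframe from this report: since `fillna` aligns a DataFrame argument on both row and column labels, the single-row result of `df.mode()` only lines up with row 0, and the missing row is left untouched. Selecting the first row with `.iloc[0]` gives a Series keyed by column name, which `fillna` applies per column. This assumes that picking the first mode is acceptable when a column has several.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [2, 1, 1], [2, 1, 1], [np.nan, np.nan, np.nan]],
    columns=["a", "b", "c"],
)

# Passing the DataFrame directly: row 3 has no matching row label
# in df.mode(), so its NaN values are not filled.
not_filled = df.fillna(df.mode())

# Passing the first row as a Series: values are matched by column name,
# so every NaN in a column is replaced with that column's first mode.
filled = df.fillna(df.mode().iloc[0])
```

With this data, the last row of `filled` holds the per-column modes (2 for `a`, 1 for `b` and `c`), while the last row of `not_filled` is still all NaN.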