TST: Difference in drop_duplicates for numeric Series and DataFrames #14192
Comments
Not directly sure where it has been changed/fixed, but I see a consistent return value in master (and 0.19.0rc1):
Would it actually be useful to be able to specify how to treat NaNs?
For my specific application, not really (I found this behaviour only because I forgot to use dropna to clean my data). I am not sure how other users need to handle it...
I think #13514 solved this, and it appears to have enough tests. But if someone can review, we could always add some specific ones for this consistency.
…andas-dev#14192) Add keep kwarg and new columns
Code Sample
Expected Output
Why is the result not consistent?
For a DataFrame (first case), the NA values are considered equal (we get 2 unique values), while for a Series (second case) they are not (similarly to numpy.unique, we get 3 unique values).
Curiously, in the case of a mixed-type column,
len(df[['c']].drop_duplicates()) == len(df['c'].drop_duplicates())
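The original code sample did not survive on this page, so here is a minimal sketch of the kind of check being described. The frame, its column names `a` and `c`, and its values are assumptions made for illustration, not the reporter's data. On pandas versions where the behaviour is consistent (0.19.0rc1 and later, per the comment above), both counts in each pair should agree, because NaN values are treated as equal to each other.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the reporter's original sample):
# 'a' is a float column with repeated NaN, 'c' is a mixed-type (object) column.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "c": ["x", np.nan, np.nan],
})

# Numeric column: compare the one-column DataFrame path with the Series path.
n_frame = len(df[["a"]].drop_duplicates())
n_series = len(df["a"].drop_duplicates())

# Mixed-type column: same comparison.
n_frame_mixed = len(df[["c"]].drop_duplicates())
n_series_mixed = len(df["c"].drop_duplicates())

print("numeric:", n_frame, n_series)
print("mixed:  ", n_frame_mixed, n_series_mixed)
```

If the two numbers in a pair differ, the installed version still shows the inconsistency reported here.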
output of pd.show_versions()