Skip to content

BUG: fillna overwriting other missing vals #40509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

mzeitlin11
Copy link
Member

@mzeitlin11 mzeitlin11 added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 23, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure i buy that this is a bug at all.

In [149]: series.isna()                                                                                                                                               
Out[149]: 
1    False
2     True
3     True
4    False
dtype: bool

Trying to use mixed null values in an object series is just not supported.

@mzeitlin11
Copy link
Member Author

i am not sure i buy that this is a bug at all.

In [149]: series.isna()                                                                                                                                               
Out[149]: 
1    False
2     True
3     True
4    False
dtype: bool

Trying to use mixed null values in an object series is just not supported.

I think the confusion from #40498 is not that isna(None) = True, but that the value at index 2 is modified even though the argument {'3':0} should ensure that only index 3 is modified.

@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

i am not sure i buy that this is a bug at all.

In [149]: series.isna()                                                                                                                                               
Out[149]: 
1    False
2     True
3     True
4    False
dtype: bool

Trying to use mixed null values in an object series is just not supported.

I think the confusion from #40498 is not that isna(None) = True, but that the value at index 2 is modified even though the argument {'3':0} should ensure that only index 3 is modified.

I am ok with that, but your patch is not structured that way. it shouldn't be testing for nulls

@mzeitlin11
Copy link
Member Author

Sorry, can you please clarify what you mean by that?

What the patch should be doing is fixing the following: for the series ser = Series({'1': 1,'2': None,'3': np.nan, '4': 1}), when the fill argument '{'3':0}' hits the line I changed, it previously became [NaN, NaN, 0, NaN].

When this hits ObjectBlock.fillna, mask is determined from isna(ser), giving [False, True, True, False]. So then when putmask_inplace is called with this mask, None is replaced with np.nan since the mask is True at the position.

To avoid having to change the behavior of lower level fillna functionality, this patch changes the fill argument to instead become [NaN, None, 0, NaN], so that None just ends up being replaced by itself.

@@ -6524,6 +6524,11 @@ def fillna(
value, dtype_if_empty=object
)
value = value.reindex(self.index, copy=False)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you are better off calling .align here. What you have on L6531 is too specific, fragile and trying to fix something after the fact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see what you mean. Not seeing a clear solution with .align either - will have to have some sort of ugly isna check at some point I think to support this. Will close for now and reopen if I can find a better way to do this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.fillna is altering None values of unspecified columns
2 participants