Skip to content

BUG: handle nan values in DataFrame.update when overwrite=False (#15593) #16430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 24, 2017

Conversation

pcluo
Copy link
Contributor

@pcluo pcluo commented May 22, 2017

BUG: handle nan values in DataFrame.update when overwrite=False (#15593)

add nan test for DataFrame.update

update whatsnew v0.20.2

@TomAugspurger TomAugspurger added this to the 0.20.2 milestone May 22, 2017
@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 22, 2017
@@ -37,6 +37,7 @@ Bug Fixes
~~~~~~~~~

- Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`)
- Bug in using ``DataFrame.update`` when overwrite=False and the second value is nan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a ref to the issue ath the end here

(:issue:`15593`)

df1 = DataFrame({'A': [1, None, 3], 'B': date_range('2000', periods=3)})
df2 = DataFrame({'A': [None, 2, 3]})
df1.update(df2, overwrite=False)
# it works!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you make an exected dataframe and compare?

expected = df1 = DataFrame({'A': [1.0, 2, 3], 'B': date_range('2000', periods=3)})
tm.assert_frame_equal(df1, expected)

@pcluo
Copy link
Contributor Author

pcluo commented May 22, 2017 via email

df2 = DataFrame({'A': [None, 2, 3]})
df1.update(df2, overwrite=False)
# it works!
tm.assert_frame_equal(df1, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a second test with

df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
df1.update(df2, overwrite=False)

just to make sure we cover the original issue? You can do it in this same test, and use the same expected


def test_update_nan(self):
# #15593 #15617
expected = df1 = DataFrame({'A': [1.0, 2, 3], 'B': date_range('2000', periods=3)})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do like this

df1 = DataFrame(....)
df2 = DataFrame(....)

expected = df1
result = df1.update(....)

tm.assert_frame_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think update is inplace? So expected = df1.copy()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it returns a new object (but has inplace arg)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, you are right @TomAugspurger . Hmm that is pretty bogus it should have a kwarg for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -37,6 +37,7 @@ Bug Fixes
~~~~~~~~~

- Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`)
- Bug in using ``DataFrame.update`` when overwrite=False and the second value is nan (:issue:`15593`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the 2nd issue as well.

Bug in DataFrame.update() with overwrite=False and NaN values

@TomAugspurger
Copy link
Contributor

@pcluo can you update with the requested changes? Thanks!

@pcluo
Copy link
Contributor Author

pcluo commented May 24, 2017

@TomAugspurger yes, plz check.

tm.assert_frame_equal(df1, expected)

# test 2
expected = df1 = DataFrame({'A': [1, None, 3], 'B': date_range('2000', periods=3)})
Copy link
Contributor

@TomAugspurger TomAugspurger May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think this got lost in the conversation, but can you write this as:

df1 = DataFrame({'A': [1, None, 3], 'B': date_range('2000', periods=3)})
df2 = DataFrame({'A': [None, 2, 3]})
expected = DataFrame({'A': [1, 2, 3], 'B': date_range('2000', periods=3)})

The one on line 773 can be a bit shorter, just

df1 = DataFrame({'A': [1.0, 2, 3], 'B': date_range('2000', periods=3)})
expected = df1.copy()

basically, since .update works inplace, the test will always be true no matter what happens. expected and df1 are reference to the same object. It's like

In [2]: a = b = {'x': 1}

In [3]: a['y'] = 2

In [4]: a == b
Out[4]: True

Which "shouldn't" be true (but is, since a and b are both references to the same object). Let me know if that makes sense.

@pcluo pcluo force-pushed the master branch 2 times, most recently from d8bde84 to be7a230 Compare May 24, 2017 06:20
@pcluo pcluo closed this May 24, 2017
@pcluo pcluo reopened this May 24, 2017
@pcluo
Copy link
Contributor Author

pcluo commented May 24, 2017

@TomAugspurger just made another attempt at this. Sorry the last conversation got lost when i squashed commits. I changed [1, 2, 3] in test 2 to [1.0, 2, 3] otherwise the dtypes are not equal, float64 vs int64.

@jreback
Copy link
Contributor

jreback commented May 24, 2017

can you rebase on master

@TomAugspurger
Copy link
Contributor

I changed [1, 2, 3] in test 2 to [1.0, 2, 3] otherwise the dtypes are not equal, float64 vs int64.

Yeah, that sounds right.

If you rebase and force push, all the tests should pass.

git fetch upstream
git rebase upstream/master
git push -f origin master

FYI, typically you want to checkout a new branch for features / bug fixes. http://pandas.pydata.org/pandas-docs/stable/contributing.html#creating-a-branch Not a huge deal though.

…as-dev#15593)

add nan test for DataFrame.update

update whatsnew v0.20.2
@pcluo
Copy link
Contributor Author

pcluo commented May 24, 2017

it seems all tests passed now. thx @TomAugspurger and will checkout a new branch next time.

@TomAugspurger
Copy link
Contributor

Looks like Travis isn't picking up your commits for some reason. I'll ask their support.

@jreback jreback merged commit 85080aa into pandas-dev:master May 24, 2017
@jreback
Copy link
Contributor

jreback commented May 24, 2017

thanks!

@TomAugspurger
Copy link
Contributor

FYI @pcluo TravisCI support thought that if you sign into travis with your github account then things should be good.

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request May 29, 2017
TomAugspurger pushed a commit that referenced this pull request May 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

handle nan values in DataFrame.update when overwrite=False
3 participants