BUG: fix combine_first converting timestamp to int #35514

nixphix · 2020-08-02T06:35:41Z

closes combine_first returns unexpected results for timestamp dataframes #28481
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This fix introduced two regressions, but it appears that the fix only made the API consistent. Previously the now-failing cases were inconsistent: df1.combine_first(df2) would not return the same dtype as df2.combine_first(df1). More on these in the code comments.

Let me know if there is a better way to handle this.

jreback

can you post an example of what the change would look like to a user (e.g. what the 'regression' is)

pandas/tests/frame/methods/test_combine_first.py

nixphix · 2020-08-06T02:48:25Z

can you post an example of what the change would look like to a user (e.g. what the 'regression' is)

@jreback master is inconsistent when self and other are swapped; this is not captured by the current test cases.

# master
import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]])
df2 = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])

expected = pd.Series([True, True, False], name=2)
result1 = df1.combine_first(df2)[2]
result2 = df2.combine_first(df1)[2]

print(expected.dtype, result1.dtype, result2.dtype)
>>>bool bool object

# master
import pandas as pd

df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")

res1 = df1.combine_first(df2)
res2 = df2.combine_first(df1)

print(res1["a"].dtype, res2["a"].dtype)
>>>int64 float64

The current fix makes the function consistent, but the result now differs from what the existing tests expect: in the first case we get a Series of dtype object both ways, and in the second case we get float64 either way.

# fix
import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]])
df2 = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])

expected = pd.Series([True, True, False], name=2)
result1 = df1.combine_first(df2)[2]
result2 = df2.combine_first(df1)[2]

print(expected.dtype, result1.dtype, result2.dtype)
>>>bool object object

# fix
import pandas as pd

df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")

res1 = df1.combine_first(df2)
res2 = df2.combine_first(df1)

print(res1["a"].dtype, res2["a"].dtype)
>>>float64 float64
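
The comparison in the snippets above can be wrapped in a small helper. This is only a sketch: `combine_first_dtypes` is a hypothetical name, not part of pandas or of this PR, and the dtypes it reports depend on the pandas version, so no expected output is shown. The last line illustrates one plausible source of the float64 upcast, assuming it comes from aligning the shorter frame to the longer index: reindexing a NumPy int64 column onto labels it does not have introduces NaN.

```python
import pandas as pd


def combine_first_dtypes(df1, df2):
    """Hypothetical helper: result dtypes of combine_first in both call orders."""
    forward = df1.combine_first(df2)
    backward = df2.combine_first(df1)
    # Both results carry the union of the columns, so iterating one is enough.
    return {col: (forward[col].dtype, backward[col].dtype) for col in forward.columns}


df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")

for col, (fwd, bwd) in combine_first_dtypes(df1, df2).items():
    print(col, fwd, bwd, "consistent" if fwd == bwd else "inconsistent")

# Aligning the shorter frame to the longer index adds NaN rows,
# which forces a NumPy int64 column to float64.
print(df2.reindex(df1.index)["a"].dtype)
```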

Let me know your thoughts on this.

jbrockmendel · 2020-10-01T22:27:00Z

@nixphix this looks pretty good. can you merge master and we'll see if the CI passes

pandas/tests/frame/methods/test_combine_first.py

nixphix · 2020-10-04T11:40:24Z

@jreback I have made the code changes, but the CI is failing on an unrelated test case, test_2d_to_1d_assignment_raises. I believe this is caused by a breaking change in a dependency, or the issue is in master.

pandas/tests/frame/methods/test_combine_first.py

github-actions · 2020-11-07T00:11:20Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

jreback

looks good. if you can parameterize those tests and ping on green.

jreback · 2020-11-08T03:04:39Z

pandas/tests/frame/methods/test_combine_first.py

-        result = df1.combine_first(df2)[2]
-        expected = Series([True, True, False], name=2)
-        tm.assert_series_equal(result, expected)
+        expected1 = pd.Series([True, True, False], name=2, dtype=object)


can you parameterize this test (e.g. put the df1 and df2 in the parameter along with the expected)

jreback · 2020-11-08T03:04:52Z

pandas/tests/frame/methods/test_combine_first.py

@@ -339,9 +348,14 @@ def test_combine_first_int(self):
        df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")


can you parameterize this as well
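
A parametrized version of the int test might look roughly like the sketch below. The test name and parameter names are made up for illustration and are not the ones used in the PR. Only the combined values are asserted, with check_dtype=False, because the resulting dtype is exactly what is under discussion here and differs between pandas versions.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "data1, data2, expected_values",
    [
        # longer frame first: every value comes from data1
        ([0, 1, 3, 5], [1, 4], [0, 1, 3, 5]),
        # shorter frame first: rows 2 and 3 are filled from data2
        ([1, 4], [0, 1, 3, 5], [1, 4, 3, 5]),
    ],
)
def test_combine_first_int_values(data1, data2, expected_values):
    # Hypothetical test name; the dtype is deliberately not checked.
    df1 = pd.DataFrame({"a": data1}, dtype="int64")
    df2 = pd.DataFrame({"a": data2}, dtype="int64")

    result = df1.combine_first(df2)
    expected = pd.DataFrame({"a": expected_values})

    pd.testing.assert_frame_equal(result, expected, check_dtype=False)
```

Once the intended dtype is settled, an expected dtype could be added as a fourth parameter and check_dtype=False dropped.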

jreback

@nixphix lgtm, can you merge master and address the test comments, ping on green.

jreback · 2020-11-18T18:22:56Z

pandas/tests/frame/methods/test_combine_first.py

-        res = df1.combine_first(df2)
-        tm.assert_frame_equal(res, df1)
-        assert res["a"].dtype == "int64"
+        exp1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="float64")


and make a separate test

jreback · 2020-11-25T21:40:41Z

this PR is prob ok, just needs a rebase and updating for comments.

jbrockmendel · 2020-11-25T22:17:17Z

pandas/tests/frame/methods/test_combine_first.py

+    ],
+)
+def test_combine_first_timestamp_bug(val1, val2, nulls_fixture):
+


can you add a comment here "# GH#35514"

arw2019 · 2020-11-29T04:31:39Z

Closing in favor of #38145

BUG: fix combine_first converting timestamp to int (pandas-dev#28481)

8a4181d

nixphix marked this pull request as ready for review August 2, 2020 06:54

Add whats new entry

0b938f6

jreback requested changes Aug 5, 2020

Uncomment failing test cases

9f841c9

simonjayhawkins mentioned this pull request Sep 15, 2020

CI: Add stale PR action #36336

Merged

Merge branch 'master' into fix/combine-first

115667e

simonjayhawkins added the Needs Review label Sep 15, 2020

nixphix added 2 commits October 2, 2020 18:15

Merge branch 'master' into fix/combine-first

a044c2d

Fix failing test cases

457a0ab

nixphix force-pushed the fix/combine-first branch from afd4a70 to 457a0ab Compare October 2, 2020 13:28

jreback requested changes Oct 2, 2020

nixphix added 3 commits October 4, 2020 13:55

Resolve comments

8178c2e

Merge branch 'master' into fix/combine-first

5f22f4d

Black format

28b61c3

Merge branch 'master' into fix/combine-first

62997c4

nixphix requested a review from jreback October 6, 2020 00:23

jreback requested changes Oct 6, 2020

jreback added this to the 1.2 milestone Oct 6, 2020

jreback added Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Needs Review labels Oct 6, 2020

Split test case

299ff04

nixphix requested a review from jreback October 7, 2020 17:41

github-actions bot added the Stale label Nov 7, 2020

jreback requested changes Nov 8, 2020

jreback requested changes Nov 18, 2020

jreback removed this from the 1.2 milestone Nov 18, 2020

jbrockmendel reviewed Nov 25, 2020

arw2019 mentioned this pull request Nov 29, 2020

BUG: combine_first does not retain dtypes with Timestamp DataFrames #38145

Merged

arw2019 closed this Nov 29, 2020
