BUG: fix combine_first converting timestamp to int #35514
can you post an example of what the change would look like to a user (e.g. what the 'regression' is)?
@jreback master is inconsistent when the
# master
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]])
df2 = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])
expected = pd.Series([True, True, False], name=2)
result1 = df1.combine_first(df2)[2]
result2 = df2.combine_first(df1)[2]
print(expected.dtype, result1.dtype, result2.dtype)
>>> bool bool object
# master
import pandas as pd
df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")
res1 = df1.combine_first(df2)
res2 = df2.combine_first(df1)
print(res1["a"].dtype, res2["a"].dtype)
>>> int64 float64
The current fix makes the function consistent, but it does make the result different from expected: in the first case we now get a series of type object both ways.
# fix
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]])
df2 = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])
expected = pd.Series([True, True, False], name=2)
result1 = df1.combine_first(df2)[2]
result2 = df2.combine_first(df1)[2]
print(expected.dtype, result1.dtype, result2.dtype)
>>> bool object object
# fix
import pandas as pd
df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")
res1 = df1.combine_first(df2)
res2 = df2.combine_first(df1)
print(res1["a"].dtype, res2["a"].dtype)
>>> float64 float64
Let me know your thoughts on this.
@nixphix this looks pretty good. can you merge master and we'll see if the CI passes?
afd4a70 to 457a0ab
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
looks good. if you can, parameterize those tests and ping on green.
result = df1.combine_first(df2)[2]
expected = Series([True, True, False], name=2)
tm.assert_series_equal(result, expected)
expected1 = pd.Series([True, True, False], name=2, dtype=object)
can you parameterize this test (e.g. put the df1 and df2 in the parameter along with the expected)
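A minimal sketch of what the requested parameterization could look like, using the int64 frames from the examples in this thread. The test name and parameter names are hypothetical and not taken from the PR. It compares values only, since the expected dtype is exactly what is under discussion here.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "data1, data2, expected_values",
    [
        # df1 already covers every row, so nothing is taken from df2
        ([0, 1, 3, 5], [1, 4], [0, 1, 3, 5]),
        # the shorter frame comes first: its rows win, the other fills the rest
        ([1, 4], [0, 1, 3, 5], [1, 4, 3, 5]),
    ],
)
def test_combine_first_int_values(data1, data2, expected_values):
    # hypothetical test name; inputs and expected values live in the parameters
    df1 = pd.DataFrame({"a": data1}, dtype="int64")
    df2 = pd.DataFrame({"a": data2}, dtype="int64")

    result = df1.combine_first(df2)

    # values only: 0 == 0.0, so this holds for an int64 or a float64 result
    assert result["a"].tolist() == expected_values
```

A dtype expectation could be added as a fourth parameter once the intended behaviour is settled.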
@@ -339,9 +348,14 @@ def test_combine_first_int(self):
df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
can you parameterize this as well
@nixphix lgtm, can you merge master and address the test comments, ping on green.
res = df1.combine_first(df2)
tm.assert_frame_equal(res, df1)
assert res["a"].dtype == "int64"
exp1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="float64")
and make a separate test
this PR is prob ok, just needs a rebase and updating for comments.
],
)
def test_combine_first_timestamp_bug(val1, val2, nulls_fixture):
can you add a comment here "# GH#35514"
Closing in favor of #38145
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This fix introduced two regressions, but it appears that the fix only made the API consistent. Previously the failing cases were inconsistent: df1.combine_first(df2) would not return the same result as df2.combine_first(df1). More on these in the code comments. Let me know if there is a better way to handle this.
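The consistency described above can be checked with a small helper that reports a column's dtype for both call orders. This is only a sketch: `combine_first_dtypes` is a hypothetical name, not part of pandas or of this PR, and the dtypes it returns depend on the pandas version in use.

```python
import pandas as pd


def combine_first_dtypes(df1, df2, column):
    """Return the dtype of `column` for both call orders of combine_first."""
    forward = df1.combine_first(df2)[column].dtype
    backward = df2.combine_first(df1)[column].dtype
    return forward, backward


df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")

forward, backward = combine_first_dtypes(df1, df2, "a")
# The API is consistent for this pair when both orders agree on the dtype
print(forward, backward, forward == backward)
```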