-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Series.combine_first with datetime-tz dtype (#21469) #21544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
No solution yet for where, and combine_first might require cleanup.
DataFrame.combine_first was fixed in pandas-dev#13970. Fix Series using same logic. Remove common._where_compat since it was only referred to by Series.combine_first, name was confusing, and not necessary to the new solution.
Where is broken for DT TZ on both DataFrame and Series and uses a separate code path. Should open a new issue for this.
if not is_dtype_equal(other.dtype, new_dtype): | ||
other = other.astype(new_dtype) | ||
|
||
if needs_i8_conversion(this.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use the in-built
other.where(isna(this), this)
which already knows how to handle the dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback In theory that should work, but where
is broken on datetime tz values on both Series and DataFrame, see #21546. I considered trying to fix where
for this PR, but since the current combine_first
impl doesn't point to where
and where
is a problem on both Series and DataFrame I thought better to separate the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then convert using pd.Index
and use .where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really would prefer to fix #21546 first in any event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I'm taking a look at #21546. Agree it makes sense ultimately to fix the underlying where
issue and keep the implementation of combine_first simpler as a result. I just posted a comment over there with some questions about the contract for the dimension of the return value of Block.values
and Block.get_values
that appears to be driving some of the issues over there (DatetimeTZBlock internally is using values with ndim==1
even for DataFrame blocks, which have ndim==2
).
git diff upstream/master -u -- "*.py" | flake8 --diff