Skip to content

BUG: Series.combine_first with datetime-tz dtype (#21469) #21544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

Liam3851
Copy link
Contributor

@Liam3851 Liam3851 commented Jun 19, 2018

Liam3851 added 4 commits June 14, 2018 15:12
No solution yet for where, and combine_first might require cleanup.
DataFrame.combine_first was fixed in pandas-dev#13970. Fix Series using same
logic. Remove common._where_compat since it was only referred to by
Series.combine_first, name was confusing, and not necessary to the new
solution.
Where is broken for DT TZ on both DataFrame and Series and uses a
separate code path. Should open a new issue for this.
if not is_dtype_equal(other.dtype, new_dtype):
other = other.astype(new_dtype)

if needs_i8_conversion(this.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use the in-built

other.where(isna(this), this) which already knows how to handle the dtypes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback In theory that should work, but where is broken on datetime tz values on both Series and DataFrame, see #21546. I considered trying to fix where for this PR, but since the current combine_first impl doesn't point to where and where is a problem on both Series and DataFrame I thought better to separate the issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then convert using pd.Index and use .where

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really would prefer to fix #21546 first in any event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I'm taking a look at #21546. Agree it makes sense ultimately to fix the underlying where issue and keep the implementation of combine_first simpler as a result. I just posted a comment over there with some questions about the contract for the dimension of the return value of Block.values and Block.get_values that appears to be driving some of the issues over there (DatetimeTZBlock internally is using values with ndim==1 even for DataFrame blocks, which have ndim==2).

@jreback jreback added Bug Timezones Timezone data dtype labels Jun 20, 2018
@Liam3851 Liam3851 closed this Jun 27, 2018
@Liam3851
Copy link
Contributor Author

Closing this PR. Will open a new one that attempts to fix #21546 in addition to #21469,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series (but not DataFrame) combine_first() loses timezone information
2 participants