Skip to content

ENH: Support TZ Aware IntervalIndex #18558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 8, 2017

Conversation

jschendel
Copy link
Member

@jschendel jschendel commented Nov 29, 2017

Updates to Interval:

  • Disallowed Interval objects with mixed time zones
  • Minor docstring updates

Updates to IntervalIndex:

  • Disallowed IntervalIndex with mixed time zones
  • Fixed IntervalIndex.mid for tz aware
  • Fixed _get_next_label and _get_previous_label for tz aware
    • cascades to other methods
  • Cleaned up error message for mixed left/right types

msg = ("left and right must have the same time zone, got "
"'{left_tz}' and '{right_tz}'").format(
left_tz=left.tzinfo, right_tz=right.tzinfo)
raise ValueError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about pytz tzinfos? Eg US/Eastern is represented by a different tzinfo object for DST and non-DST

Copy link
Member Author

@jschendel jschendel Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does my switch to using get_timezone fully address this? The docstring seems to indicate so, but I'm not super familiar with this part of the codebase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should do it, thanks.

@codecov
Copy link

codecov bot commented Nov 29, 2017

Codecov Report

Merging #18558 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18558      +/-   ##
==========================================
- Coverage   91.35%   91.34%   -0.01%     
==========================================
  Files         164      164              
  Lines       49802    49813      +11     
==========================================
+ Hits        45496    45502       +6     
- Misses       4306     4311       +5
Flag Coverage Δ
#multiple 89.14% <100%> (+0.01%) ⬆️
#single 40.8% <25%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 94.02% <100%> (+0.8%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32f562d...2311143. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 29, 2017

Codecov Report

Merging #18558 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18558      +/-   ##
==========================================
- Coverage   91.59%   91.58%   -0.02%     
==========================================
  Files         153      153              
  Lines       51212    51217       +5     
==========================================
- Hits        46908    46906       -2     
- Misses       4304     4311       +7
Flag Coverage Δ
#multiple 89.44% <100%> (ø) ⬆️
#single 40.67% <28.57%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 93.8% <100%> (+0.74%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/util/testing.py 81.82% <0%> (-0.2%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 695e893...79e255c. Read the comment docs.

# handle tz aware
if tz:
data = self.left.tz_localize(None) + 0.5 * delta
data = data.tz_localize(tz)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to do this computation? Couldn't get the correct result without removing the time zone, then adding it back afterwards.

Trying to get the midpoint between elements in left and right. Basic setup is along the lines of:

In [2]: left = pd.date_range('2017-01-01', periods=3, tz='US/Eastern')
   ...: right = pd.date_range('2017-01-02', periods=3, tz='US/Eastern')
   ...: delta = right.values - left.values
   ...:

The existing method removed the tz, and had the wrong time (all times should be 12:00):

In [3]: left.values + 0.5 * delta
Out[3]:
array(['2017-01-01T17:00:00.000000000', '2017-01-02T17:00:00.000000000',
       '2017-01-03T17:00:00.000000000'], dtype='datetime64[ns]')

Operating on the index itself instead of the values keeps the tz, but the time is still wrong:

In [4]: left + 0.5 * delta
Out[4]:
DatetimeIndex(['2017-01-01 17:00:00-05:00', '2017-01-02 17:00:00-05:00',
               '2017-01-03 17:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

Having delta as a TimdeltaIndex raises:

In [20]: left + 0.5 * (right - left)
---------------------------------------------------------------------------
TypeError: data type not understood

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[20] should work, we might have an issue about this; let's fix this fix this first in another PR. don't use .values with a datetime generally, you either want to operate on the .asi8, the actual values (generally you will also need to .tz_localize(None) first to make them UTC, then reverse this later though).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, looks like #17558 is the issue for this, and #17583 is an open PR, albeit somewhat stale.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I closed that other PR as its stale. So options are to: issue a new PR before this one to fix #17558, or to do hacky fix here with a TODO, or xfail the tests for this (for now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to look into that issue over the next few days and open new PR to fix it prior to this one being merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great!

@jreback jreback added Interval Interval data type Timezones Timezone data dtype labels Nov 29, 2017
elif isinstance(left, ABCPeriodIndex):
msg = 'Period dtypes are not supported, use a PeriodIndex instead'
raise ValueError(msg)
elif isinstance(left, ABCDatetimeIndex) and left.tz != right.tz:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually compare
str(left.tz) != str(right.tz)

# handle tz aware
if tz:
data = self.left.tz_localize(None) + 0.5 * delta
data = data.tz_localize(tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[20] should work, we might have an issue about this; let's fix this fix this first in another PR. don't use .values with a datetime generally, you either want to operate on the .asi8, the actual values (generally you will also need to .tz_localize(None) first to make them UTC, then reverse this later though).

else:
data = self.left + 0.5 * delta

return DatetimeIndex(data, freq=freq, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need to reconvert the type

@jschendel
Copy link
Member Author

Fixed the mid computation now that #18653 is merged. I believe that was the last item that was currently outstanding here.

@jreback jreback added this to the 0.22.0 milestone Dec 8, 2017
@jreback jreback merged commit 0229538 into pandas-dev:master Dec 8, 2017
@jreback
Copy link
Contributor

jreback commented Dec 8, 2017

thanks @jschendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interval Interval data type Timezones Timezone data dtype
Projects
None yet
3 participants