Skip to content

BUG: IntervalIndex set op bugs for empty results #19112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 12, 2018

Conversation

jschendel
Copy link
Member

@jschendel jschendel commented Jan 7, 2018

Fixed bugs described in the issue:

  • Set ops that result in an empty index no longer raises.
  • For empty results, the return dtype now matches the dtype of the first index.
    • Previously would always return float64 or object for empty results, depending on the op.
  • Didn't mention the raising issue in the whatsnew, since that behavior never occurred on a release, but did mention the dtype change since it occurs on 0.22.0.

Added an additional #19016 follow-up:

  • Raise when set operation would result in a prohibited dtype:
    • Occurs when the common subtype between the two indexes is object.
    • Ex: union between interval[int64] and interval[datetime64[ns]].
  • Behavior is generally still the same as after API: prohibit non-numeric dtypes in construction of IntervalIndex #19016, but improved how the behavior comes about:
    • Previously the error was raised after the set op data was computed, now the check occurs before.
    • Previous error message was generic, now the error message is more specific.

@codecov
Copy link

codecov bot commented Jan 7, 2018

Codecov Report

Merging #19112 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19112      +/-   ##
==========================================
+ Coverage   91.52%   91.52%   +<.01%     
==========================================
  Files         147      147              
  Lines       48775    48783       +8     
==========================================
+ Hits        44639    44647       +8     
  Misses       4136     4136
Flag Coverage Δ
#multiple 89.89% <100%> (ø) ⬆️
#single 41.61% <10%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 92.29% <100%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8acdf80...49e0a49. Read the comment docs.

@jreback jreback added Bug Interval Interval data type labels Jan 7, 2018
@jreback
Copy link
Contributor

jreback commented Jan 7, 2018

Didn't mention the raising issue in the whatsnew, since that behavior never occurred on a release, but did mention the dtype change since it occurs on 0.22.0.

generally just add this issue number on the previous whatnew entry

def func(self, other):
msg = ('can only do set operations between two IntervalIndex '
'objects that are closed on the same side')
other = self._as_like_interval_index(other, msg)

if check_subtypes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wouldn't you always check this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operations between incompatible subtypes are technically valid for difference and intersection, since they always return a subset of the original index, so the left dtype can always be maintained:

In [2]: ii1 = pd.interval_range(0, 4)

In [3]: ii2 = pd.interval_range(pd.Timestamp('20180101'), periods=4)

In [4]: ii1.difference(ii2)
Out[4]:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]]
              closed='right',
              dtype='interval[int64]')

In [5]: ii1.intersection(ii2)
Out[5]:
IntervalIndex([]
              closed='right',
              dtype='interval[int64]')

While technically valid, not sure that difference and intersection really make sense for incompatible subtypes. Didn't want to immediately jump to raising for a technically valid op, but can certainly do the check for all set ops and always raise for incompatible subtypes, if that's the desired behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I would rather not have a special check here. we simply don't want to allow ops between invalid types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback added this to the 0.23.0 milestone Jan 7, 2018
@jschendel jschendel force-pushed the ii-empty-set-ops branch 3 times, most recently from e46f0f1 to ab550f1 Compare January 8, 2018 20:17
@jreback jreback merged commit 5853b79 into pandas-dev:master Jan 12, 2018
@jreback
Copy link
Contributor

jreback commented Jan 12, 2018

thanks!

@jschendel jschendel deleted the ii-empty-set-ops branch January 12, 2018 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.IntervalIndex.difference raises when empty difference
2 participants