Skip to content

ENH: Add IntervalDtype support to IntervalIndex.astype #19231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 14, 2018

Conversation

jschendel
Copy link
Member

Summary:

  • Added support for IntervalIndex.astype(IntervalDtype(...))
    • Allows for subtype conversion, e.g. interval[int64] --> interval[float64]
  • Created a new /interval/test_astype.py file for the tests
    • Similar to how these tests were split into a separate file for /datetimes, and /timedeltas
    • Created a base class to hold tests that are common to any subtype
    • Created separate subtypes classes that hold tests for behavior unique to the subtype
    • Significantly expanded the astype related tests for IntervalIndex
    • Didn't parametrize over closed, since it doesn't really come into play for astype, though each variation of closed is accounted for among the indexes tested
  • Minor fix to IntervalDtype class when an invalid subtype comparison is made
    • See comment on the relevant code change for example

@@ -710,7 +710,8 @@ def __eq__(self, other):
# None should match any subtype
return True
else:
return self.subtype == other.subtype
from pandas.core.dtypes.common import is_dtype_equal
return is_dtype_equal(self.subtype, other.subtype)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was necessary since some subtypes don't nicely compare:

In [2]: dtype1 = IntervalDtype('float64')

In [3]: dtype2 = IntervalDtype('datetime64[ns, US/Eastern]')

In [4]: dtype1 == dtype2
---------------------------------------------------------------------------
TypeError: data type not understood

Though interestingly enough the reverse comparison was fine:

In [5]: dtype2 == dtype1
Out[5]: False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is because numpy types don't handle comparisons vs pandas dtypes.

@jreback jreback added Interval Interval data type Dtype Conversions Unexpected or buggy dtype conversions labels Jan 13, 2018
@@ -710,7 +710,8 @@ def __eq__(self, other):
# None should match any subtype
return True
else:
return self.subtype == other.subtype
from pandas.core.dtypes.common import is_dtype_equal
return is_dtype_equal(self.subtype, other.subtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is because numpy types don't handle comparisons vs pandas dtypes.

if is_interval_dtype(dtype):
return self.copy() if copy else self
dtype = pandas_dtype(dtype)
if is_interval_dtype(dtype) and dtype != self.dtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could do: not is_dtype(equal(dtype, self.dtype)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be a bit more specific than not is_dtype_equal; what this is really looking for is an IntervalDtype with a different subtype than the existing dtype. Just having not is_dtype_equal would allow non-interval dtypes to pass through as well.

I could make the "different subtype" part more explicit with and not is_dtype_equal(dtype.subtype, self.dtype.subtype). Though this would be a partial dupe of the IntervalDtype.__eq__ logic. Could also just add a comment noting the "different subtype" part, if that would be preferable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I c, this is ok then. these are guaranteed to be II types and thus they are comparable

index.astype(dtype)

# float64 -> integer-like fails with non-integer valued floats
index = interval_range(0.0, 10.0, freq=0.25)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy actually allows things like this (IOW truncation of floats). Are you failing this directly?

I agree that it should fail though, its just a can of worms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test is being xfailed with @pytest.mark.xfail(reason='GH 15832'). I think fixing #15832 should resolve this, since the changes should flow through to IntervalIndex as well, since it's relying on Float64Index to hold the endpoints.

The current behavior within this PR is truncation of floats, identical to the existing Float64Index behavior:

In [2]: ii = pd.interval_range(0.25, 1.5, freq=0.25)

In [3]: ii
Out[3]:
IntervalIndex([(0.25, 0.5], (0.5, 0.75], (0.75, 1.0], (1.0, 1.25], (1.25, 1.5]]
              closed='right',
              dtype='interval[float64]')

In [4]: ii.astype('interval[int64]')
Out[4]:
IntervalIndex([(0, 0], (0, 0], (0, 1], (1, 1], (1, 1]]
              closed='right',
              dtype='interval[int64]')

In [5]: pd.Float64Index([1.1, 2.2, 3.3]).astype('int64')
Out[5]: Int64Index([1, 2, 3], dtype='int64')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that's fine

closed=index.closed)
tm.assert_index_equal(result, expected)

@pytest.mark.parametrize('subtype_start, subtype_end', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests where we fail on int/unit sub-types that we don't support e.g. (int8,16,32)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have these for construction as well (tests I mean)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There aren't any construction related tests for unsupported int/uint. This also doesn't fail with how I've implemented IntervalIndex.astype in this PR, since it relies on the behavior of the underlying endpoint indexes. I can't find a way in which those raise when passed an unsupported int/uint, as they always appear to upcast to int64:

In [2]: pd.Index([1, 2, 3], dtype='int16')
Out[2]: Int64Index([1, 2, 3], dtype='int64')

In [3]: pd.Int64Index([1, 2, 3]).astype('int8')
Out[3]: Int64Index([1, 2, 3], dtype='int64')

In [4]: pd.Index(np.arange(5, dtype='int16'))
Out[4]: Int64Index([0, 1, 2, 3, 4], dtype='int64')

In [5]: pd.Int64Index([1, 2, 3]).astype('uint8')
Out[5]: UInt64Index([1, 2, 3], dtype='uint64')

Seems like this would need to fail at the level shown above for it to fail on IntervalIndex? If so, probably best done in a separate PR, since my initial guess is that it'd require a bit of work/changes. I could write xfailing tests here though. Or am I misinterpreting what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right these are all valid (as these are how indexes are constructed), maybe just add some tests to confirm for II

@codecov
Copy link

codecov bot commented Jan 14, 2018

Codecov Report

Merging #19231 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19231      +/-   ##
==========================================
- Coverage   91.55%   91.53%   -0.02%     
==========================================
  Files         147      147              
  Lines       48797    48805       +8     
==========================================
- Hits        44675    44673       -2     
- Misses       4122     4132      +10
Flag Coverage Δ
#multiple 89.9% <100%> (-0.02%) ⬇️
#single 41.7% <0%> (+0.09%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 92.38% <100%> (+0.08%) ⬆️
pandas/core/dtypes/dtypes.py 96.12% <100%> (+0.01%) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️
pandas/core/dtypes/cast.py 88.18% <0%> (-0.18%) ⬇️
pandas/core/dtypes/common.py 95.32% <0%> (+0.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0477880...8caf9f2. Read the comment docs.

@jreback jreback added this to the 0.23.0 milestone Jan 14, 2018
@jreback jreback merged commit 787ab55 into pandas-dev:master Jan 14, 2018
@jreback
Copy link
Contributor

jreback commented Jan 14, 2018

thanks @jschendel nice patch!

@jschendel jschendel deleted the astype-intervaldtype branch January 15, 2018 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow IntervalIndex.astype to change subtypes when passed an IntervalDtype
2 participants