ENH: Add IntervalDtype support to IntervalIndex.astype #19231

jschendel · 2018-01-13T21:46:52Z

closes Allow IntervalIndex.astype to change subtypes when passed an IntervalDtype #19197
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Summary:

Added support for IntervalIndex.astype(IntervalDtype(...))
- Allows for subtype conversion, e.g. interval[int64] --> interval[float64]
Created a new /interval/test_astype.py file for the tests
- Similar to how these tests were split into a separate file for /datetimes, and /timedeltas
- Created a base class to hold tests that are common to any subtype
- Created separate subtypes classes that hold tests for behavior unique to the subtype
- Significantly expanded the astype related tests for IntervalIndex
- Didn't parametrize over closed, since it doesn't really come into play for astype, though each variation of closed is accounted for among the indexes tested
Minor fix to IntervalDtype class when an invalid subtype comparison is made
- See comment on the relevant code change for example

jschendel · 2018-01-13T21:48:41Z

pandas/core/dtypes/dtypes.py

@@ -710,7 +710,8 @@ def __eq__(self, other):
            # None should match any subtype
            return True
        else:
-            return self.subtype == other.subtype
+            from pandas.core.dtypes.common import is_dtype_equal
+            return is_dtype_equal(self.subtype, other.subtype)


This change was necessary since some subtypes don't nicely compare:

In [2]: dtype1 = IntervalDtype('float64') In [3]: dtype2 = IntervalDtype('datetime64[ns, US/Eastern]') In [4]: dtype1 == dtype2 --------------------------------------------------------------------------- TypeError: data type not understood

Though interestingly enough the reverse comparison was fine:

In [5]: dtype2 == dtype1 Out[5]: False

yes this is because numpy types don't handle comparisons vs pandas dtypes.

jreback · 2018-01-13T21:49:52Z

pandas/core/dtypes/dtypes.py

@@ -710,7 +710,8 @@ def __eq__(self, other):
            # None should match any subtype
            return True
        else:
-            return self.subtype == other.subtype
+            from pandas.core.dtypes.common import is_dtype_equal
+            return is_dtype_equal(self.subtype, other.subtype)


yes this is because numpy types don't handle comparisons vs pandas dtypes.

jreback · 2018-01-13T21:50:09Z

pandas/core/indexes/interval.py

-        if is_interval_dtype(dtype):
-            return self.copy() if copy else self
+        dtype = pandas_dtype(dtype)
+        if is_interval_dtype(dtype) and dtype != self.dtype:


you could do: not is_dtype(equal(dtype, self.dtype)

I think this needs to be a bit more specific than not is_dtype_equal; what this is really looking for is an IntervalDtype with a different subtype than the existing dtype. Just having not is_dtype_equal would allow non-interval dtypes to pass through as well.

I could make the "different subtype" part more explicit with and not is_dtype_equal(dtype.subtype, self.dtype.subtype). Though this would be a partial dupe of the IntervalDtype.__eq__ logic. Could also just add a comment noting the "different subtype" part, if that would be preferable.

I c, this is ok then. these are guaranteed to be II types and thus they are comparable

jreback · 2018-01-13T21:53:23Z

pandas/tests/indexes/interval/test_astype.py

+            index.astype(dtype)
+
+        # float64 -> integer-like fails with non-integer valued floats
+        index = interval_range(0.0, 10.0, freq=0.25)


numpy actually allows things like this (IOW truncation of floats). Are you failing this directly?

I agree that it should fail though, its just a can of worms.

This is test is being xfailed with @pytest.mark.xfail(reason='GH 15832'). I think fixing #15832 should resolve this, since the changes should flow through to IntervalIndex as well, since it's relying on Float64Index to hold the endpoints.

The current behavior within this PR is truncation of floats, identical to the existing Float64Index behavior:

In [2]: ii = pd.interval_range(0.25, 1.5, freq=0.25) In [3]: ii Out[3]: IntervalIndex([(0.25, 0.5], (0.5, 0.75], (0.75, 1.0], (1.0, 1.25], (1.25, 1.5]] closed='right', dtype='interval[float64]') In [4]: ii.astype('interval[int64]') Out[4]: IntervalIndex([(0, 0], (0, 0], (0, 1], (1, 1], (1, 1]] closed='right', dtype='interval[int64]') In [5]: pd.Float64Index([1.1, 2.2, 3.3]).astype('int64') Out[5]: Int64Index([1, 2, 3], dtype='int64')

ok that's fine

jreback · 2018-01-13T21:54:28Z

pandas/tests/indexes/interval/test_astype.py

+                                             closed=index.closed)
+        tm.assert_index_equal(result, expected)
+
+    @pytest.mark.parametrize('subtype_start, subtype_end', [


can you add some tests where we fail on int/unit sub-types that we don't support e.g. (int8,16,32)

do we have these for construction as well (tests I mean)?

There aren't any construction related tests for unsupported int/uint. This also doesn't fail with how I've implemented IntervalIndex.astype in this PR, since it relies on the behavior of the underlying endpoint indexes. I can't find a way in which those raise when passed an unsupported int/uint, as they always appear to upcast to int64:

In [2]: pd.Index([1, 2, 3], dtype='int16') Out[2]: Int64Index([1, 2, 3], dtype='int64') In [3]: pd.Int64Index([1, 2, 3]).astype('int8') Out[3]: Int64Index([1, 2, 3], dtype='int64') In [4]: pd.Index(np.arange(5, dtype='int16')) Out[4]: Int64Index([0, 1, 2, 3, 4], dtype='int64') In [5]: pd.Int64Index([1, 2, 3]).astype('uint8') Out[5]: UInt64Index([1, 2, 3], dtype='uint64')

Seems like this would need to fail at the level shown above for it to fail on IntervalIndex? If so, probably best done in a separate PR, since my initial guess is that it'd require a bit of work/changes. I could write xfailing tests here though. Or am I misinterpreting what you mean?

right these are all valid (as these are how indexes are constructed), maybe just add some tests to confirm for II

codecov · 2018-01-14T00:53:37Z

Codecov Report

Merging #19231 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19231      +/-   ##
==========================================
- Coverage   91.55%   91.53%   -0.02%     
==========================================
  Files         147      147              
  Lines       48797    48805       +8     
==========================================
- Hits        44675    44673       -2     
- Misses       4122     4132      +10

Flag	Coverage Δ
#multiple	`89.9% <100%> (-0.02%)`	⬇️
#single	`41.7% <0%> (+0.09%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/interval.py	`92.38% <100%> (+0.08%)`	⬆️
pandas/core/dtypes/dtypes.py	`96.12% <100%> (+0.01%)`	⬆️
pandas/plotting/_converter.py	`65.22% <0%> (-1.74%)`	⬇️
pandas/core/dtypes/cast.py	`88.18% <0%> (-0.18%)`	⬇️
pandas/core/dtypes/common.py	`95.32% <0%> (+0.73%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0477880...8caf9f2. Read the comment docs.

jreback · 2018-01-14T21:27:38Z

thanks @jschendel nice patch!

ENH: Add IntervalDtype support to IntervalIndex.astype

8caf9f2

jschendel commented Jan 13, 2018

View reviewed changes

jreback added Interval Interval data type Dtype Conversions Unexpected or buggy dtype conversions labels Jan 13, 2018

jreback requested changes Jan 13, 2018

View reviewed changes

jreback added this to the 0.23.0 milestone Jan 14, 2018

jreback approved these changes Jan 14, 2018

View reviewed changes

jreback merged commit 787ab55 into pandas-dev:master Jan 14, 2018

jschendel deleted the astype-intervaldtype branch January 15, 2018 18:26

Uh oh!

ENH: Add IntervalDtype support to IntervalIndex.astype #19231

ENH: Add IntervalDtype support to IntervalIndex.astype #19231

Uh oh!

Conversation

jschendel commented Jan 13, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jan 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jreback commented Jan 14, 2018

Uh oh!

Uh oh!

codecov bot commented Jan 14, 2018 •

edited

Loading