-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
implement astype portion of #24024 #24405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #24405 +/- ##
==========================================
+ Coverage 92.3% 92.31% +0.01%
==========================================
Files 163 163
Lines 51943 51965 +22
==========================================
+ Hits 47946 47972 +26
+ Misses 3997 3993 -4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24405 +/- ##
==========================================
+ Coverage 92.3% 92.31% +0.01%
==========================================
Files 165 165
Lines 52176 52194 +18
==========================================
+ Hits 48161 48183 +22
+ Misses 4015 4011 -4
Continue to review full report at Codecov.
|
is this contingent on #24394 ? should that be first? |
No, they are independent. They just both implement+use |
added requested assertion in test_period and passed copy=copy to numpy copy in the DatetimeLikeArrayMixin.astype case where it was obvious. For the rest I'm going to leave that to Tom in #24024 post-rebase (and if this goes through I'll make a PR on his branch to help rebase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel its much simpler if you actually respond to each comment and resolve if you are doing it, otherwise comment.
pandas/core/indexes/datetimelike.py
Outdated
# dtype=object to disable inference. But, DTA.astype ignores | ||
# integer sign and size, so we need to detect that case and | ||
# just choose int64. | ||
dtype = pandas_dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure this is necessary as it already coerces properly, doing it here is very weird.
In [2]: pd.Index([1,2,3],dtype='int32')
Out[2]: Int64Index([1, 2, 3], dtype='int64')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in the last 8 hours, no. May need to wait on Tom to clarify, since all of this was taken from 24024.
(the fact that these things get closer attention in smaller doses reassures me that splitting is a good idea, even if it does cause rebasing hassles in the parent PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, sounds ok then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the question here? Why we do the integer check? Astype ignores the sign and size. I suppose the index constructor just ignores the size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if you saw the part about uint vs. int.
So I'm just going to decide that the expected behavior for {Datetime,Timedelta,Period}Index.astype("uint{8,16,32,64}")
is to return a UInt64Index. That means we can remove this check and just pass new_values
through with the original dtype.
@jbrockmendel do you want to do that here? It's not at all tested, and will need a release note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok by me (of course its weird to do this, but hey)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to do that here? It's not at all tested, and will need a release note.
I tried this, pretty much just deleting ten lines here, and ended up getting two failures in pandas/tests/indexes/interval/test_astype.py. I can fix this by changing dtype=dtype
to dtype=new_values.dtype
in the call that wraps self._eadata.astype
. Is that what you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempt #2 at this also failed. Any other ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this note last night. I implemented this in 3fca810 if you could take a look.
What are the new parts that was discovered in this PR? |
Primarily the int32 casting stuff and some |
pandas/core/indexes/datetimelike.py
Outdated
# dtype=object to disable inference. But, DTA.astype ignores | ||
# integer sign and size, so we need to detect that case and | ||
# just choose int64. | ||
dtype = pandas_dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having code here is just another inconsistency and maintenance burden. The constructor does the inference.
pandas/tests/arrays/test_period.py
Outdated
@@ -88,24 +87,29 @@ def test_take_raises(): | |||
arr.take([0, -1], allow_fill=True, fill_value='foo') | |||
|
|||
|
|||
@pytest.mark.parametrize('dtype', [int, np.int32, np.int64]) | |||
@pytest.mark.parametrize('dtype', [int, np.int32, np.int64, 'uint']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of uint will cause 3fca810
to fail. Removing, since it's tested elsewhere now.
Though those tests are just index. I'll add ones for arrays as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 5fa32e9. The test is slightly more complicated now.
I think |
Though now that I think about it, this is some pretty strange behavior In [3]: pd.timedelta_range('2000', periods=2)._data.astype(str)[0]
Out[3]: Timedelta('0 days 00:00:00.000002') Looks like we need to bring in the |
5d718e6 (hopefully) fixed TImedeltaArray._format_native_types to return an array of strings. |
return self | ||
elif is_period_dtype(dtype): | ||
return self.to_period(freq=dtype.freq) | ||
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed... it'd be nice to leave a bunch of TODO: Use super
for places like this.
Actually... I think Python2 will force us to make this changes when we switch inheritance to composition, since we won't be able to call the unbound method with a DatetimeIndex anymore (I think).
Py2 failure: https://travis-ci.org/pandas-dev/pandas/jobs/473068039#L2035. Also have some linting failures. I'll assume you have other things to work on @jbrockmendel, and continue to push changes here if that's OK. |
This makes the default na repr match the expected type for the formatter.
e29d898 hopefully fixes the Py2 failure. At some point, the type for the |
lgtm. assume you just rebased? and if you want to add TODO have at it. ping on green. |
By "just" rebased do you mean recently? If so I think the last one was I'll merge master again when I fix this last py2 error. If you meant "is rebasing the only thing you did?" then no, there are some real changes here. It looks like the period tests aren't happy with my changes to format arrays. On 0.23.4 we were inconsistent In [5]: type(pd.period_range('2000', periods=2).astype(str)[0])
Out[5]: numpy.unicode_
In [6]: type(pd.date_range('2000', periods=2).astype(str)[0])
Out[6]: str I'll revert the period change here and open an issue. |
Merged master and fixed the py2 issue ( I decided not to open an issue because the behavior is correct for Python 3 and we're not going to change this for 0.24, so who cares :) |
Extremely happy to hand this one off, thanks for figuring it out. |
ok lgtm. ping on green. |
All green. |
thanks! nice @jbrockmendel and @TomAugspurger |
with pytest.raises(TypeError, match=msg): | ||
idx.astype(dtype) | ||
|
||
@pytest.mark.parametrize('tz', [None, 'US/Central']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. This should probably be in indexes/datetimes/test_astype.py
. Or rather, we should be using timedelta_range
here.
and accompanying tests
Uses _eadata like #24394
Constitutes ~10% of the diff of #24024, more after that gets rebased.