-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PROPOSAL: use datetime64 and timedelta64 to eval #22063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
60140a5
to
6c33b65
Compare
We actually want to go in the other direction. We want to make |
pandas/core/common.py
Outdated
|
||
|
||
# TODO: np.percentile not support datetime64, should be fixed in upsteam. | ||
def percentile(values, q, **kw): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably belongs in core.nanops
, definitely not core.common
.
pandas/core/common.py
Outdated
# TODO: np.percentile not support datetime64, should be fixed in upsteam. | ||
def percentile(values, q, **kw): | ||
if values.dtype.kind == 'M': | ||
return np.percentile(values.view('i8'), q, **kw).astype(values.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this handle iNaT
correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can handle numpy.datetime64('NaT')
, but not iNaT
.
@@ -2134,7 +2134,7 @@ def _na_value(self): | |||
|
|||
@property | |||
def fill_value(self): | |||
return tslibs.iNaT | |||
return tslibs.NaT.asm8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iNaT
is the standard usage for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iNaT
is an int
, but here wants a numpy.datetime64('NaT')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what problem are you trying to solve here?
(operator.pow, 'bool')} | ||
(operator.pow, 'bool'), | ||
(operator.pow, '<m8[ns]'), | ||
(operator.pow, '<M8[ns]')} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are good fixes. Can you take it a step further and, instead of skipping, pytest.raises(TypeError)
for the appropriate cases?
@jreback #22005 will no longer exist. But the main purpose is simplify the code, and make
@jbrockmendel
As Could someone show me the benefits of the I have write a 2nd commit,which implements:
Tests haven't pass all, those failed mainly is |
Hello @holymonson! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 12, 2018 at 08:41 Hours UTC |
31a4a02
to
9bbe377
Compare
1. `.tz` info save as `DatetimeTZBlock.tz`, no longer need a `DatetimeIndex`. Multi-columns can share a same `DatetimeTZBlock` if they have same `tz`. (NotImplemented) 2. `_try_coerce_args` is implemented in `DatetimeLikeBlockMixin`, those override in subclass are removed. So as `_try_coerce_result`. (Benefited from `numpy` handing data) 3. `values_mask` and `other_mask` are also removed. (`numpy` can handle `NaT`)
9bbe377
to
7ff27d2
Compare
# let higher levels handle | ||
raise TypeError | ||
|
||
return values, other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems values
can be removed too. The other 2 _try_coerce_args
overrides don't touch values
, _try_coerce_args
might rename to _try_coerce_other
.
Can you expand on why this is a compelling reason? Maybe there is some characteristic np.ufunc that I'm not considering?
The idea is that
See #18824. As you've noticed, there are many operations (particularly arithmetic and comparisons) where |
@@ -68,6 +68,22 @@ def np_array_datetime64_compat(arr, *args, **kwargs): | |||
return np.array(arr, *args, **kwargs) | |||
|
|||
|
|||
# np.percentile should support datetime64 in upsteam. numpy/numpy#11620 | |||
# TODO: change the version if the bug fixed | |||
if _nlv >= '99.99.99': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a placeholder for "distant future"?
provide compat for np.percentile supporting datetime64 | ||
''' | ||
if values.dtype.kind == 'M': | ||
result = np.percentile(values.view('i8'), q, **kw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this handle iNaT correctly?
if _nlv >= '99.99.99': | ||
np_percentile_compat = np.percentile | ||
else: | ||
def np_percentile_compat(values, q, **kw): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the iNat concern aside, this looks like a nice function. But I don't see why it should be in compat. Might make more sense in nanops?
''' | ||
provide compat for np.percentile supporting datetime64 | ||
''' | ||
if values.dtype.kind == 'M': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does timedelta64 get handled correctly by numpy?
@@ -789,8 +789,7 @@ def replace(self, to_replace, value, inplace=False, filter=None, | |||
# try to replace, if we raise an error, convert to ObjectBlock and | |||
# retry | |||
try: | |||
values, _, to_replace, _ = self._try_coerce_args(self.values, | |||
to_replace) | |||
values, to_replace = self._try_coerce_args(self.values, to_replace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to tell, but a lot of this looks like it might be unrelated to the np_percentile_compat function. Is that the case? If so, that's great (simplifying try_coerce_args etc would be a big win), but should be done separately.
@holymonson can you rebase? Arithmetic ops have seen a lot of cleanup in the last couple months. |
Closing as stale. Ping if you'd like to pick this back up |
This PR aims to use datetime64 and timedelta64 in evaluation directly, instead of casting to integer and then casting back.
Benefits:
values
is saved inndarray[datetime64]
, casting toint
to eval is non-intuitive. It seems back to the time before numpy 1.7 which implementsdatetime64
.numpy.ufunc
can handle all thedtype
change, e.g.datetime64 - datetime64 = timedelta64
. Additonally, nonsense operations are prohibited, e.g.datetime64 + datetime64
.datetime64('NaT')
, seemsvalues_mask
andother_mask
are no longer needed fornan
s.Series
can share the same call path withDataFrame
inBlock
. NowSeries
is usingdispatch_to_index_op
to wrap asIndex
, butIndex
is 1d-array whileDataFrame
is 2d-array, soDataFrame
can't use the same wrapper.ndarray([Index])
isdtype='O'
so can't be benefited fromnumpy.ufunc
.This commit has passed
tests/{internals,frame,dtypes}
, but nottests/series
, becausedispatch_to_index_op
breaks it.