-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/API: to_timedelta unit-argument ignored for string input #34379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ugfix/string-timedelta2
…edelta2 # Conflicts: # doc/source/whatsnew/v0.24.0.txt # pandas/_libs/tslibs/timedeltas.pxd # pandas/_libs/tslibs/timedeltas.pyx # pandas/tests/indexes/timedeltas/test_ops.py
…edelta2 # Conflicts: # pandas/_libs/tslibs/timedeltas.pyx
7041bce
to
5a0812c
Compare
pandas/_libs/tslibs/timedeltas.pxd
Outdated
@@ -1,5 +1,7 @@ | |||
from numpy cimport int64_t | |||
|
|||
# Exposed for tslib, not intended for outside use. | |||
cpdef parse_timedelta_string(object ts, object specified_unit=*) | |||
cdef int64_t cast_from_unit(object ts, object unit) except? -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has been moved to tslibs.conversion, so should be edited there rather than re-introduced here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel I fixed that and some other issues
c4c4ec6
to
507ff0f
Compare
5cf2649
to
bad005c
Compare
Fix linting issues, change single quotes to double quotes, add 'ns' to timedelta string.
bad005c
to
cd17104
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be handled at the entry points for users. we simply DO not allow a unit when the input is a string, full stop. We do this in to_datetime, please have a look; we would want to replicate this behavior.
@@ -701,6 +701,8 @@ Deprecations | |||
instead (:issue:`34191`). | |||
- The ``squeeze`` keyword in the ``groupby`` function is deprecated and will be removed in a future version (:issue:`32380`) | |||
|
|||
- A timedelta passed a number string without a defined unit is deprecated (:issue:`12136`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you reference :class:`Timedelta`
and pandas.to_timedelta
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we deprecating this? i would simply remove it as its wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback By "we simply DO not allow a unit when the input is a string"? Do you refer to the unit inside the string or the unit parameter? This change behaviors are like this:
- Allows
to_timedelta("1s")
, used as example in https://pandas.pydata.org/docs/user_guide/timedeltas.html#to-timedelta - Allows
to_timedelta("1", unit="s")
, fixes bug BUG/API: to_timedelta unit-argument ignored for string input #12136 - Disallows
to_timedelta("1s", unit="s")
- Warns
to_timedelta("1")
I simply merged @Sup3rGeo's PR #23025 without changing any behaviors. Please let me know if you want something to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we spent a long time on the PR but was ultimately not merged.because of the way it was structured. Please re-read those comments. We want to handle this at a much higher level, if you pass a unit then your input must be numeric, there is no other option. I don't think PR needs to be very complicated at all.
@@ -371,6 +369,17 @@ cdef inline int64_t parse_timedelta_string(str ts) except? -1: | |||
have_value = 1 | |||
have_dot = 0 | |||
|
|||
# Consider units from outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we doing the checking at this low level? this is simply a check on the user interface, e.g. if its a string, then no unit can be specified.
see how we do this in to_datetime
else: | ||
if have_value: | ||
raise ValueError("have leftover units") | ||
if len(number): | ||
r = timedelta_from_spec(number, frac, 'ns') | ||
result += timedelta_as_neg(r, neg) | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh?
@@ -174,6 +175,8 @@ cdef convert_to_timedelta64(object ts, object unit): | |||
|
|||
Return an ns based int64 | |||
""" | |||
if unit is None: | |||
unit = 'ns' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this called from? could we type the argument as str
and require the caller to pass it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. in array_to_timedelta64 below we should just do that check once at the top of the function, not in every step
@@ -260,7 +258,7 @@ def array_to_timedelta64(object[:] values, unit='ns', errors='raise'): | |||
return iresult.base # .base to access underlying np.ndarray | |||
|
|||
|
|||
cdef inline int64_t parse_timedelta_string(str ts) except? -1: | |||
cpdef inline parse_timedelta_string(object ts, specified_unit=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we really want to avoid weakening the type declarations
|
||
# Preserve unit if None, will be cast to nanoseconds | ||
# later on at the proper functions | ||
if unit is None or unit == 'M': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double quotes
@@ -200,7 +200,7 @@ def stringify(value): | |||
v = v.tz_convert("UTC") | |||
return TermValue(v, v.value, kind) | |||
elif kind == "timedelta64" or kind == "timedelta": | |||
v = Timedelta(v, unit="s").value | |||
v = Timedelta(v).value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what type is "v" here? if it is an integer, this is a behavior change
@@ -108,7 +108,7 @@ def to_timedelta(arg, unit="ns", errors="raise"): | |||
return _coerce_scalar_to_timedelta_type(arg, unit=unit, errors=errors) | |||
|
|||
|
|||
def _coerce_scalar_to_timedelta_type(r, unit="ns", errors="raise"): | |||
def _coerce_scalar_to_timedelta_type(r, unit=None, box=True, errors="raise"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why adding an unused box kwarg?
@@ -535,7 +535,7 @@ def test_resolution_deprecated(self): | |||
(Timedelta(0, unit="ns"), False), | |||
(Timedelta(-10, unit="ns"), True), | |||
(Timedelta(None), True), | |||
(NaT, True), | |||
(pd.NaT, True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls revert this
Closing in favor of #34634 |
Supersedes #23025
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff