Skip to content

BUG/API: to_timedelta unit-argument ignored for string input #34379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 53 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
ab5f7ba
Added null context manager as do_not_raise for parametrization in pyt…
Oct 5, 2018
7a38ecd
Added test for pure integer strings and test for strings with/without…
Oct 5, 2018
a65a35a
Added entry to whatsnew.
Oct 5, 2018
57dea1d
Changed Timedelta to raise for strings without unit.
Oct 5, 2018
b8cf952
WIP refactoring.
Oct 7, 2018
8154859
Deprecation warning when we don't have units in string numbers.
Oct 7, 2018
a90a992
Fixed linting.
Oct 7, 2018
1184b6e
Added to_timedelta test.
Oct 7, 2018
5e28b49
scalar case in to_timedelta checks for strings.
Oct 7, 2018
fafe838
Updated test with more cases for to_timedelta.
Oct 7, 2018
3ca0dae
All tests passing. Implemented checks on parse_timedelta_string.
Oct 7, 2018
8105f9c
Updated tests to avoid creating timedeltas from strings without unit.
Oct 7, 2018
e90a18b
Updated more tests to avoid creating timedeltas from strings without …
Oct 7, 2018
e0a55a8
Updated more tests to avoid creating timedeltas from strings without …
Oct 7, 2018
827aebf
Fixed failing test.
Oct 7, 2018
9bff504
Removed litle whitespace.
Oct 8, 2018
b6c7d45
Type specified unit as object.
Oct 8, 2018
9ff6ade
Removed newline.
Oct 8, 2018
be93e4b
Moved default unit down to more core functions.
Oct 10, 2018
85f0103
Fixed linting.
Oct 10, 2018
673ad19
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Oct 23, 2018
eb7d3f8
Merge remote-tracking branch 'origin/bugfix/string-timedelta2' into b…
Oct 23, 2018
bb40f77
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Oct 24, 2018
2e331c1
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Oct 31, 2018
bada4a7
Minor typo.
Nov 5, 2018
8c0425a
Lint
Nov 5, 2018
493a14b
Using ValueError instead of Exception.
Nov 5, 2018
e525ec4
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 5, 2018
448b68a
Catching TypeError as well.
Nov 5, 2018
bd173a9
LINT
Nov 5, 2018
d16c4c5
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 6, 2018
1cfc729
Refactored tests, fixed little bug in array_to_timedelta64 and explai…
Nov 6, 2018
e559f06
Fixed unit as None (instead of ns), Fixed test.
Nov 6, 2018
b914950
LINT: Fixed long lines.
Nov 10, 2018
64a4440
LINT: Fixed long lines.
Nov 10, 2018
4d26474
LINT: Fixed indentation of closing bracket.
Nov 10, 2018
db41054
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 26, 2018
2f53004
Added comment.
Nov 26, 2018
fb40c94
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 26, 2018
f5e59ee
Casting unit to 'ns' if None.
Nov 26, 2018
e7266da
Updated documentation and issuing DeprecationWarning instead of excep…
Nov 26, 2018
b6a02df
Fixed linting.
Nov 26, 2018
694abe9
DOC: Reorganized whatsnew
Nov 27, 2018
4d86ae1
CLN: Minor refactoring.
Nov 27, 2018
40b09f4
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 27, 2018
a9c4675
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 27, 2018
95302de
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Dec 3, 2018
661fd70
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Jan 7, 2019
80a1161
Added default value to func declaration.
Jan 8, 2019
dbe8fae
Fixed bug in warning string.
Jan 8, 2019
5a0812c
Merge branch 'master' into bugfix/string-timedelta2
pdnm May 26, 2020
b8031c2
Minor fix
pdnm May 27, 2020
cd17104
Merge branch 'master' into bugfix/string-timedelta2
pdnm May 29, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -701,6 +701,8 @@ Deprecations
instead (:issue:`34191`).
- The ``squeeze`` keyword in the ``groupby`` function is deprecated and will be removed in a future version (:issue:`32380`)

- A timedelta passed a number string without a defined unit is deprecated (:issue:`12136`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you reference :class:`Timedelta` and pandas.to_timedelta here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we deprecating this? i would simply remove it as its wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback By "we simply DO not allow a unit when the input is a string"? Do you refer to the unit inside the string or the unit parameter? This change behaviors are like this:

I simply merged @Sup3rGeo's PR #23025 without changing any behaviors. Please let me know if you want something to be changed.

Copy link
Contributor

@jreback jreback Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we spent a long time on the PR but was ultimately not merged.because of the way it was structured. Please re-read those comments. We want to handle this at a much higher level, if you pass a unit then your input must be numeric, there is no other option. I don't think PR needs to be very complicated at all.


.. ---------------------------------------------------------------------------


Expand Down Expand Up @@ -777,6 +779,7 @@ Timedelta
- Bug in :func:`timedelta_range` that produced an extra point on a edge case (:issue:`30353`, :issue:`33498`)
- Bug in :meth:`DataFrame.resample` that produced an extra point on a edge case (:issue:`30353`, :issue:`13022`, :issue:`33498`)
- Bug in :meth:`DataFrame.resample` that ignored the ``loffset`` argument when dealing with timedelta (:issue:`7687`, :issue:`33498`)
- Bug in :class:`Timedelta` (and :func: `to_timedelta`) where passing a string of a pure number would not take the unit into account. Now raises for an ambiguous or duplicate unit specification.(:issue:`12136`)

Timezones
^^^^^^^^^
Expand Down
3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/timedeltas.pxd
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from numpy cimport int64_t

# Exposed for tslib, not intended for outside use.
cpdef parse_timedelta_string(object ts, object specified_unit=*)
cpdef int64_t delta_to_nanoseconds(delta) except? -1
cdef convert_to_timedelta64(object ts, object unit)
cpdef convert_to_timedelta64(object ts, object unit=*)
cdef bint is_any_td_scalar(object obj)
50 changes: 32 additions & 18 deletions pandas/_libs/tslibs/timedeltas.pyx
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import collections
import warnings

import cython

Expand Down Expand Up @@ -160,7 +161,7 @@ cpdef int64_t delta_to_nanoseconds(delta) except? -1:
raise TypeError(type(delta))


cdef convert_to_timedelta64(object ts, object unit):
cpdef convert_to_timedelta64(object ts, object unit=None):
"""
Convert an incoming object to a timedelta64 if possible.
Before calling, unit must be standardized to avoid repeated unit conversion
Expand All @@ -174,6 +175,8 @@ cdef convert_to_timedelta64(object ts, object unit):

Return an ns based int64
"""
if unit is None:
unit = 'ns'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this called from? could we type the argument as str and require the caller to pass it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. in array_to_timedelta64 below we should just do that check once at the top of the function, not in every step

if checknull_with_nat(ts):
return np.timedelta64(NPY_NAT)
elif isinstance(ts, _Timedelta):
Expand Down Expand Up @@ -218,7 +221,7 @@ cdef convert_to_timedelta64(object ts, object unit):

@cython.boundscheck(False)
@cython.wraparound(False)
def array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
def array_to_timedelta64(object[:] values, unit=None, errors='raise'):
"""
Convert an ndarray to an array of timedeltas. If errors == 'coerce',
coerce non-convertible objects to NaT. Otherwise, raise.
Expand All @@ -240,13 +243,8 @@ def array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
# this is where all of the error handling will take place.
try:
for i in range(n):
if values[i] is NaT:
# we allow this check in the fast-path because NaT is a C-object
# so this is an inexpensive check
iresult[i] = NPY_NAT
else:
result[i] = parse_timedelta_string(values[i])
except (TypeError, ValueError):
result[i] = parse_timedelta_string(values[i], specified_unit=unit)
except:
unit = parse_timedelta_unit(unit)
for i in range(n):
try:
Expand All @@ -260,7 +258,7 @@ def array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
return iresult.base # .base to access underlying np.ndarray


cdef inline int64_t parse_timedelta_string(str ts) except? -1:
cpdef inline parse_timedelta_string(object ts, specified_unit=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we really want to avoid weakening the type declarations

"""
Parse a regular format timedelta string. Return an int64_t (in ns)
or raise a ValueError on an invalid parse.
Expand Down Expand Up @@ -371,6 +369,17 @@ cdef inline int64_t parse_timedelta_string(str ts) except? -1:
have_value = 1
have_dot = 0

# Consider units from outside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we doing the checking at this low level? this is simply a check on the user interface, e.g. if its a string, then no unit can be specified.

see how we do this in to_datetime

if not unit:
if specified_unit:
unit = [specified_unit]
else:
if specified_unit:
raise ValueError(
"units were doubly specified, both as an argument ({}) "
"and inside string ({})".format(specified_unit, unit)
)

# we had a dot, but we have a fractional
# value since we have an unit
if have_dot and len(unit):
Expand Down Expand Up @@ -412,14 +421,17 @@ cdef inline int64_t parse_timedelta_string(str ts) except? -1:
else:
raise ValueError("unit abbreviation w/o a number")

# treat as nanoseconds
# but only if we don't have anything else
# raise if we just have a number without units
else:
if have_value:
raise ValueError("have leftover units")
if len(number):
r = timedelta_from_spec(number, frac, 'ns')
result += timedelta_as_neg(r, neg)
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh?

"number string without units is deprecated and "
"will raise an exception in future versions. Considering as nanoseconds.",
FutureWarning
)
result = timedelta_from_spec(number, frac, 'ns')

return result

Expand Down Expand Up @@ -478,10 +490,12 @@ cpdef inline str parse_timedelta_unit(object unit):
------
ValueError : on non-parseable input
"""
if unit is None:
return "ns"
elif unit == "M":

# Preserve unit if None, will be cast to nanoseconds
# later on at the proper functions
if unit is None or unit == 'M':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double quotes

return unit

try:
return timedelta_abbrevs[unit.lower()]
except (KeyError, AttributeError):
Expand Down Expand Up @@ -1158,7 +1172,7 @@ class Timedelta(_Timedelta):
if len(value) > 0 and value[0] == 'P':
value = parse_iso_format_string(value)
else:
value = parse_timedelta_string(value)
value = parse_timedelta_string(value, specified_unit=unit)
value = np.timedelta64(value)
elif PyDelta_Check(value):
value = convert_to_timedelta64(value, 'ns')
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/computation/pytables.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ def stringify(value):
v = v.tz_convert("UTC")
return TermValue(v, v.value, kind)
elif kind == "timedelta64" or kind == "timedelta":
v = Timedelta(v, unit="s").value
v = Timedelta(v).value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what type is "v" here? if it is an integer, this is a behavior change

return TermValue(int(v), v, kind)
elif meta == "category":
metadata = extract_array(self.metadata, extract_numpy=True)
Expand Down
6 changes: 3 additions & 3 deletions pandas/core/tools/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
from pandas.core.arrays.timedeltas import sequence_to_td64ns


def to_timedelta(arg, unit="ns", errors="raise"):
def to_timedelta(arg, unit=None, box=True, errors="raise"):
"""
Convert argument to timedelta.

Expand Down Expand Up @@ -108,7 +108,7 @@ def to_timedelta(arg, unit="ns", errors="raise"):
return _coerce_scalar_to_timedelta_type(arg, unit=unit, errors=errors)


def _coerce_scalar_to_timedelta_type(r, unit="ns", errors="raise"):
def _coerce_scalar_to_timedelta_type(r, unit=None, box=True, errors="raise"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why adding an unused box kwarg?

"""Convert string 'r' to a timedelta object."""
try:
result = Timedelta(r, unit)
Expand All @@ -124,7 +124,7 @@ def _coerce_scalar_to_timedelta_type(r, unit="ns", errors="raise"):
return result


def _convert_listlike(arg, unit="ns", errors="raise", name=None):
def _convert_listlike(arg, unit=None, box=True, errors="raise", name=None):
"""Convert a list of objects to a timedelta index object."""
if isinstance(arg, (list, tuple)) or not hasattr(arg, "dtype"):
# This is needed only to ensure that in the case where we end up
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/arrays/test_datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -946,7 +946,7 @@ def test_invalid_nat_setitem_array(array, non_casting_nats):
"array",
[
pd.date_range("2000", periods=4).array,
pd.timedelta_range("2000", periods=4).array,
pd.timedelta_range("2000ns", periods=4).array,
],
)
def test_to_numpy_extra(array):
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -396,7 +396,7 @@ def test_floordiv_axis0_numexpr_path(self, opname):
def test_df_add_td64_columnwise(self):
# GH 22534 Check that column-wise addition broadcasts correctly
dti = pd.date_range("2016-01-01", periods=10)
tdi = pd.timedelta_range("1", periods=10)
tdi = pd.timedelta_range("1ns", periods=10)
tser = pd.Series(tdi)
df = pd.DataFrame({0: dti, 1: tdi})

Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexes/timedeltas/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -180,7 +180,7 @@ def test_drop_duplicates(self, freq_sample, keep, expected, index):

def test_infer_freq(self, freq_sample):
# GH#11018
idx = pd.timedelta_range("1", freq=freq_sample, periods=10)
idx = pd.timedelta_range("1ns", freq=freq_sample, periods=10)
result = pd.TimedeltaIndex(idx.asi8, freq="infer")
tm.assert_index_equal(idx, result)
assert result.freq == freq_sample
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/plotting/test_datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -1360,7 +1360,7 @@ def test_format_timedelta_ticks_narrow(self):

expected_labels = [f"00:00:00.0000000{i:0>2d}" for i in np.arange(10)]

rng = timedelta_range("0", periods=10, freq="ns")
rng = timedelta_range("0ns", periods=10, freq="ns")
df = DataFrame(np.random.randn(len(rng), 3), rng)
fig, ax = self.plt.subplots()
df.plot(fontsize=2, ax=ax)
Expand All @@ -1384,7 +1384,7 @@ def test_format_timedelta_ticks_wide(self):
"9 days 06:13:20",
]

rng = timedelta_range("0", periods=10, freq="1 d")
rng = timedelta_range("0ns", periods=10, freq="1 d")
df = DataFrame(np.random.randn(len(rng), 3), rng)
fig, ax = self.plt.subplots()
ax = df.plot(fontsize=2, ax=ax)
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/resample/test_timedelta.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ def test_resample_single_period_timedelta():
def test_resample_timedelta_idempotency():

# GH 12072
index = pd.timedelta_range("0", periods=9, freq="10L")
index = pd.timedelta_range("0ns", periods=9, freq="10L")
series = Series(range(9), index=index)
result = series.resample("10L").mean()
expected = series
Expand Down
25 changes: 25 additions & 0 deletions pandas/tests/scalar/timedelta/test_constructors.py
Original file line number Diff line number Diff line change
Expand Up @@ -289,3 +289,28 @@ def test_timedelta_constructor_identity():
expected = Timedelta(np.timedelta64(1, "s"))
result = Timedelta(expected)
assert result is expected


@pytest.mark.parametrize(
"value, str_unit, unit, expectation",
[
# Units doubly defined
(10, "s", "d", (ValueError, "units were doubly specified")),
# Units doubly defined (same)
(10, "s", "s", (ValueError, "units were doubly specified")),
# No units, decimal string
(3.1415, "", None, (ValueError, "no units specified")),
],
)
def test_string_with_unit(value, str_unit, unit, expectation):
exp, match = expectation
with pytest.raises(exp, match=match):
val_str = "{}{}".format(value, str_unit)
expected_td = Timedelta(value, unit=unit)

assert Timedelta(val_str, unit=unit) == expected_td
assert to_timedelta(val_str, unit=unit) == expected_td
assert all(
to_timedelta([val_str, val_str], unit=unit)
== to_timedelta([expected_td, expected_td])
)
Loading