Skip to content

Accepts integer/float string with units and raises when unit is ambiguous (2) #23025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 50 commits into from
Closed
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
ab5f7ba
Added null context manager as do_not_raise for parametrization in pyt…
Oct 5, 2018
7a38ecd
Added test for pure integer strings and test for strings with/without…
Oct 5, 2018
a65a35a
Added entry to whatsnew.
Oct 5, 2018
57dea1d
Changed Timedelta to raise for strings without unit.
Oct 5, 2018
b8cf952
WIP refactoring.
Oct 7, 2018
8154859
Deprecation warning when we don't have units in string numbers.
Oct 7, 2018
a90a992
Fixed linting.
Oct 7, 2018
1184b6e
Added to_timedelta test.
Oct 7, 2018
5e28b49
scalar case in to_timedelta checks for strings.
Oct 7, 2018
fafe838
Updated test with more cases for to_timedelta.
Oct 7, 2018
3ca0dae
All tests passing. Implemented checks on parse_timedelta_string.
Oct 7, 2018
8105f9c
Updated tests to avoid creating timedeltas from strings without unit.
Oct 7, 2018
e90a18b
Updated more tests to avoid creating timedeltas from strings without …
Oct 7, 2018
e0a55a8
Updated more tests to avoid creating timedeltas from strings without …
Oct 7, 2018
827aebf
Fixed failing test.
Oct 7, 2018
9bff504
Removed litle whitespace.
Oct 8, 2018
b6c7d45
Type specified unit as object.
Oct 8, 2018
9ff6ade
Removed newline.
Oct 8, 2018
be93e4b
Moved default unit down to more core functions.
Oct 10, 2018
85f0103
Fixed linting.
Oct 10, 2018
673ad19
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Oct 23, 2018
eb7d3f8
Merge remote-tracking branch 'origin/bugfix/string-timedelta2' into b…
Oct 23, 2018
bb40f77
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Oct 24, 2018
2e331c1
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Oct 31, 2018
bada4a7
Minor typo.
Nov 5, 2018
8c0425a
Lint
Nov 5, 2018
493a14b
Using ValueError instead of Exception.
Nov 5, 2018
e525ec4
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 5, 2018
448b68a
Catching TypeError as well.
Nov 5, 2018
bd173a9
LINT
Nov 5, 2018
d16c4c5
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 6, 2018
1cfc729
Refactored tests, fixed little bug in array_to_timedelta64 and explai…
Nov 6, 2018
e559f06
Fixed unit as None (instead of ns), Fixed test.
Nov 6, 2018
b914950
LINT: Fixed long lines.
Nov 10, 2018
64a4440
LINT: Fixed long lines.
Nov 10, 2018
4d26474
LINT: Fixed indentation of closing bracket.
Nov 10, 2018
db41054
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 26, 2018
2f53004
Added comment.
Nov 26, 2018
fb40c94
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 26, 2018
f5e59ee
Casting unit to 'ns' if None.
Nov 26, 2018
e7266da
Updated documentation and issuing DeprecationWarning instead of excep…
Nov 26, 2018
b6a02df
Fixed linting.
Nov 26, 2018
694abe9
DOC: Reorganized whatsnew
Nov 27, 2018
4d86ae1
CLN: Minor refactoring.
Nov 27, 2018
40b09f4
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 27, 2018
a9c4675
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Nov 27, 2018
95302de
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Dec 3, 2018
661fd70
Merge remote-tracking branch 'upstream/master' into bugfix/string-tim…
Jan 7, 2019
80a1161
Added default value to func declaration.
Jan 8, 2019
dbe8fae
Fixed bug in warning string.
Jan 8, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -698,7 +698,7 @@ Timedelta
- Bug in :class:`TimedeltaIndex` incorrectly allowing indexing with ``Timestamp`` object (:issue:`20464`)
- Fixed bug where subtracting :class:`Timedelta` from an object-dtyped array would raise ``TypeError`` (:issue:`21980`)
- Fixed bug in adding a :class:`DataFrame` with all-`timedelta64[ns]` dtypes to a :class:`DataFrame` with all-integer dtypes returning incorrect results instead of raising ``TypeError`` (:issue:`22696`)
-
- Bug in :class:`Timedelta`: where passing a string of a pure number would not take unit into account. Also raises for ambiguous/duplicate unit specification (:issue: `12136`)

Timezones
^^^^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/timedeltas.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
from numpy cimport int64_t

# Exposed for tslib, not intended for outside use.
cdef parse_timedelta_string(object ts)
cpdef parse_timedelta_string(object ts, specified_unit=*)
cpdef int64_t cast_from_unit(object ts, object unit) except? -1
cpdef int64_t delta_to_nanoseconds(delta) except? -1
cpdef convert_to_timedelta64(object ts, object unit)
Expand Down
20 changes: 13 additions & 7 deletions pandas/_libs/tslibs/timedeltas.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -221,7 +221,7 @@ cpdef array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
# this is where all of the error handling will take place.
try:
for i in range(n):
result[i] = parse_timedelta_string(values[i])
result[i] = parse_timedelta_string(values[i], specified_unit=unit)
except:
for i in range(n):
try:
Expand Down Expand Up @@ -287,7 +287,7 @@ cdef inline _decode_if_necessary(object ts):
return ts


cdef inline parse_timedelta_string(object ts):
cpdef inline parse_timedelta_string(object ts, specified_unit=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback said:

this should not be handled here at all

I think differently I would like to present the reasons why I think this is the proper place:

parse_timedelta_string gets a string and already checks internally for units. Currently it raises ValueError if it gets a decimal number and no units - but allows a integer number without units.

If we don't leave the responsibility of checking units for string in parse_timedelta_string, it means that:

  • We have to implement this same checks three times: class Timedelta, and functions _coerce_scalar_to_timedelta and _convert_listlike (last two are the to_timedelta path)
  • The unit check inside parse_timedelta_string is now redundant and should be removed.

The way it is right now we just give it the externally provided units and it can now do the checks itself - and that's all, and all cases are covered. Very little coding needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename this to just unit, and then in the parsing if you find a unit call it have_unit (similar to others)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other have_ variables are all boolean, and it could be a bit weird to call the parsed unit as have_unit. Can we call it something like parsed_unit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

"""
Parse a regular format timedelta string. Return an int64_t (in ns)
or raise a ValueError on an invalid parse.
Expand Down Expand Up @@ -401,6 +401,14 @@ cdef inline parse_timedelta_string(object ts):
have_value = 1
have_dot = 0

# Consider units from outside
if not unit:
if specified_unit:
unit = specified_unit
else:
if specified_unit:
raise ValueError("units doubly specified")

# we had a dot, but we have a fractional
# value since we have an unit
if have_dot and len(unit):
Expand Down Expand Up @@ -442,14 +450,12 @@ cdef inline parse_timedelta_string(object ts):
else:
raise ValueError("unit abbreviation w/o a number")

# treat as nanoseconds
# but only if we don't have anything else
# raise if we just have a number without units
else:
if have_value:
raise ValueError("have leftover units")
if len(number):
r = timedelta_from_spec(number, frac, 'ns')
result += timedelta_as_neg(r, neg)
raise ValueError("number string without units")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned it before, but I think we need to deprecate this first. We can here raise a deprecation warning when unit is None, and still fallback to unit='ns' for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!


return result

Expand Down Expand Up @@ -1119,7 +1125,7 @@ class Timedelta(_Timedelta):
if len(value) > 0 and value[0] == 'P':
value = parse_iso_format_string(value)
else:
value = parse_timedelta_string(value)
value = parse_timedelta_string(value, specified_unit=unit)
value = np.timedelta64(value)
elif PyDelta_Check(value):
value = convert_to_timedelta64(value, 'ns')
Expand Down
25 changes: 15 additions & 10 deletions pandas/core/tools/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@
import pandas as pd
from pandas._libs import tslibs
from pandas._libs.tslibs.timedeltas import (convert_to_timedelta64,
array_to_timedelta64)
array_to_timedelta64,
parse_timedelta_string)

from pandas.core.dtypes.common import (
ensure_object,
Expand Down Expand Up @@ -142,15 +143,19 @@ def _coerce_scalar_to_timedelta_type(r, unit='ns', box=True, errors='raise'):
"""Convert string 'r' to a timedelta object."""

try:
result = convert_to_timedelta64(r, unit)
except ValueError:
if errors == 'raise':
raise
elif errors == 'ignore':
return r

# coerce
result = pd.NaT
result = parse_timedelta_string(r)
result = np.timedelta64(result)
except Exception:
try:
result = convert_to_timedelta64(r, unit)
except ValueError:
if errors == 'raise':
raise
elif errors == 'ignore':
return r

# coerce
result = pd.NaT

if box:
result = tslibs.Timedelta(result)
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ class TestFrameFlexArithmetic(object):
def test_df_add_td64_columnwise(self):
# GH#22534 Check that column-wise addition broadcasts correctly
dti = pd.date_range('2016-01-01', periods=10)
tdi = pd.timedelta_range('1', periods=10)
tdi = pd.timedelta_range('1ns', periods=10)
tser = pd.Series(tdi)
df = pd.DataFrame({0: dti, 1: tdi})

Expand Down Expand Up @@ -270,7 +270,7 @@ def test_df_bool_mul_int(self):
def test_td64_df_add_int_frame(self):
# GH#22696 Check that we don't dispatch to numpy implementation,
# which treats int64 as m8[ns]
tdi = pd.timedelta_range('1', periods=3)
tdi = pd.timedelta_range('1ns', periods=3)
df = tdi.to_frame()
other = pd.DataFrame([1, 2, 3], index=tdi) # indexed like `df`
with pytest.raises(TypeError):
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexes/timedeltas/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -234,7 +234,7 @@ def test_drop_duplicates(self):
'T', '2T', 'S', '-3S'])
def test_infer_freq(self, freq):
# GH#11018
idx = pd.timedelta_range('1', freq=freq, periods=10)
idx = pd.timedelta_range('1ns', freq=freq, periods=10)
result = pd.TimedeltaIndex(idx.asi8, freq='infer')
tm.assert_index_equal(idx, result)
assert result.freq == freq
Expand Down
20 changes: 20 additions & 0 deletions pandas/tests/scalar/timedelta/test_construction.py
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,9 @@ def test_construction():
with pytest.raises(ValueError):
Timedelta('3.1415')

with pytest.raises(ValueError):
Timedelta('2000')

# invalid construction
tm.assert_raises_regex(ValueError, "cannot construct a Timedelta",
lambda: Timedelta())
Expand Down Expand Up @@ -210,3 +213,20 @@ def test_td_constructor_on_nanoseconds(constructed_td, conversion):
def test_td_constructor_value_error():
with pytest.raises(TypeError):
Timedelta(nanoseconds='abc')


@pytest.mark.parametrize("str_unit, unit, expectation", [
("", "s", tm.do_not_raise), # Expected case
("s", "s", pytest.raises(ValueError)), # Units doubly defined
("s", "d", pytest.raises(ValueError)),
("", None, pytest.raises(ValueError)), # No units
])
def test_string_with_unit(str_unit, unit, expectation):
with expectation:
val_str = "10{}".format(str_unit)
expected_td = Timedelta(10, unit=unit)

assert Timedelta(val_str, unit=unit) == expected_td
assert pd.to_timedelta(val_str, unit=unit) == expected_td
assert pd.to_timedelta([val_str, val_str], unit=unit) == \
pd.to_timedelta([expected_td, expected_td])
2 changes: 1 addition & 1 deletion pandas/tests/test_resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -999,7 +999,7 @@ def test_resample_single_period_timedelta(self):
def test_resample_timedelta_idempotency(self):

# GH 12072
index = pd.timedelta_range('0', periods=9, freq='10L')
index = pd.timedelta_range('0ns', periods=9, freq='10L')
series = Series(range(9), index=index)
result = series.resample('10L').mean()
expected = series
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/util/test_hashing.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ class TestHashing(object):
Series([True, False, True] * 3),
Series(pd.date_range('20130101', periods=9)),
Series(pd.date_range('20130101', periods=9, tz='US/Eastern')),
Series(pd.timedelta_range('2000', periods=9))])
Series(pd.timedelta_range('2000ns', periods=9))])
def series(self, request):
return request.param

Expand Down
14 changes: 14 additions & 0 deletions pandas/util/testing.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,20 @@
from pandas.io.common import urlopen


class NullContextManager(object):
def __init__(self, dummy_resource=None):
self.dummy_resource = dummy_resource

def __enter__(self):
return self.dummy_resource

def __exit__(self, *args):
pass


do_not_raise = NullContextManager()


N = 30
K = 4
_RAISE_NETWORK_ERROR_DEFAULT = False
Expand Down