Accepts integer/float string with units and raises when unit is ambiguous #21384

Sup3rGeo · 2018-06-08T14:45:43Z

I couldn't get the tests to run on my machine, so I will see if they pass on the CI servers.

closes BUG/API: to_timedelta unit-argument ignored for string input #12136
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-06-08T14:45:45Z

Hello @Sup3rGeo! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/tests/scalar/timedelta/test_construction.py !
In the file pandas/util/testing.py, following are the PEP8 issues :

Line 55:5: E301 expected 1 blank line, found 0
Line 57:5: E301 expected 1 blank line, found 0

Complete extra results for this file :

file_to_check.py:2461:-258: W605 invalid escape sequence '('
file_to_check.py:2461:-255: W605 invalid escape sequence ')'

Comment last updated on October 01, 2018 at 12:55 Hours UTC

gfyoung · 2018-06-08T17:41:14Z

pandas/tests/scalar/timedelta/test_construction.py

+
+
+@pytest.mark.parametrize("redundant_unit, expectation", [
+    ("", not_raises()),


I believe pytest.raises(None) should suffice.

It doesn't - it is still an open issue in pytest: pytest-dev/pytest#1830
Do you have a better idea than to define this empty context manager?

Odd that we can have an empty context manager with warnings assertions, but that's not implemented for exceptions. Implementing the empty context manager wouldn't be super difficult on our end (we have pandas.util.testing.assert_raises_regex).

Only question is how well-received such an idea would be.

@jreback @jorisvandenbossche : Thoughts?
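
For readers following the thread, here is a minimal sketch of the pattern being discussed, written with the standard library only so it runs without pytest. The names `does_not_raise`, `expect_raises` and `parse_number` are hypothetical; in a real test `pytest.raises(ValueError)` would take the place of `expect_raises(ValueError)`, and on Python 3.7+ `contextlib.nullcontext()` could replace the hand-written no-op manager.

```python
from contextlib import contextmanager


@contextmanager
def does_not_raise():
    """No-op context manager: any exception inside the block propagates."""
    yield


@contextmanager
def expect_raises(exc_type):
    """Stand-in for pytest.raises: fail unless exc_type is raised in the block."""
    try:
        yield
    except exc_type:
        return  # the expected exception was raised; swallow it
    raise AssertionError("{} was not raised".format(exc_type.__name__))


def parse_number(text):
    """Toy function under test: accepts only purely numeric strings."""
    return float(text)


# Each case pairs an input with a factory for the context manager that
# describes the expectation. Factories give every case a fresh manager.
CASES = [
    ("10", does_not_raise),
    ("10d", lambda: expect_raises(ValueError)),
    ("10us", lambda: expect_raises(ValueError)),
]


def run_cases():
    """Run every case under its expectation and return how many were checked."""
    for text, make_expectation in CASES:
        with make_expectation():
            parse_number(text)
    return len(CASES)
```

The point of the no-op manager is that the test body stays identical for the raising and non-raising cases; only the parametrized expectation changes.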

codecov · 2018-06-08T20:24:54Z

Codecov Report

❗ No coverage uploaded for pull request base (master@a277e4a).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #21384   +/-   ##
=========================================
  Coverage          ?   91.89%           
=========================================
  Files             ?      153           
  Lines             ?    49596           
  Branches          ?        0           
=========================================
  Hits              ?    45576           
  Misses            ?     4020           
  Partials          ?        0

Flag        Coverage Δ
#multiple   90.29% <ø> (?)
#single     41.86% <ø> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a277e4a...85f04c9.

gfyoung · 2018-06-08T23:14:07Z

pandas/tests/scalar/timedelta/test_construction.py

+    0.001, 1, 10])
+def test_string_with_unit(num, sign, unit, redundant_unit, expectation):
+    with expectation:
+        assert Timedelta(str(sign * num) + redundant_unit, unit=unit)\


nit: can't we somehow do multiline with an assert statement without the slash (we generally don't use the slash when doing multiline in this repository)?

well I could break the whole statement in two, moving the Timedelta first argument to a variable. This would reduce the line length. Should I?

That would also work.

gfyoung

The not_raises question notwithstanding (it's not a blocker unless we decide we want to do this), this looks good to me!

cc @jreback

jorisvandenbossche · 2018-06-12T08:38:28Z

doc/source/whatsnew/v0.23.1.txt

@@ -59,6 +59,7 @@ Data-type specific

 - Bug in :meth:`Series.str.replace()` where the method throws `TypeError` on Python 3.5.2 (:issue: `21078`)
 - Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue: `14156`)
+- Bug in :class:`Timedelta`: where passing a string of a pure number would not take unit into account. Also raises for ambiguous/duplicate unit specification (:issue: `12136`)


Can you move this to 0.24.0.txt?

jorisvandenbossche · 2018-06-12T08:57:51Z

pandas/tests/scalar/timedelta/test_construction.py

@@ -85,10 +85,6 @@ def test_construction():
    with pytest.raises(ValueError):
        Timedelta('10 days -1 h 1.5m 1s 3us')

-    # no units specified
-    with pytest.raises(ValueError):
-        Timedelta('3.1415')


Why do we want to allow this?

we don't allow this, pls revert. a string w/o a unit must raise.

Well, looking back at the issue #12136, there was discussion about this and it was said we would allow this. So that is a good reason this PR is doing that! :-)

(but personally, I would simply not allow it)

allowing a string with a unit is fine
this does not have a unit

jorisvandenbossche · 2018-06-12T09:16:45Z

pandas/tests/scalar/timedelta/test_construction.py

+@pytest.mark.parametrize("redundant_unit, expectation", [
+    ("", not_raises()),
+    ("d", pytest.raises(ValueError)),
+    ("us", pytest.raises(ValueError))])


Is it necessary to test that it raises the ValueError with all the other combinations? (because this gives a combinatorial explosion in the number of tests .. ;-))

I see, I just wanted to make sure kinda all cases were covered. I believe this does not give a lot of combinations, so I thought that for this case specifically it would be ok.
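
To put a number on the "combinatorial explosion" concern, the sketch below multiplies out the parameter lists that appear in the diff hunks of this PR (`redundant_unit`, `unit`, `sign`, `num`). It only counts the generated cases and does not touch pandas.

```python
from itertools import product

# Parameter values as they appear in the stacked parametrize decorators.
redundant_units = ["", "d", "us"]
units = ["d", "m", "s", "us"]
signs = [+1, -1]
nums = [0.001, 1, 10]

# Stacked pytest.mark.parametrize decorators produce the Cartesian product.
cases = list(product(nums, signs, units, redundant_units))

total = len(cases)
# Cases with a non-empty redundant unit are the ones expected to raise.
raising = sum(1 for case in cases if case[3] != "")

print(total, raising)  # 72 48
```

So the stacked decorators generate 72 tests, 48 of which only check that a ValueError is raised.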

jorisvandenbossche · 2018-06-12T09:17:27Z

pandas/_libs/tslibs/timedeltas.pyx

@@ -999,10 +1004,21 @@ class Timedelta(_Timedelta):
        if isinstance(value, Timedelta):
            value = value.value
        elif is_string_object(value):
-            if len(value) > 0 and value[0] == 'P':
+            # Check if it is just a number in a string


Do we want to support this? (do we now?)

jreback · 2018-06-12T11:06:57Z

pandas/_libs/tslibs/timedeltas.pyx

@@ -245,7 +245,7 @@ cdef inline _decode_if_necessary(object ts):
    return ts


-cdef inline parse_timedelta_string(object ts):
+cdef inline parse_timedelta_string(object ts, specified_unit=None):


this should not be handled here at all

jreback · 2018-06-12T11:09:13Z

pandas/_libs/tslibs/timedeltas.pyx

@@ -999,10 +1004,21 @@ class Timedelta(_Timedelta):
        if isinstance(value, Timedelta):
            value = value.value
        elif is_string_object(value):
-            if len(value) > 0 and value[0] == 'P':
+            # Check if it is just a number in a string
+            try:


just try to float here. if it succeeds it must have a unit specified, if not you must raise, e.g. Timedelta('1000') is not accepted
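
A rough sketch of the check suggested in this comment, in plain Python rather than Cython. The function name `number_string_to_float` and the error message are hypothetical and are not taken from the pandas source; the sketch only illustrates "try `float()`, and require a unit if it succeeds".

```python
def number_string_to_float(value, unit=None):
    """Return the float held in `value`, or None if it is not a bare number.

    Raises ValueError when the string is a bare number but no unit was given.
    """
    try:
        number = float(value)
    except ValueError:
        # Not a bare number (e.g. "10 days"); leave it to the regular parser.
        return None
    if unit is None:
        # Hypothetical message; a bare number without a unit is rejected.
        raise ValueError(
            "Cannot convert numeric string without unit. "
            "Value: {!r}".format(value))
    return number
```

Note that `float()` is more permissive than a strict digit check: it also accepts strings such as "1e3", "nan" or " 10 ", so a real implementation would have to decide whether those count as bare numbers.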

jreback · 2018-06-12T11:09:21Z

pandas/_libs/tslibs/timedeltas.pyx

                value = parse_iso_format_string(value)
            else:
-                value = parse_timedelta_string(value)
+                value = parse_timedelta_string(value, unit)


do not pass here

jreback · 2018-06-12T11:10:35Z

pandas/tests/scalar/timedelta/test_construction.py

@@ -85,10 +85,6 @@ def test_construction():
    with pytest.raises(ValueError):
        Timedelta('10 days -1 h 1.5m 1s 3us')

-    # no units specified
-    with pytest.raises(ValueError):
-        Timedelta('3.1415')


we don't allow this, pls revert. a string w/o a unit must raise.

jreback · 2018-06-12T11:10:47Z

pandas/tests/scalar/timedelta/test_construction.py

+
+class not_raises(object):
+    def __enter__(self):
+        pass


remove this, it's not idiomatic

I kinda needed it for parametrizing on the expectation. Should I then just add an if/else in the test to decide whether or not to use pytest.raises?

jreback · 2018-06-12T11:11:07Z

pandas/tests/scalar/timedelta/test_construction.py

+    ("us", pytest.raises(ValueError))])
+@pytest.mark.parametrize("unit", [
+    "d", "m", "s", "us"])
+@pytest.mark.parametrize("sign", [


there are related tests for this pls move.

jreback · 2018-06-12T11:11:32Z

pandas/tests/scalar/timedelta/test_construction.py

+    +1, -1])
+@pytest.mark.parametrize("num", [
+    0.001, 1, 10])
+def test_string_with_unit(num, sign, unit, redundant_unit, expectation):


this needs equivalent handling for the to_timedelta path (which is separate)

jreback · 2018-06-19T00:09:58Z

can you rebase and fixup

jreback · 2018-07-28T14:38:36Z

can you rebase and update according to comments

jreback · 2018-09-23T21:43:40Z

closing as stale, but would take if rebased / updated

Sup3rGeo · 2018-09-27T20:35:36Z

@jreback sorry for abandoning this, but I will start working back again to finish this PR, especially given that everyone has spent time to review it.

Should I start another one or can I reply to the comments on this one?

gfyoung · 2018-09-27T20:38:50Z

@Sup3rGeo : You can keep pushing to this one. I'll reopen.

Sup3rGeo · 2018-10-01T00:22:26Z

Two points I would like to discuss:

1 - The behaviour so far would raise on Timedelta('3.1416') but not on Timedelta('2000'). Is this expected?

2 - If I raise on any string number without unit parameter, then I get an error in collection time. It appears something else was relying on this behaviour of considering '2000' string as nanoseconds?

_____________ ERROR collecting pandas/tests/util/test_hashing.py ______________
pandas\tests\util\test_hashing.py:13: in <module>
    class TestHashing(object):
pandas\tests\util\test_hashing.py:23: in TestHashing
    Series(pd.timedelta_range('2000', periods=9))])
pandas\core\indexes\timedeltas.py:801: in timedelta_range
    freq=freq, name=name, closed=closed)
pandas\core\indexes\timedeltas.py:194: in __new__
    closed=closed)
pandas\core\indexes\timedeltas.py:231: in _generate_range
    closed=closed)
pandas\core\arrays\timedeltas.py:159: in _generate_range
    start = Timedelta(start)
pandas\_libs\tslibs\timedeltas.pyx:1133: in pandas._libs.tslibs.timedeltas.Timedelta.__new__
    raise ValueError("Cannot convert float string without unit."
E   ValueError: Cannot convert float string without unit. Value: 2000 Type: <type 'str'>
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!
============ 6 skipped, 34363 deselected, 1 error in 73.15 seconds ============

jorisvandenbossche · 2018-10-01T10:24:18Z

> The behaviour so far would raise on Timedelta('3.1416') but not on Timedelta('2000'). Is this expected?

You mean the current behaviour in master?
I don't think this is expected.

> 2 - If I raise on any string number without unit parameter, then I get an error in collection time. It appears something else was relying on this behaviour of considering '2000' string as nanoseconds?

Yes, we used such a construct in one of the tests (in "pandas\tests\util\test_hashing.py:23: in TestHashing"), but you can correct this in the test.

That said, we should probably deprecate it first instead of simply changing.

jorisvandenbossche · 2018-10-01T10:26:02Z

@Sup3rGeo something else: there seem to be a bunch of unrelated changes in the diff. I suppose this is due to something that went wrong in the rebase / update.
Can you try:

git fetch upstream
git merge upstream/master
git push origin bugfix-string-timedelta

to see if that fixes it?

Sup3rGeo · 2018-10-01T11:56:28Z

Hi!

> Yes, we used such a construct in one of the tests (in "pandas\tests\util\test_hashing.py:23: in TestHashing"), but you can correct this in the test.
>
> That said, we should probably deprecate it first instead of simply changing.

Sorry but what do you mean by deprecating it instead of changing?

> Can you try:
>
> git fetch upstream
> git merge upstream/master
> git push origin bugfix-string-timedelta
>
> to see if that fixes it?

Did it. Did it help? I think I screwed up with the rebase indeed. I could try starting fresh with a new branch if nothing helps.

jorisvandenbossche · 2018-10-01T13:16:42Z

> Did it. Did it help? I think I screwed up with the rebase indeed. I could try starting fresh with a new branch if nothing helps.

Apparently not .. I also tried to rebase, and removed the commits that were not yours. That seemed to have fixed some of the problems, but not all (there are still changes in timedeltas.pyx in the diff that are unrelated).

So maybe starting from a fresh branch and copying over your changes there might be the easiest to do.

> Sorry but what do you mean by deprecating it instead of changing?

So currently pd.Timedelta("2000") works (the string being interpreted as an integer), so we could keep that working for now, but give the user a warning that this will change in the future to raise an error.
See also http://pandas-docs.github.io/pandas-docs-travis/contributing.html#backwards-compatibility, although that is more about deprecating a full method/function, while here you would be deprecating only a specific behaviour (but some of the explanation still holds)
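
As an illustration of "keep it working but warn", here is a minimal sketch using the standard `warnings` module. The helper name `warn_on_unitless_number_string`, the message text and the choice of `FutureWarning` are assumptions made for this example, not the code that was eventually merged.

```python
import warnings


def warn_on_unitless_number_string(value, unit=None):
    """Warn, rather than raise, when a bare numeric string has no unit.

    Returns True if a deprecation warning was issued, False otherwise.
    """
    try:
        float(value)
    except ValueError:
        return False  # not a bare number, nothing to deprecate
    if unit is not None:
        return False  # a unit was given, so the input is unambiguous
    warnings.warn(
        "Passing a numeric string without a unit is deprecated and "
        "will raise an error in a future version.",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )
    return True
```

Existing calls such as `pd.timedelta_range('2000', periods=9)` would then keep running during the deprecation period while emitting the warning, and the behaviour could be switched to raising in a later release.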

Sup3rGeo · 2018-10-07T16:30:57Z

Closing in favor of #23025

gfyoung added Bug Timedelta Timedelta data type labels Jun 8, 2018

gfyoung reviewed Jun 8, 2018

gfyoung approved these changes Jun 9, 2018

jorisvandenbossche reviewed Jun 12, 2018

jorisvandenbossche changed the title ~~Accepts integer/float string with units and raises when unit is ambig…~~ Accepts integer/float string with units and raises when unit is ambiguous Jun 12, 2018

jreback requested changes Jun 12, 2018

jreback closed this Sep 23, 2018

gfyoung reopened this Sep 27, 2018

Sup3rGeo force-pushed the bugfix-string-timedelta branch from 881c122 to 45dfd7c Compare October 1, 2018 00:18

jorisvandenbossche force-pushed the bugfix-string-timedelta branch from e92da2d to 4a6ed2e Compare October 1, 2018 12:53

Victor added 7 commits October 1, 2018 14:54

Accepts integer/float string with units and raises when unit is ambig…

7ab9279

…uous.

Fixed PEP linting.

98a41f4

Updated function signature in pxd file.

23a9834

Passing tests.

d34140d

Fixed linting.

a2d68e6

Removed excess space LINT

521857c

Fixed long line.

d8b7127

Victor added 15 commits October 1, 2018 14:54

Accepts integer/float string with units and raises when unit is ambig…

0ddbcec

…uous.

Updated function signature in pxd file.

ce94192

Passing tests.

c590092

Moved whatsnew to 0.24.0.

bb08fc4

Reverted parse_timedelta_string signature.

2c46d73

Checking float with units in Timedelta class.

298a33c

Reverted test with exceptions for strings without unit.

677a8f8

Useing null context managers from contextlib.

a371472

Added null context manager for parametrizing tests on exception raising.

4669324

Using null context manager from pandas.util.testing.

93602ac

Raise when defining units in string and constructor.

28ffc98

Simplified and added to_timedelta to test.

654a0ea

Added debug info to exceptions.

f38b4cb

Null context manager implementation.

e322630

Updated test.

85f04c9

jorisvandenbossche force-pushed the bugfix-string-timedelta branch from 4a6ed2e to 85f04c9 Compare October 1, 2018 12:55

Sup3rGeo mentioned this pull request Oct 7, 2018

Accepts integer/float string with units and raises when unit is ambiguous (2) #23025

Closed

4 tasks

Sup3rGeo closed this Oct 7, 2018

Sup3rGeo deleted the bugfix-string-timedelta branch October 7, 2018 17:07


