Skip to content

Accepts integer/float string with units and raises when unit is ambiguous #21384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from

Conversation

Sup3rGeo
Copy link
Contributor

@Sup3rGeo Sup3rGeo commented Jun 8, 2018

I couldn't get tests to run on my machine so I will see if it passes on the CI servers.

@pep8speaks
Copy link

pep8speaks commented Jun 8, 2018

Hello @Sup3rGeo! Thanks for updating the PR.

Line 55:5: E301 expected 1 blank line, found 0
Line 57:5: E301 expected 1 blank line, found 0

  • Complete extra results for this file :

file_to_check.py:2461:-258: W605 invalid escape sequence '('
file_to_check.py:2461:-255: W605 invalid escape sequence ')'


Comment last updated on October 01, 2018 at 12:55 Hours UTC

@gfyoung gfyoung added Bug Timedelta Timedelta data type labels Jun 8, 2018


@pytest.mark.parametrize("redundant_unit, expectation", [
("", not_raises()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe pytest.raises(None) should suffice.

Copy link
Contributor Author

@Sup3rGeo Sup3rGeo Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't - it is still an open issue in pytest: pytest-dev/pytest#1830
Do you have a better idea than to define this empty context manager?

Copy link
Member

@gfyoung gfyoung Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd that we can have an empty context manager with warnings assertions, but that's not implemented for exceptions. Implementing the empty context manager wouldn't super difficult on our end (we have pandas.util.testing.assert_raises_regex).

Only question is how well-received such an idea would be.

@jreback @jorisvandenbossche : Thoughts?

@codecov
Copy link

codecov bot commented Jun 8, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@a277e4a). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #21384   +/-   ##
=========================================
  Coverage          ?   91.89%           
=========================================
  Files             ?      153           
  Lines             ?    49596           
  Branches          ?        0           
=========================================
  Hits              ?    45576           
  Misses            ?     4020           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.29% <ø> (?)
#single 41.86% <ø> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a277e4a...85f04c9. Read the comment docs.

0.001, 1, 10])
def test_string_with_unit(num, sign, unit, redundant_unit, expectation):
with expectation:
assert Timedelta(str(sign * num) + redundant_unit, unit=unit)\
Copy link
Member

@gfyoung gfyoung Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can't we somehow do multiline with an assert statement without the slash (we generally don't use the slash when doing multiline in this repository)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well I could break the whole statement in two, moving the Timedelta first argument to a variable. This would reduce the line length. Should I?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also work.

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The not_raises question notwithstanding (it's not a blocker unless we decide we want to do this), this looks good to me!

cc @jreback

@@ -59,6 +59,7 @@ Data-type specific

- Bug in :meth:`Series.str.replace()` where the method throws `TypeError` on Python 3.5.2 (:issue: `21078`)
- Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue: `14156`)
- Bug in :class:`Timedelta`: where passing a string of a pure number would not take unit into account. Also raises for ambiguous/duplicate unit specification (:issue: `12136`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to 0.24.0.txt?

@@ -85,10 +85,6 @@ def test_construction():
with pytest.raises(ValueError):
Timedelta('10 days -1 h 1.5m 1s 3us')

# no units specified
with pytest.raises(ValueError):
Timedelta('3.1415')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to allow this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't allow this, pls revert. a string w/o a unit must raise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, looking back at the issue #12136, there was discussion about this and said we would allow this. So that is a good reason this is PR is doing that! :-)

(but personally, I would simply not allow it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowing a string with a unit is fine
this does not have a unit

@pytest.mark.parametrize("redundant_unit, expectation", [
("", not_raises()),
("d", pytest.raises(ValueError)),
("us", pytest.raises(ValueError))])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed to test that it is raising the value error with all the other combinations? (because this gives a combinatorial explosion of number of test .. ;-))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I just wanted to make sure kinda all cases were covered. I believe this does not give a lot of combinations, so I thought that for this case specifically it would be ok.

@@ -999,10 +1004,21 @@ class Timedelta(_Timedelta):
if isinstance(value, Timedelta):
value = value.value
elif is_string_object(value):
if len(value) > 0 and value[0] == 'P':
# Check if it is just a number in a string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support this? (do we now?)

@jorisvandenbossche jorisvandenbossche changed the title Accepts integer/float string with units and raises when unit is ambig… Accepts integer/float string with units and raises when unit is ambiguous Jun 12, 2018
@@ -245,7 +245,7 @@ cdef inline _decode_if_necessary(object ts):
return ts


cdef inline parse_timedelta_string(object ts):
cdef inline parse_timedelta_string(object ts, specified_unit=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be handled here at all

@@ -999,10 +1004,21 @@ class Timedelta(_Timedelta):
if isinstance(value, Timedelta):
value = value.value
elif is_string_object(value):
if len(value) > 0 and value[0] == 'P':
# Check if it is just a number in a string
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just try to float here. if it succeeds it must have a unit specified, if not you must raise, e.g. Timedelta('1000') is not accepted

value = parse_iso_format_string(value)
else:
value = parse_timedelta_string(value)
value = parse_timedelta_string(value, unit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no not pass here

@@ -85,10 +85,6 @@ def test_construction():
with pytest.raises(ValueError):
Timedelta('10 days -1 h 1.5m 1s 3us')

# no units specified
with pytest.raises(ValueError):
Timedelta('3.1415')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't allow this, pls revert. a string w/o a unit must raise.


class not_raises(object):
def __enter__(self):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this its not idiomatic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda needed it for parametrizing on the expectation. Should I then just add a if/else in the test to use or not pytest.raises?

("us", pytest.raises(ValueError))])
@pytest.mark.parametrize("unit", [
"d", "m", "s", "us"])
@pytest.mark.parametrize("sign", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are related tests for this pls move.

+1, -1])
@pytest.mark.parametrize("num", [
0.001, 1, 10])
def test_string_with_unit(num, sign, unit, redundant_unit, expectation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs equivalent handling for the to_timedelta path (which is separate)

@jreback
Copy link
Contributor

jreback commented Jun 19, 2018

can you rebase and fixup

@jreback
Copy link
Contributor

jreback commented Jul 28, 2018

can you rebase and update according to comments

@jreback
Copy link
Contributor

jreback commented Sep 23, 2018

closing as stale, but would take if rebased / updated

@jreback jreback closed this Sep 23, 2018
@Sup3rGeo
Copy link
Contributor Author

@jreback sorry for abandoning this, but I will start working back again to finish this PR, especially given that everyone has spent time to review it.

Should I start another one or can I reply to the comments on this one?

@gfyoung
Copy link
Member

gfyoung commented Sep 27, 2018

@Sup3rGeo : You can keep pushing to this one. I'll reopen.

@gfyoung gfyoung reopened this Sep 27, 2018
@Sup3rGeo Sup3rGeo force-pushed the bugfix-string-timedelta branch from 881c122 to 45dfd7c Compare October 1, 2018 00:18
@Sup3rGeo
Copy link
Contributor Author

Sup3rGeo commented Oct 1, 2018

Two points I would like to discuss:

1 - The behaviour so far would raise on Timedelta('3.1416') but not on Timedelta('2000'). Is this expected?

2 - If I raise on any string number without unit parameter, then I get an error in collection time. It appears something else was relying on this behaviour of considering '2000' string as nanoseconds?

_____________ ERROR collecting pandas/tests/util/test_hashing.py ______________
pandas\tests\util\test_hashing.py:13: in <module>
    class TestHashing(object):
pandas\tests\util\test_hashing.py:23: in TestHashing
    Series(pd.timedelta_range('2000', periods=9))])
pandas\core\indexes\timedeltas.py:801: in timedelta_range
    freq=freq, name=name, closed=closed)
pandas\core\indexes\timedeltas.py:194: in __new__
    closed=closed)
pandas\core\indexes\timedeltas.py:231: in _generate_range
    closed=closed)
pandas\core\arrays\timedeltas.py:159: in _generate_range
    start = Timedelta(start)
pandas\_libs\tslibs\timedeltas.pyx:1133: in pandas._libs.tslibs.timedeltas.Timedelta.__new__
    raise ValueError("Cannot convert float string without unit."
E   ValueError: Cannot convert float string without unit. Value: 2000 Type: <type 'str'>
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!
============ 6 skipped, 34363 deselected, 1 error in 73.15 seconds ============

@jorisvandenbossche
Copy link
Member

The behaviour so far would raise on Timedelta('3.1416') but not on Timedelta('2000'). Is this expected?

You mean the current behaviour in master?
I don't think this is expected.

2 - If I raise on any string number without unit parameter, then I get an error in collection time. It appears something else was relying on this behaviour of considering '2000' string as nanoseconds?

Yes, we used such a construct in one of the tests (in "pandas\tests\util\test_hashing.py:23: in TestHashing"), but you can correct this in the test.

That said, we should probably deprecate it first instead of simply changing.

@jorisvandenbossche
Copy link
Member

@Sup3rGeo something else, there seem to be a bunch of unrelated changes in the diff. I suppose due something that went wrong in the rebase / update.
Can you try:

git fetch upstream
git merge upstream/master
git push origin bugfix-string-timedelta

to see if that fixes it?

@Sup3rGeo
Copy link
Contributor Author

Sup3rGeo commented Oct 1, 2018

Hi!

Yes, we used such a construct in one of the tests (in "pandas\tests\util\test_hashing.py:23: in TestHashing"), but you can correct this in the test.

That said, we should probably deprecate it first instead of simply changing.

Sorry but what do you mean by deprecating it instead of changing?

Can you try:

git fetch upstream
git merge upstream/master
git push origin bugfix-string-timedelta

to see if that fixes it?

Did it. Did it help? I think I screwed up with the rebase indeed. I could try starting fresh with a new branch if nothing helps.

@jorisvandenbossche
Copy link
Member

Did it. Did it help? I think I screwed up with the rebase indeed. I could try starting fresh with a new branch if nothing helps.

Apparently not .. I also tried to rebase, and removed the commits that were not yours. That seemed to have fixed some of the problems, but not all (there are still changes in timedeltas.pyx in the diff that are unrelated).

So maybe starting from a fresh branch and copying over your changes there might be the easiest to do.

Sorry but what do you mean by deprecating it instead of changing?

So currently pd.Timedelta("2000") works (the string being interpreted as an integer), so we could keep that working for now, but give the user a warning that this will change in the future to raise an error.
See also http://pandas-docs.github.io/pandas-docs-travis/contributing.html#backwards-compatibility, although that is more about deprecating a full method/function, while here you would be deprecating only a specific behaviour (but some of the explanation still holds)

@Sup3rGeo
Copy link
Contributor Author

Sup3rGeo commented Oct 7, 2018

Closing in favor of #23025

@Sup3rGeo Sup3rGeo closed this Oct 7, 2018
@Sup3rGeo Sup3rGeo deleted the bugfix-string-timedelta branch October 7, 2018 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/API: to_timedelta unit-argument ignored for string input
5 participants