Fix unrecognized 'Z' UTC designator #8832

broessli · 2014-11-16T14:29:01Z

jreback · 2014-11-16T14:37:05Z

pandas/tseries/tests/test_tslib.py

@@ -298,6 +298,9 @@ def test_barely_oob_dts(self):
        # One us more than the maximum is an error
        self.assertRaises(ValueError, Timestamp, max_ts_us + one_us)

+    def test_utc_z_designator(self):
+        self.assertEqual(get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo), 'UTC')


hah...you took out the tests? are they just not legit/undefined? (it's ok, just trying to see)

Yes indeed: after checking ISO 8601 and RFC 3339, I don't think ...Z0 and ...Z00 are actually legit.

ok, then they should raise, yes? so maybe the parsing actually needs to be a bit more strict, e.g. if you see a Z, then it must be the end of the string OR have a full-format offset.
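A minimal sketch of the stricter rule suggested here, assuming "full-format offset" means a `+HH:MM` / `-HH:MM` suffix. The helper name `is_strict_iso_timestamp` is hypothetical and is not part of pandas or of np_datetime_strings.c; it only illustrates the acceptance rule, not the actual C parser.

```python
import re

# Hypothetical pattern: date, 'T' or space, HH:MM, optional seconds and
# fraction, then EITHER a bare 'Z' OR a full +HH:MM / -HH:MM offset (or nothing).
_STRICT_ISO = re.compile(
    r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2})?"
)


def is_strict_iso_timestamp(s):
    """Return True only if the whole string matches the strict pattern."""
    # fullmatch ensures nothing may follow the 'Z' (so 'Z0' and 'Z00' fail).
    return _STRICT_ISO.fullmatch(s) is not None
```

With this rule, `'2014-11-02 01:00Z'` and `'2014-11-02 01:00+01:00'` are accepted, while `'2014-11-02 01:00Z0'` and `'2014-11-02 01:00Z00'` are rejected.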

Totally agree, and as I understand it this is exactly the behavior of the parser in np_datetime_strings.c. But there is currently a fallback (dateutil's parser) if anything goes wrong, and this is what we are actually testing with ...Z0 and ...Z00. I think this fallback is there for a reason, so I did not want to mess with it.

ahh, so the fallback (e.g. dateutil) was producing an incorrect result. not surprised. Ok, so we KNOW that these 2 cases are not legit. Is it easy to catch these 2 cases? (e.g. either you have Z then end-of-string, or Z followed by a legit value?) (maybe it's easiest to simply test if 0/00 are present after Z and raise).

after line 800, where the sublen is 1, you are done (e.g. you got a Z only). Otherwise it passes thru. I would then check the next 1 and then 2 characters (or, if you can't, it's an error): if they are 0 then it's an error, otherwise pass thru.

No, it does not pass thru: check line 880, there is a goto parse_error if anything remains after Z is encountered. parse_error raises a PyExc_ValueError, but this ValueError is caught in tslib.pyx/convert_to_tsobject(), hence the problem.

ok, so check there. These are cases that should then be caught as errors.
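To make the masking concrete, here is a toy model of the control flow described above. All three functions are hypothetical stand-ins: `internal_parse` plays the role of the strict C parser, `lenient_fallback` plays the role of a permissive fallback (what dateutil really returns for these strings is not shown here), and `convert` mimics the try/except in convert_to_tsobject().

```python
from datetime import datetime, timezone


def internal_parse(s):
    # Stand-in for the strict parser: anything after 'Z' is a parse error.
    if not s.endswith("Z"):
        raise ValueError("unexpected characters after 'Z': %r" % s)
    naive = datetime.strptime(s[:-1], "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone.utc)


def lenient_fallback(s):
    # Stand-in for a permissive parser: silently ignores what follows 'Z'.
    body = s.split("Z", 1)[0]
    return datetime.strptime(body, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def convert(s):
    try:
        return internal_parse(s)
    except ValueError:
        # The strict parser's error is swallowed and the fallback runs.
        return lenient_fallback(s)
```

In this model `internal_parse('2014-11-02 01:00Z0')` raises, yet `convert('2014-11-02 01:00Z0')` quietly returns the same value as `convert('2014-11-02 01:00Z')`, which is the kind of masking being discussed.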

You mean adding checks in convert_to_tsobject? Sorry if I am a bit nitpicky here, but I think all those checks should be the responsibility of the parser only and should not be spread across modules. That's why I am a bit reluctant to do it. I think the cleanest way to do it currently is to allow the internal parser to somehow bypass the fallback and be able to raise errors of its own. If I am correct, the internal parser currently raises the same kind of exception (ValueError) for two different things: legit but unsupported (not implemented) ISO 8601 strings on one side, and really ill-formed datetime strings on the other.

Benoît.

not sure where you got the idea that I was suggesting you modify convert_to_tsobject. That's very complicated and not warranted. I suggest that you make a simple modification in the same file you are currently working on, np_datetime_strings.c, to handle the case of reading 1 and 2 characters past the Z.

I would maybe make out_local==-1 an error, then in the except block, if out_local is -1 you can simply re-raise the ValueError (in convert_to_tsobject)

or you can put a specific message in the ValueError which is checked in convert_to_tsobject. You are right, the problem is a ValueError actually means 2 things here. Need to disambiguate them (or allow your change, and simply disregard the 'error' cases that we had before).
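One possible way to disambiguate the two meanings of ValueError, sketched under loud assumptions: the `MalformedTimestampError` subclass and all three functions are hypothetical and are not what this PR implements. The idea is only that an "ill-formed" error is re-raised, while an "unsupported format" error still reaches the fallback.

```python
from datetime import datetime, timezone


class MalformedTimestampError(ValueError):
    """Hypothetical: the string is ill-formed, so no fallback should run."""


def internal_parse(s):
    head, sep, tail = s.partition("Z")
    if sep and tail:
        # Characters after 'Z' (e.g. 'Z0', 'Z00'): a real error.
        raise MalformedTimestampError("characters after 'Z' in %r" % s)
    if not sep:
        # Possibly legit, but not handled by this toy parser.
        raise ValueError("format not supported here: %r" % s)
    return datetime.strptime(head, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def fallback_parse(s):
    # Toy fallback that understands one extra format.
    return datetime.strptime(s, "%Y/%m/%d %H:%M")


def convert(s):
    try:
        return internal_parse(s)
    except MalformedTimestampError:
        raise  # bypass the fallback for ill-formed input
    except ValueError:
        return fallback_parse(s)
```

Because the subclass still derives from ValueError, callers that already catch ValueError would keep working in this sketch; checking a specific message or a sentinel such as out_local would be alternative ways to carry the same signal.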

broessli · 2014-11-16T15:07:00Z

This time PR is green ;-)

jreback · 2014-11-21T20:40:31Z

@broessli update on this?

broessli · 2014-11-22T07:01:46Z

@jreback I've been very busy, and I think some thought and refactoring are in order to fix all this properly (I don't like the idea of reusing out_local), but unfortunately I don't have time to do it right now. I've corrected the behavior of the internal parser regarding the 'Z' specifier, but the issue raised here has to do with the way the internal and fallback/external parsers interact, which to me needs some non-trivial refactoring.

Benoît.

jreback · 2014-11-24T12:41:40Z

@broessli ok, pls squash / rebase. Let's push thru this change as is, then pls add another issue (which I'll mark for the future) to possibly handle the 2 error cases that we discussed above (list them at the top of the issue so you or whoever gets to it won't forget).

thanks

broessli · 2014-11-27T17:32:42Z

@jreback rebase done, PR green, sorry for the delay, have been away with no access to my home computer

Fix unrecognized 'Z' UTC designator

jreback · 2014-11-27T17:48:15Z

thanks!

jreback added the Regression (functionality that used to work in a prior pandas version) and Datetime (datetime data dtype) labels Nov 16, 2014

jreback added this to the 0.15.2 milestone Nov 16, 2014

jreback reviewed Nov 16, 2014

Fix unrecognized 'Z' UTC designator

1e5d25a

broessli force-pushed the z-utc branch from 7348b65 to 1e5d25a November 27, 2014 16:38

broessli mentioned this pull request Nov 27, 2014

TST/BUG: Improve error handling when parsing iso 8601 strings #8910

Closed

jreback added a commit that referenced this pull request Nov 27, 2014

Merge pull request #8832 from broessli/z-utc

9fd99d7

Fix unrecognized 'Z' UTC designator

jreback merged commit 9fd99d7 into pandas-dev:master Nov 27, 2014
