Fix unrecognized 'Z' UTC designator #8832

Merged 1 commit on Nov 27, 2014

Conversation

broessli

closes #8771

@jreback added the Regression (functionality that used to work in a prior pandas version) and Datetime (datetime data dtype) labels Nov 16, 2014
@jreback added this to the 0.15.2 milestone Nov 16, 2014
In pandas/tseries/tests/test_tslib.py:

@@ -298,6 +298,9 @@ def test_barely_oob_dts(self):
         # One us more than the maximum is an error
         self.assertRaises(ValueError, Timestamp, max_ts_us + one_us)

+    def test_utc_z_designator(self):
+        self.assertEqual(get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo), 'UTC')
Contributor

hah...you took out the tests? are they just not legit/undefined? (it's ok, just trying to see)

Author

Yes indeed. Having checked ISO 8601 and RFC 3339, I don't think ...Z0 and ...Z00 are actually legit.

Contributor

ok, then they should raise, yes? so maybe the parsing actually needs to be a bit more strict, e.g. if you see a Z, then it must be the end of the string OR have a full-format offset.
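
The stricter rule proposed in this comment could be sketched in Python roughly as follows. This is an illustration only, not the C code in np_datetime_strings.c: the helper name `split_utc_designator` is hypothetical, and it covers only the "Z must be the end of the string" half of the rule.

```python
def split_utc_designator(text):
    """Split a datetime string into (body, is_utc).

    Hypothetical helper: a 'Z' designator is accepted only as the
    last character, so '...Z0' and '...Z00' are rejected.
    """
    stripped = text.strip()
    z_pos = stripped.find("Z")
    if z_pos == -1:
        return stripped, False  # no designator at all
    if z_pos != len(stripped) - 1:
        raise ValueError("unexpected characters after 'Z': %r" % text)
    return stripped[:-1], True


print(split_utc_designator("2014-11-02 01:00Z"))  # ('2014-11-02 01:00', True)
```

With this rule, '2014-11-02 01:00Z0' raises a ValueError instead of being handed to a more permissive parser.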

Author

Totally agree, and as I understand it this is exactly the behavior of the parser in np_datetime_strings.c. But there is currently a fallback (dateutil's parser) if anything goes wrong, and that is what we are actually testing with ...Z0 and ...Z00. I think this fallback is there for a reason, so I did not want to mess with it.

Contributor

ahh, so the fallback (i.e. dateutil) was producing an incorrect result. not surprised. Ok, so now we KNOW that these 2 cases are not legit. Is it easy to catch these 2 cases? (e.g. either you have Z then end-of-string, or Z then a legit value?) (maybe it's easy to simply test if 0/00 are present after Z and raise).

Contributor

after line 800 where the sublen is 1 you are done (i.e. you got a Z only). Otherwise it passes thru. I would then check the next 1 and then 2 characters (or if you can't, it's an error); if they are 0 then it's an error, otherwise pass thru.

Author

No, it does not pass thru: check line 880, there is a goto parse_error if anything remains after Z is encountered. parse_error raises a PyExc_ValueError, but this ValueError is caught in tslib.pyx/convert_to_tsobject(), hence the problem.
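
The control flow described in this comment can be pictured with a small Python sketch. None of this is pandas code: `strict_parse`, `lenient_parse` and `convert` are hypothetical stand-ins for the internal parser, the dateutil fallback and convert_to_tsobject() respectively, and the stand-in fallback simply accepts everything.

```python
def strict_parse(text):
    """Stand-in for the internal parser: rejects anything after 'Z'."""
    if "Z" in text and not text.endswith("Z"):
        raise ValueError("parse error: trailing characters after 'Z'")
    return ("strict", text)


def lenient_parse(text):
    """Stand-in for a permissive fallback parser (accepts everything here)."""
    return ("fallback", text)


def convert(text):
    """Any ValueError from the strict parser is swallowed and the
    fallback is tried instead, which hides the strict rejection."""
    try:
        return strict_parse(text)
    except ValueError:
        return lenient_parse(text)


print(convert("2014-11-02 01:00Z"))   # ('strict', '2014-11-02 01:00Z')
print(convert("2014-11-02 01:00Z0"))  # ('fallback', '2014-11-02 01:00Z0')
```

The second call shows why the strict rejection never reaches the caller: the error is raised, but the wrapper treats it as a signal to try the fallback.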

Contributor

ok, so check there. These are cases that should then be caught as errors.

Author

You mean adding checks in convert_to_tsobject? Sorry if I am a bit nitpicky
here, but I think all those checks should be the responsibility of the
parser only and should not be spread across modules. That's why I am a bit
reluctant to do it. I think the cleanest way to do it currently is to allow
the internal parser to somehow bypass the fallback and be able to raise
errors of its own. If I am correct, the internal parser currently raises the
same kind of exception (ValueError) for two different things: legit but
unsupported (not implemented) ISO 8601 strings on one side, and really
ill-formed datetime strings on the other.

Benoît.

Contributor

not sure where you got the idea that I was suggesting you modify convert_to_tsobject. That's very complicated and not warranted. I suggest that you make a simple modification in the same file you are currently working in, np_datetime_strings.c, to handle the case of reading 1 and 2 characters past the Z.

I would maybe make out_local==-1 an error, then in the except block, if out_local is -1 you can simply re-raise the ValueError (in convert_to_tsobject)

or you can put a specific message in the ValueError which is checked in convert_to_tsobject. You are right, the problem is a ValueError actually means 2 things here. Need to disambiguate them (or allow your change, and simply disregard the 'error' cases that we had before).
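
One way to picture the disambiguation discussed here is with two ValueError subclasses. Note that this is a different technique from the two options suggested in the comment (an out_local sentinel or a specific error message), and every name below is hypothetical; the week-date check is only a placeholder for "legit but not implemented" input.

```python
class UnsupportedFormatError(ValueError):
    """Legitimate ISO 8601 input that the strict parser does not implement."""


class MalformedDatetimeError(ValueError):
    """Ill-formed input that should never reach the fallback."""


def strict_parse(text):
    if "Z" in text and not text.endswith("Z"):
        raise MalformedDatetimeError("characters after 'Z' in %r" % text)
    if "W" in text:  # placeholder: ISO week dates treated as not implemented
        raise UnsupportedFormatError("week dates not implemented: %r" % text)
    return ("strict", text)


def convert(text, fallback):
    """Fall back only for unsupported input; malformed input propagates."""
    try:
        return strict_parse(text)
    except UnsupportedFormatError:
        return fallback(text)
```

Because both classes derive from ValueError, callers that already catch ValueError would keep working, while the wrapper can tell the two situations apart.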

@broessli
Author

This time PR is green ;-)

@jreback
Contributor

jreback commented Nov 21, 2014

@broessli update on this?

@broessli
Author

@jreback I've been very busy, and I think some thought and refactoring are
in order to fix all this properly (I don't like the idea of reusing
out_local), but unfortunately I don't have time to do it right now. I've
corrected the behavior of the internal parser regarding the 'Z' specifier, but
the issue raised here has to do with the way the internal and
fallback/external parsers interact, which to me needs some nontrivial
refactoring.

Benoît.

@jreback
Contributor

jreback commented Nov 24, 2014

@broessli ok, pls squash / rebase. Let's push thru this change as is, then pls add another issue (which I'll mark for the future) to possibly handle the 2 error cases that we discussed above (list them at the top of the issue so you or whoever gets to it won't forget).

thanks

@broessli
Author

@jreback rebase done, PR green, sorry for the delay, have been away with no access to my home computer

jreback added a commit that referenced this pull request Nov 27, 2014
Fix unrecognized 'Z' UTC designator
@jreback merged commit 9fd99d7 into pandas-dev:master Nov 27, 2014
@jreback
Contributor

jreback commented Nov 27, 2014

thanks!

Labels
Datetime (datetime data dtype), Regression (functionality that used to work in a prior pandas version)
Development

Successfully merging this pull request may close these issues.

Timestamp does not parse the 'Z' zone designator for UTC
2 participants