Fix unrecognized 'Z' UTC designator #8832
@@ -298,6 +298,9 @@ def test_barely_oob_dts(self):
        # One us more than the maximum is an error
        self.assertRaises(ValueError, Timestamp, max_ts_us + one_us)

    def test_utc_z_designator(self):
        self.assertEqual(get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo), 'UTC')
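The new test expects a trailing 'Z' to produce a UTC timestamp. The sketch below uses Python's standard library, not pandas' parser, so treat it only as an analogy. Since Python 3.7, the %z directive of datetime.strptime accepts 'Z' as an alias for +00:00, so the test's input string parses to a timezone-aware datetime in UTC:

```python
from datetime import datetime, timedelta, timezone

# '%z' accepts 'Z' as an alias for +00:00 (Python 3.7+).
ts = datetime.strptime("2014-11-02 01:00Z", "%Y-%m-%d %H:%M%z")

# The result is timezone-aware with a zero UTC offset, i.e. UTC.
assert ts.utcoffset() == timedelta(0)
print(ts.tzinfo == timezone.utc)  # True
print(ts.isoformat())  # 2014-11-02T01:00:00+00:00
```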
hah... you took out the tests? Are they just not legit/undefined? (it's ok, just trying to see)
Yes indeed. After checking ISO 8601 and RFC 3339, I don't think '...Z0' and '...Z00' are actually legit.
ok, then they should raise, yes? so maybe the parsing actually needs to be a bit more strict, e.g. if you see a Z, then it must be the end of the string OR have a full-format offset.
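The stricter rule can be written as a grammar in which the UTC designator and a numeric offset are alternatives, following RFC 3339's time-offset = "Z" / time-numoffset. The sketch below is simplified and makes some assumptions. Seconds are optional, and a space is accepted as the date/time separator so that the test string matches. Lowercase 'z' and fractional seconds are not handled. The helper name is_wellformed is hypothetical. Because fullmatch anchors both ends of the string, a 'Z' is accepted only as the final character, which rejects '...Z0' and '...Z00':

```python
import re

# Simplified RFC 3339-style grammar (hypothetical helper, not pandas code):
#   date, separator, time, then an optional offset that is EITHER 'Z'
#   OR a full numeric offset '+HH:MM' / '-HH:MM'.
# fullmatch() anchors both ends, so nothing may follow the offset.
_PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}"        # date: YYYY-MM-DD
    r"[T ]"                     # separator: 'T' or a space
    r"\d{2}:\d{2}(?::\d{2})?"   # time: HH:MM[:SS]
    r"(?:Z|[+-]\d{2}:\d{2})?"   # optional offset: 'Z' or +HH:MM / -HH:MM
)


def is_wellformed(s: str) -> bool:
    """Return True if s matches the simplified grammar above."""
    return _PATTERN.fullmatch(s) is not None


for candidate in ["2014-11-02 01:00Z", "2014-11-02 01:00+01:00",
                  "2014-11-02 01:00Z0", "2014-11-02 01:00Z00"]:
    print(candidate, is_wellformed(candidate))  # True, True, False, False
```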
Totally agree, and as I understand it this is exactly the behavior of the parser in np_datetime_strings.c. However, there is currently a fallback (dateutil's parser) that kicks in if anything goes wrong, and that fallback is what we are actually testing with '...Z0' and '...Z00'. I think this fallback is there for a reason, so I did not want to mess with it.
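To see why the fallback hides the problem, here is a toy model. All three functions are hypothetical. In particular, lenient_parse is a deliberately forgiving stand-in and does not reproduce dateutil's actual algorithm. Because every ValueError from the strict parser is caught and handed to the fallback, a malformed string such as '...Z0' can come back as a valid but naive timestamp instead of raising an error:

```python
from datetime import datetime, timezone


def strict_parse(s: str) -> datetime:
    """Toy strict parser: 'YYYY-MM-DD HH:MM' with an optional final 'Z' (UTC)."""
    if s.endswith("Z"):
        naive = datetime.strptime(s[:-1], "%Y-%m-%d %H:%M")
        return naive.replace(tzinfo=timezone.utc)
    # Any other suffix, including 'Z0', makes strptime raise ValueError.
    return datetime.strptime(s, "%Y-%m-%d %H:%M")


def lenient_parse(s: str) -> datetime:
    """Toy forgiving fallback: parses the first 16 characters, drops the rest."""
    return datetime.strptime(s[:16], "%Y-%m-%d %H:%M")


def parse_with_fallback(s: str) -> datetime:
    try:
        return strict_parse(s)
    except ValueError:
        # Every ValueError goes to the fallback, so malformed input is
        # treated the same way as merely unsupported input.
        return lenient_parse(s)


print(parse_with_fallback("2014-11-02 01:00Z"))   # 2014-11-02 01:00:00+00:00
print(parse_with_fallback("2014-11-02 01:00Z0"))  # 2014-11-02 01:00:00 (no error, no UTC)
```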
ahh, so the fallback (i.e. dateutil) was producing an incorrect result. Not surprised. Ok, so now we KNOW that these 2 cases are not legit. Is it easy to catch these 2 cases? (e.g. either you have Z then end-of-string, or Z followed by a legit value?) (Maybe it's simplest to test whether 0/00 are present after the Z and raise.)
After line 800, where sublen is 1, you are done (i.e. you got a Z only). Otherwise it passes thru. I would then check the next 1 and then 2 characters (if you can't, it's an error): if they are 0, it's an error, otherwise pass thru.
No, it does not pass thru. Check line 880: there is a goto parse_error if anything remains after the Z. parse_error raises a PyExc_ValueError, but this ValueError is caught in tslib.pyx/convert_to_tsobject(), hence the problem.
ok, so check there. These are cases that should then be treated as errors.
You mean adding checks in convert_to_tsobject? Sorry if I am a bit nitpicky here, but I think all those checks should be the responsibility of the parser only and should not be spread across modules. That's why I am a bit reluctant to do it. I think the cleanest way to do it currently is to let the internal parser somehow bypass the fallback and raise errors of its own. If I am correct, the internal parser currently raises the same kind of exception (ValueError) for two different things: legit but unsupported (not implemented) ISO 8601 strings on one side, and genuinely ill-formed datetime strings on the other.
(Sent by email on Sunday, 16 November 2014, in reply to jreback's review comment on pandas/tseries/tests/test_tslib.py.)
Benoît.
not sure where you got the idea that I was suggesting you modify convert_to_tsobject. That's very complicated and not warranted. I suggest that you make a simple modification in the file you are currently working in, np_datetime_strings.c, to handle the case of reading 1 and 2 characters past the Z.
I would maybe make out_local == -1 an error. Then, in the except block in convert_to_tsobject, if out_local is -1 you can simply re-raise the ValueError. Or you can put a specific message in the ValueError that is checked in convert_to_tsobject. You are right, the problem is that a ValueError actually means 2 things here. We need to disambiguate them (or allow your change, and simply disregard the 'error' cases that we had before).
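One way to disambiguate is sketched below under stated assumptions. Instead of the out_local == -1 flag or a specific message, it swaps in a dedicated ValueError subclass (MalformedDatetimeError, a hypothetical name). Because the subclass still inherits from ValueError, existing except ValueError blocks keep working. The fallback wrapper, however, can re-raise it before ever trying the lenient parser. The parsers here are toys and do not stand in for np_datetime_strings.c or dateutil:

```python
from datetime import datetime, timezone


class MalformedDatetimeError(ValueError):
    """Hypothetical: the string is definitely ill-formed, so no fallback."""


def strict_parse(s: str) -> datetime:
    """Toy strict parser for 'YYYY-MM-DD HH:MM' with an optional final 'Z'."""
    head, z, tail = s.partition("Z")
    if z and tail:
        # Characters after the UTC designator: ill-formed, not merely unsupported.
        raise MalformedDatetimeError(f"unexpected characters after 'Z': {tail!r}")
    # A plain ValueError from here means "layout not supported by this parser".
    naive = datetime.strptime(head, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone.utc) if z else naive


def fallback_parse(s: str) -> datetime:
    """Toy fallback handling one other layout, e.g. 'Nov 02 2014 01:00'."""
    return datetime.strptime(s, "%b %d %Y %H:%M")


def parse_with_fallback(s: str) -> datetime:
    try:
        return strict_parse(s)
    except MalformedDatetimeError:
        raise  # ill-formed input must not reach the fallback
    except ValueError:
        return fallback_parse(s)


try:
    parse_with_fallback("2014-11-02 01:00Z0")
except MalformedDatetimeError as exc:
    print(exc)  # unexpected characters after 'Z': '0'
```

The except clauses are ordered from most to least specific. If the ValueError clause came first, it would also swallow MalformedDatetimeError.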
This time the PR is green ;-)
@broessli update on this?
@jreback I've been very busy and I think some thoughts and refactoring are
On Friday, 21 November 2014, jreback [email protected] wrote:
Benoît.
@broessli ok, pls squash / rebase. Let's push thru this change as is, then pls add another issue (which I'll mark for the future) to possibly handle the 2 error cases that we discussed above (list them at the top of the issue so you or whoever gets to it won't forget). thanks
@jreback rebase done, PR green, sorry for the delay, have been away with no access to my home computer |
Fix unrecognized 'Z' UTC designator
thanks!
closes #8771