Fix unrecognized 'Z' UTC designator #8832
Merged
Hah... you took out the tests? Are they just not legit/undefined? (It's OK, just trying to see.)
Yes indeed: having checked ISO 8601 and RFC 3339, I don't think `...Z0` and `...Z00` are actually legit.
OK, then they should raise, yes? So maybe the parsing actually needs to be a bit more strict, e.g. if you see a Z, then it must be the end of the string OR have a full-format offset.
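A minimal sketch of the stricter rule, for illustration only. It assumes the RFC 3339 `time-offset` grammar (a bare `Z`, or a numeric offset of the form `+hh:mm` / `-hh:mm`); `has_valid_tz_suffix` is a hypothetical helper, not something from pandas or `np_datetime_strings.c`. Note that it returns False for naive timestamps with no offset at all.

```python
import re

# RFC 3339 allows either a bare "Z" or a numeric offset such as "+01:00".
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def has_valid_tz_suffix(timestamp: str) -> bool:
    """Return True if the string ends in a well-formed offset
    and no "Z" appears anywhere before that offset."""
    match = _TZ_SUFFIX.search(timestamp)
    if match is None:
        # Covers "...Z0" and "...Z00": nothing valid sits at the end.
        return False
    # Reject a stray "Z" earlier in the string.
    return "Z" not in timestamp[: match.start()]


for candidate in ("2014-11-16T12:00:00Z",
                  "2014-11-16T12:00:00Z0",
                  "2014-11-16T12:00:00Z00",
                  "2014-11-16T12:00:00+01:00"):
    print(candidate, has_valid_tz_suffix(candidate))
```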
Totally agree, and as I understand it this is exactly the behavior of the parser in np_datetime_strings.c, but there is currently a fallback (dateutil's parser) if anything goes wrong, and this is what we are actually testing with `...Z0` and `...Z00`. I think this fallback is there for a reason, so I did not want to mess with it.
Ahh, so the fallback (i.e. dateutil) was producing an incorrect result. Not surprised. OK, so now we KNOW that these 2 cases are not legit. Is it easy to catch these 2 cases? (E.g. either you have Z then end-of-string, or Z then a legit value?) (Maybe it's easiest to simply test whether 0/00 is present after Z and raise.)
After line 800, where the sublen is 1, you are done (i.e. you got a Z only). Otherwise it passes through. I would then check the next 1 and then 2 characters (if you can't, it's an error); if they are 0 then it's an error, otherwise pass through.
No, it does not pass through. Check line 880: there is a goto parse_error if anything remains after Z is encountered. parse_error raises a PyExc_ValueError, but this ValueError is caught in tslib.pyx/convert_to_tsobject(), hence the problem.
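The control flow described here can be sketched in a few lines of Python. Every name below is a hypothetical stand-in: `strict_parse` plays the role of the C parser, `lenient_parse` plays the role of a permissive fallback (it does not claim to reproduce what dateutil actually returns), and `convert` plays the role of the caller that catches the `ValueError`.

```python
def strict_parse(text):
    """Stand-in for the strict parser: anything after 'Z' is an error."""
    idx = text.find("Z")
    if idx not in (-1, len(text) - 1):
        raise ValueError("unexpected characters after 'Z': %r" % text)
    return ("strict", text)


def lenient_parse(text):
    """Stand-in for a permissive fallback that drops whatever follows 'Z'."""
    head, sep, _ = text.partition("Z")
    return ("fallback", head + sep)


def convert(text):
    """Stand-in for the caller: any ValueError triggers the fallback."""
    try:
        return strict_parse(text)
    except ValueError:
        # The strict parser's rejection is lost here, so an ill-formed
        # string can still come back with a value instead of an error.
        return lenient_parse(text)


print(convert("2014-11-16T12:00:00Z"))
print(convert("2014-11-16T12:00:00Z0"))
```

The second call shows the issue: the strict parser rejects the input, yet the caller still returns a result because the exception never escapes.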
OK, so check there. These are cases that should then be caught as errors.
You mean adding checks in convert_to_tsobject? Sorry if I am a bit nitpicky here, but I think all those checks should be the responsibility of the parser only and should not be spread across modules. That's why I am a bit reluctant to do it. I think the cleanest way to do it currently is to allow the internal parser to somehow bypass the fallback and raise errors of its own: if I am correct, the internal parser currently raises the same kind of exception (ValueError) for two different things: legit but unsupported (not implemented) ISO 8601 strings on one side, and genuinely ill-formed datetime strings on the other.
On Sunday, 16 November 2014, jreback wrote:
Benoît.
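One way to picture the idea of letting the parser raise errors of its own is to give the two situations distinct exception types, so that only one of them triggers the fallback. This is a sketch under stated assumptions, not the change made in this pull request: both exception classes, `strict_parse`, and `convert` are hypothetical names, and the "no `T` separator" test is only a placeholder for "well-formed but not handled by the strict parser".

```python
class UnsupportedFormatError(ValueError):
    """Well-formed input that the strict parser does not implement."""


class MalformedDatetimeError(ValueError):
    """Input that no parser should accept."""


def strict_parse(text):
    idx = text.find("Z")
    if idx not in (-1, len(text) - 1):
        # Ill-formed: something follows the UTC designator.
        raise MalformedDatetimeError("characters after 'Z': %r" % text)
    if "T" not in text:
        # Placeholder condition for "legit but not implemented here".
        raise UnsupportedFormatError("not handled: %r" % text)
    return ("strict", text)


def convert(text, fallback):
    try:
        return strict_parse(text)
    except UnsupportedFormatError:
        # Only the "unsupported" case reaches the fallback;
        # MalformedDatetimeError propagates to the caller.
        return fallback(text)
```

With this split, `convert("16 Nov 2014", some_fallback)` would defer to the fallback, while `convert("2014-11-16T12:00:00Z0", some_fallback)` would raise.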
Not sure where you got the idea that I was suggesting you modify `convert_to_tsobject`. That's very complicated and not warranted. I suggest that you make a simple modification in the same file you are currently working on, np_datetime_strings.c, to handle the case of reading 1 and 2 characters past the Z. I would maybe make `out_local==-1` an error; then in the except block, if out_local is -1, you can simply re-raise the ValueError (in `convert_to_tsobject`), or you can put a specific message in the `ValueError` which is checked in `convert_to_tsobject`. You are right, the problem is that a `ValueError` actually means 2 things here. Need to disambiguate them (or allow your change, and simply disregard the 'error' cases that we had before).
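The "specific message in the ValueError" variant might look roughly like the following. This is an illustrative sketch only: `ILL_FORMED_MARKER`, `strict_parse`, and `convert` are hypothetical names, the marker text is invented, and the "no `T` separator" branch merely stands in for inputs the strict parser does not handle.

```python
# Hypothetical marker text agreed on by the parser and its caller.
ILL_FORMED_MARKER = "ill-formed datetime string"


def strict_parse(text):
    idx = text.find("Z")
    if idx not in (-1, len(text) - 1):
        # Tag the error so the caller can tell it apart.
        raise ValueError("%s: %r" % (ILL_FORMED_MARKER, text))
    if "T" not in text:
        # Placeholder for "legit but not handled by the strict parser".
        raise ValueError("format not handled: %r" % text)
    return ("strict", text)


def convert(text, fallback):
    try:
        return strict_parse(text)
    except ValueError as exc:
        if str(exc).startswith(ILL_FORMED_MARKER):
            # Genuinely bad input: re-raise instead of falling back.
            raise
        return fallback(text)
```

Matching on message text keeps a single exception type at the cost of coupling the caller to a string, which is the trade-off to weigh against a dedicated error signal.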