Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333
Changes from 13 commits
45f82c3
e473adb
a6ea6d0
8fede1f
2e21e71
afc4d96
12721b0
0531967
a571753
e814a2e
19c34f8
70fb820
eb50dfb
ac61ac5
0dd7407
4d35ea7
3acfdf6
f3060c9
7310e13
b18ade7
3ceb1ee
bde5ef9
080f018
031e0e3
01e8bd1
c1e6bc2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orthogonal to this but if you want to work more with the C codebase I think we should just replace
get_c_string_buf_and_size
withPyUnicode_AsUTF8AndSize
directly; the former might have served a purpose with the Py2/3 transition but is just an unnecessary layer at this pointThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure about this function. What is the advantage of this over usingstrncmp
? Seems like it would be cleaner to just use that plus some combination ofexact
outside of thisThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I think I understand a bit more now what you are trying to do. I think it would be best if you just kept a local variable for characters consumed, which you can increment every time you increment the format pointer (you are kind of doing this anyway). You don't really need a function but can rather just do
if (format++ != '%') { // handle error }
The order of operations will do the postfix increment after assignment, so this consolidates what you are trying to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking I've understood, is the suggestion to get rid of the function and each time do something like
?
I presume I've misunderstood because to me this looks more complicated and loses the docstring 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to even do everything in one single if statement
if ((exact && !format_len) || (format_len && format_len-- && *format++ != ch)) goto ...
, but it then starts to look as one of these tricky C language questions... This explicit function has comments, it is more clear imho, and is very likely to get inlined.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not a matter of inlining as much as code organization. For instance, this block:
Would seem more naturally expressed as:
There's some code golf you can do therein to shorten it but that's not really what I'm after. Moreso we should refactor this to avoid trying to cram a lot into a function with side effects, as we don't use that pattern much if at all in our code base, and a lot of the character by character checks being done are overkill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other advantage of this is you could provide an actual error message as to what went wrong at what position of the format, though not critical to scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Can't we just refactor everything and replace it with something like
?
Unless doing everything in one go is a performance decision...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see a better way to refactor I am definitely open to it. I don't think the performance here will be that critical.
Probably better served as a pre cursor before adding functionality here, or alternately as a follow up to the "side-effect-less" design mentioned already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit but can you run
clang-format
against this file? I think it would format this differently