-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 8 commits
Commits
Show all changes
26 commits
Select commit
Hold shift + click to select a range
45f82c3
initial format support
nikitaved e473adb
set exact=False default in objects_to_datetime
a6ea6d0
:label: typing
8fede1f
simplify
2e21e71
Merge remote-tracking branch 'upstream/main' into pr/nikitaved-qssumm…
afc4d96
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
12721b0
replace macro with function
0531967
clean up
a571753
:memo: restore docstring
e814a2e
inline
19c34f8
set format default to None
70fb820
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
eb50dfb
clean up
ac61ac5
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
0dd7407
remove function, perform check inline
4d35ea7
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
3acfdf6
only compare *format++ if format_len
f3060c9
clean up
7310e13
typing
b18ade7
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
3ceb1ee
split out branches
bde5ef9
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
080f018
use compare_format function
031e0e3
remove tmp variable
01e8bd1
Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…
c1e6bc2
Add co-authors
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,7 +52,8 @@ cdef extern from "src/datetime/np_datetime_strings.h": | |
int parse_iso_8601_datetime(const char *str, int len, int want_exc, | ||
npy_datetimestruct *out, | ||
NPY_DATETIMEUNIT *out_bestunit, | ||
int *out_local, int *out_tzoffset) | ||
int *out_local, int *out_tzoffset, | ||
const char *format, int format_len, int exact) | ||
|
||
|
||
# ---------------------------------------------------------------------- | ||
|
@@ -273,14 +274,20 @@ cdef inline int string_to_dts( | |
int* out_local, | ||
int* out_tzoffset, | ||
bint want_exc, | ||
str format, | ||
bint exact, | ||
) except? -1: | ||
cdef: | ||
Py_ssize_t length | ||
const char* buf | ||
Py_ssize_t format_length | ||
const char* format_buf | ||
|
||
buf = get_c_string_buf_and_size(val, &length) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Orthogonal to this but if you want to work more with the C codebase I think we should just replace |
||
format_buf = get_c_string_buf_and_size(format, &format_length) | ||
return parse_iso_8601_datetime(buf, length, want_exc, | ||
dts, out_bestunit, out_local, out_tzoffset) | ||
dts, out_bestunit, out_local, out_tzoffset, | ||
format_buf, format_length, exact) | ||
|
||
|
||
cpdef ndarray astype_overflowsafe( | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use None as a default rather than
""
? I think can just type as Optional[str] which would be a bit cleanerThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, could do that here, but then wouldn't it still need to be
str
when it's passed to the C function inpandas/pandas/_libs/tslibs/np_datetime.pyx
Lines 287 to 290 in 45f82c3
? If I understand correctly, at some point the conversion from
None
to""
would have to happen anywayThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If None can be interpreted as nullptr (can we just pass
0
?), I agree that having None is a cleaner option for something that is user-facing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't user-facing, it's just used internally
Not sure I understand about None being interpreted as nullptr, but if I pass
None
toget_c_string_buf_and_size
I getSo I think the conversion between
None
andstr
needs to happen at some point internally anywayIf
exact
if False andformat
is''
, then it should match anythingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about?
The rest of the code should work just fine given that the checks are conditioned on
format_lenght
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah, looks good - I presume
format_buf
would have to be'\0'
rather than0
though?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as
format_len
is0
,*format_buf
could be anything, so it is a bit clearer to have the pointer set toNULL
, imho.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how though? because if it's
0
, then we getAnd with
None
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, it is a cast issue. Does Cython support casts to C types? If so, something like
format_buf = (const char*) 0
should do. Or, maybe, we can specify the type offormat_buf
in advance... Sorry, I am not an expert in Cython :)Maybe this will work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried
but then get
(with no further info)
For now I've gone with
format = b''
, and handlingNone
inside this function does indeed look cleaner, thanks!