BUG guess_Datetime_format doesn't guess 27.03.2003 14:55:00.000 #50319

MarcoGorelli · 2022-12-17T16:00:23Z

closes BUG guess_Datetime_format doesn't guess 27.03.2003 14:55:00.000 #50317
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

MarcoGorelli · 2022-12-17T16:01:30Z

pandas/_libs/tslibs/parsing.pyx

+    if re.search(r"\d*\.\d+", token) is None:
+        # For example: 98
        token_filled = token.zfill(padding)
    else:
+        # For example: 00.123
        seconds, nanoseconds = token.split(".")
        seconds = f"{int(seconds):02d}"


Previously,

seconds = f"{int(seconds):02d}"

raised a ValueError when token was just the single character '.', because splitting on '.' leaves an empty string for seconds.

Making the check more precise avoids that issue.
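A minimal sketch of the difference, in plain Python rather than Cython. The helper names (`old_check_splits`, `new_check_splits`, `pad_seconds`) are hypothetical and only mirror the lines visible in this PR's diff; the rest of `_fill_token` is not reproduced here.

```python
import re


def old_check_splits(token: str) -> bool:
    # Check removed by the diff: any "." sends the token to the split branch.
    return "." in token


def new_check_splits(token: str) -> bool:
    # Check added by the diff: the dot must be followed by at least one digit.
    return re.search(r"\d*\.\d+", token) is not None


def pad_seconds(token: str) -> str:
    # The two lines of the else-branch shown in the diff.
    seconds, nanoseconds = token.split(".")
    return f"{int(seconds):02d}"


for token in ["98", "00.123", "."]:
    print(repr(token), old_check_splits(token), new_check_splits(token))
```

With the old check, the lone '.' token reaches `pad_seconds` and `int('')` raises; with the regex check it falls through to the `zfill` branch instead.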

mroeschke

LGTM. Doesn't need a whatsnew right?

jbrockmendel · 2022-12-17T19:07:19Z

pandas/_libs/tslibs/parsing.pyx

@@ -1016,9 +1016,11 @@ def guess_datetime_format(dt_str: str, bint dayfirst=False) -> str | None:

 cdef str _fill_token(token: str, padding: int):
    cdef str token_filled
-    if "." not in token:
+    if re.search(r"\d*\.\d+", token) is None:


is the leading \d* needed here?

\d+ would be better actually, thanks

This branch is for when the token corresponds to the %S.%f directive
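To illustrate why `\d+` is the safer choice, here is a small comparison of the two patterns. The token `".5"` is a hypothetical input chosen for illustration, not one taken from the linked issue, and the names `LOOSE`, `STRICT`, and `matches` are made up for this sketch.

```python
import re

LOOSE = re.compile(r"\d*\.\d+")   # pattern in the first revision of this PR
STRICT = re.compile(r"\d+\.\d+")  # pattern suggested in review


def matches(pattern: re.Pattern, token: str) -> bool:
    return pattern.search(token) is not None


for token in ["00.123", ".5", "."]:
    print(repr(token), matches(LOOSE, token), matches(STRICT, token))
```

The loose pattern accepts `".5"`, which would then be split into an empty seconds part and fail in `int(seconds)`; the strict pattern only accepts tokens with digits on both sides of the dot, which is the shape `%S.%f` needs.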

MarcoGorelli · 2022-12-17T19:43:06Z

> LGTM. Doesn't need a whatsnew right?

thanks! and no, this wouldn't (yet) make a user-facing difference

it fixes #2586 by accident, but I'd say that that fits into the PDEP4-related part of the whatsnew (as in, PDEP4 should have fixed that, it just didn't quite because of an error in implementation here)

mroeschke

LGTM. Merge when ready @jbrockmendel

jbrockmendel · 2022-12-20T19:13:10Z

pandas/_libs/tslibs/parsing.pyx

        token_filled = token.zfill(padding)
    else:
+        # For example: 00.123
        seconds, nanoseconds = token.split(".")


is there a risk that this raises if there is a "." but not the regex?

yup, you'd get

In [6]: token = '.'

In [7]: seconds, nanoseconds = token.split(".")

In [8]: seconds = f"{int(seconds):02d}"
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [8], line 1
----> 1 seconds = f"{int(seconds):02d}"

ValueError: invalid literal for int() with base 10: ''

which is what happens in the string in the linked issue ('27.03.2003 14:55:00.000')

I take it this is caught elsewhere, so it isn't a problem?

yeah, if the padding was 2 (say) then '.'.zfill(2) would give '0.', and in

pandas/pandas/_libs/tslibs/parsing.pyx

Lines 945 to 946 in f7979be

token_filled = _fill_token(tokens[i], padding)

if token_format is None and token_filled == parsed_formatted:

it wouldn't match parsed_formatted
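A quick check of that reasoning, as a sketch. The `parsed_formatted` candidates below are hypothetical values for the example string, not output captured from pandas, and `fill_non_fractional` is an illustrative stand-in for the `zfill` branch only.

```python
def fill_non_fractional(token: str, padding: int) -> str:
    # Stand-in for the zfill branch of _fill_token.
    return token.zfill(padding)


padding = 2
filled_dot = fill_non_fractional(".", padding)
filled_digit = fill_non_fractional("5", padding)

# Hypothetical zero-padded date parts that a separator token is compared against.
parsed_formatted_candidates = ["27", "03", "14", "55", "00"]

print(repr(filled_dot), filled_dot in parsed_formatted_candidates)
print(repr(filled_digit))
```

Because the padded separator still contains the dot, it cannot equal any all-digit formatted value, so the comparison is simply false and no format is assigned from it.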

MarcoGorelli · 2022-12-22T21:20:07Z

merging as there's been an approval, and this is quite minor, and it unblocks another PR - if there are any other concerns / feedback I can address them in a follow-up

fix parsing

3aa9d23

MarcoGorelli commented Dec 17, 2022

MarcoGorelli requested review from mroeschke and jbrockmendel December 17, 2022 16:02

MarcoGorelli added the Bug and Datetime (Datetime data dtype) labels Dec 17, 2022

MarcoGorelli mentioned this pull request Dec 17, 2022

BUG: add date_format to read_csv / Date parsing mistake. read_csv #2586

Closed

stricter regex

b2f5bf3

MarcoGorelli mentioned this pull request Dec 17, 2022

BUG: add date_format to read_csv / Date parsing mistake. read_csv #50320

Merged

mroeschke reviewed Dec 17, 2022

jbrockmendel reviewed Dec 17, 2022

MarcoGorelli requested review from mroeschke and jbrockmendel December 18, 2022 09:03

mroeschke added this to the 2.0 milestone Dec 18, 2022

mroeschke approved these changes Dec 18, 2022

jbrockmendel reviewed Dec 20, 2022

MarcoGorelli requested a review from jbrockmendel December 21, 2022 21:28

MarcoGorelli merged commit cc478c4 into pandas-dev:main Dec 22, 2022
