CI: avoid `guess_datetime_format` failure on 29th of Feburary #57674

MarcoGorelli · 2024-02-29T09:19:22Z

closes BUG: List of years (as string) raises UserWarning with to_datetime #57672
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

MarcoGorelli · 2024-02-29T09:26:00Z

pandas/_libs/tslibs/parsing.pyx

-    # same default used by dateutil
-    default = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
+    default = datetime(1970, 1, 1)


the guessed format is checked explicitly against the input:

pandas/pandas/_libs/tslibs/parsing.pyx

Lines 1050 to 1061 in 7bdb6d8

    try:
        array_strptime(np.asarray([dt_str], dtype=object), guessed_format)
    except ValueError:
        # Doesn't parse, so this can't be the correct format.
        return None
    # rebuild string, capturing any inferred padding
    dt_str = "".join(tokens)
    if parsed_datetime.strftime(guessed_format) == dt_str:
        _maybe_warn_about_dayfirst(guessed_format, dayfirst)
        return guessed_format
    else:
        return None

so this should be safe
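
The check quoted above can be illustrated with a minimal pure-Python sketch. This is not the pandas implementation: `format_round_trips` is a hypothetical helper, and the standard library's `datetime.strptime` stands in for `array_strptime`. It only shows why a guessed format that does not reproduce the input is rejected, whatever default date was used along the way.

```python
from datetime import datetime


def format_round_trips(dt_str: str, guessed_format: str) -> bool:
    """Hypothetical helper (not a pandas API).

    Returns True only if `guessed_format` parses `dt_str` and formatting
    the parsed value with the same format reproduces `dt_str` exactly.
    """
    try:
        parsed = datetime.strptime(dt_str, guessed_format)
    except ValueError:
        # Doesn't parse, so this can't be the correct format.
        return False
    return parsed.strftime(guessed_format) == dt_str


print(format_round_trips("2003", "%Y"))                # True
print(format_round_trips("2003-15-01", "%Y-%m-%d"))    # False: month 15 does not parse
print(format_round_trips("2003-1-5", "%Y-%m-%d"))      # False: padding differs on the way back
```

Because the format is validated against the input string itself, swapping the default date for a fixed one should not let a wrong format slip through.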

jbrockmendel · 2024-02-29

worth adding a comment about why this default was chosen?

MarcoGorelli · 2024-02-29

sure, thanks

jbrockmendel · 2024-02-29T17:58:28Z

Looks reasonable. I guess it would be a PITA to write a test?

MarcoGorelli · 2024-02-29T18:58:56Z

> Looks reasonable. I guess it would be a PITA to write a test?

given that the function calls datetime.now from Cython - yes, probably

datapythonista · 2024-02-29T21:02:02Z

pandas/_libs/tslibs/parsing.pyx

+    # Use this instead of the dateutil default of
+    # `datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)`
+    # as that causes issues on the 29th of February.
+    default = datetime(1970, 1, 1)


I fail to understand what's going on. Surely happy to get this merged if it fixes the problem and you understand what's going on.

The root problem seems to be that guess_datetime_format('2003') is None instead of %Y. That would be easy to test if we want. What I don't understand is why changing the day and the month in the parsed date is relevant if the only token we have is the year...

    >>> dateutil.parser.parse('2003', dayfirst=True, default=datetime.datetime.now().replace(hour=0, minute=0, second=0, microsecond=0))
    datetime.datetime(2003, 2, 28, 0, 0)
    >>> dateutil.parser.parse('2003', dayfirst=True, default=datetime.datetime(1970, 1, 1))
    datetime.datetime(2003, 1, 1, 0, 0)

I didn't debug it (I guess you did), but I can't see why it should make a difference in the implementation. As I said, if you are confident that this is a good solution and the algorithm is reliable, I'm happy to get this merged. But it seems like there is a bug in the underlying algorithm that we are hacking around rather than fixing, no?

MarcoGorelli · 2024-03-01

yeah, seems like there's something in pandas' vendored code which makes dateutil_parse fail, even though non-vendored dateutil itself parses it. that's the problem with vendoring I guess...

rhshadrach · 2024-03-01

@datapythonista - currently in main we take today's date, replace its fields with the bits that the user passed, and then check whether the result is a valid date. E.g. "2002" on 2/29 becomes "2002-02-29", which is not a valid date. From this we conclude we've somehow guessed wrong.
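
The failure mode described here can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not the vendored pandas code: `fill_from_default` is a hypothetical helper that overwrites fields of a default date with the parsed ones, and the leap-day default is hard-coded instead of calling `datetime.now()`.

```python
from datetime import datetime
from typing import Optional


def fill_from_default(default: datetime, **parsed_fields: int) -> Optional[datetime]:
    """Hypothetical helper: overwrite fields of `default` with parsed ones.

    Returns None when the combination is not a valid calendar date.
    """
    try:
        return default.replace(**parsed_fields)
    except ValueError:
        # e.g. day 29 does not exist in February of a non-leap year
        return None


leap_day_default = datetime(2024, 2, 29)  # what "today at midnight" was on 2024-02-29
fixed_default = datetime(1970, 1, 1)      # the default introduced by this PR

print(fill_from_default(leap_day_default, year=2002))  # None
print(fill_from_default(fixed_default, year=2002))     # 2002-01-01 00:00:00
```

With January 1st as the default, every year the user passes yields a valid date, so the result no longer depends on the day the code runs.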

datapythonista · 2024-03-01

Thanks both. I see, I assumed dateutil_parse was an alias for the dateutil parser. I see the copied version has the comment "lifted from dateutil to get resolution"; maybe the dateutil parser works at second resolution and we work at microsecond or nanosecond, and that's why the function was copied into pandas?

In any case I think the fix is fine, a function to get the format from a partial date is surely not going to be perfect anyway.

mroeschke · 2024-03-05T00:51:30Z

Looks good. Could you double check this typing issue? https://github.com/pandas-dev/pandas/actions/runs/8108693644/job/22162385490?pr=57674

MarcoGorelli · 2024-03-05T18:39:50Z

thanks - that one should be addressed in #57689

mroeschke

LGTM. Going to retarget for 3.0 since the next time this will fail is in 4 years, by which time 2.2.x will be unsupported :)

mroeschke · 2024-03-05T20:14:59Z

Thanks @MarcoGorelli

try fix ci

7bdb6d8

MarcoGorelli force-pushed the fix-ci branch from 6ab62da to 7bdb6d8 Compare February 29, 2024 09:20

MarcoGorelli commented Feb 29, 2024

MarcoGorelli marked this pull request as ready for review February 29, 2024 09:57

MarcoGorelli added the CI Continuous Integration label Feb 29, 2024

MarcoGorelli added this to the 2.2.2 milestone Feb 29, 2024

MarcoGorelli changed the title ~~Fix ci~~ CI: avoid guess_datetime_format failure on 29th of Feburary Feb 29, 2024

MarcoGorelli requested review from jbrockmendel and mroeschke February 29, 2024 15:04

add comment

d7eab7c

datapythonista reviewed Feb 29, 2024

datapythonista mentioned this pull request Feb 29, 2024

BUG: datetime parsing fails on a leapday #57685

Closed

5 tasks

Merge remote-tracking branch 'upstream/main' into fix-ci

a520cfe

Merge branch 'main' into fix-ci

d9c330b

mroeschke modified the milestones: 2.2.2, 3.0 Mar 5, 2024

mroeschke added the Testing pandas testing functions or related to the test suite label Mar 5, 2024

mroeschke approved these changes Mar 5, 2024

mroeschke merged commit 83112d7 into pandas-dev:main Mar 5, 2024

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024

CI: avoid guess_datetime_format failure on 29th of Feburary (pandas…

cfc93ad

…-dev#57674) * try fix ci * add comment
