-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix/empty string datetimelike conversion/issue 36550 #36834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/empty string datetimelike conversion/issue 36550 #36834
Conversation
why this change? likely to break test parsing generally and would have to see a lot of benefit |
Considering empty string equivalent to NaT as a huge influence on all method that try to cast data into datetimelike (happen in groupby, merge, join, equal, probably others methods) For information the broken test were the following:
If empty string should be always considered as NaT, then I'll close this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this affect read_csv
? do we have tests for this?
this needs a fairly big section i the whatsnew as this can break existing code
@@ -104,8 +104,13 @@ def test_identity(klass, value): | |||
@pytest.mark.parametrize("klass", [Timestamp, Timedelta, Period]) | |||
@pytest.mark.parametrize("value", ["", "nat", "NAT", None, np.nan]) | |||
def test_equality(klass, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would instead remove the empty and put this in a new test for that case (and assert the error)
these are duplicating the constructor tests so should have either / or (i think they are fine here)
does this actually close the referenced issue? |
can you merge master |
Closing for now. @nrebena let us know if you'd like to start this up again and we'll reopen |
.isin({pd.NaT, ...})
gives all-False #36550black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Empty string conversion
The actual comportment of Timedelta, Timestamp and to_datetime is to return NaT.
I would propose to raise a ValueError instead, same as for string that are not datetimelike, as follow:
Tests
I added the relevant test for the expected comportment.
I still will have to fix some test that rely on the current behaviors.
Benchmark
I ran the following command
asv continuous -E virtualenv -f 1.1 upstream/master HEAD -b "^tslibs.(timestamp|timedelta|tslib)"