-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Warn when creating Period
with a string that includes timezone information
#47990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@pytest.mark.parametrize("freq", ["D", "W", "M", "Q", "A"]) | ||
@pytest.mark.parametrize("freq", ["D", "M", "Q", "A"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "W" here tests for functionality that is not properly implemented in Pandas. Namely creating a Period
from a string representing a week such as "2017-01-23/2017-01-29"
. This only works by chance and doesn't work for weeks in the 60s, 70s, 80s, 90s of any century or any year from 2360 onwards. This is because dateutil.parser.parse
interprets the second year as hours and minutes.
This is a bit sad because the given string is Pandas' string representation of a week period and it would be nice to have bi-directionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you address this in a separate PR just to keep this PR tightly scoped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could but doing so would stall this PR until the "W" PR is merged. The two test cases I changed raise the warning introduced in this PR. How should we proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I fully understand where and how a "W"
frequency in particular is impacted by this new warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two tests that I propose to change create a Period
, convert it to a str
, and then try to create a Period
again from the string while also telling the constructor what the freq
is. Such a functionality isn't deliberately implemented anywhere for "W", the code path goes to dateutil.parser.parse
which accepts these "W" period strings only if the year can be interpreted as a time. I've explained this in the comment above and also in #48000 - there's a section on weeks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this appears to be an issue, but I don't understand why these "W" changes need to made in the same PR as adding the warning for Periods with timezones since they seem independent. Having them as separate PRs makes tracking issues easier in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only applies to "W" because it has to do with what the string representation of a week period is and how it's parsed. Some part of that week string is interpreted as a timezone. If I remember correctly, it's the hyphen and the day (-29
in the example). dateutil.parser.parse
reads that as a timezone offset of negative 29 hours. Converting back to a Period
will throw the warning introduced in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay thanks. That explanation makes sense.
But it would be good to still keep testing "W"
even if it works by chance. (but maybe include a comment why it works by chance)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And ignore the warning in the test or what? The CI requires that the test suite doesn't throw warnings, no? It sounds like implementing that would make the test a little complex. I'd have to special-case the "W", so the warning is only ignored there and not for the other frequencies.
In general, what's the point of testing for undocumented & broken functionality like this? Do we have reason to believe that library users are relying on parsing these kind of strings? To me the conversion from Period to String and back seems like a hack to write those tests quickly. Is this something we need to support going forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use pytest.param("W", marks=pytest.mark.filterwarnings(...))
in the parameterize
.
Generally, we have seen all sorts of undocumented use cases and have gotten issues when those behaviors "broke". I imagine this should be supported eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we
don't need a special warning here
just use the standard UserWarning
Done. I was just thinking it's not good UX if we raise |
@jreback can this be merged? I implemented your requested change the day I opened the merge request. |
doc/source/whatsnew/v1.4.4.rst
Outdated
@@ -43,6 +43,7 @@ Bug fixes | |||
- Bug in :meth:`DataFrameGroupBy.value_counts` where ``subset`` had no effect (:issue:`44267`) | |||
- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`) | |||
- Bug in the :meth:`Series.dt.strftime` accessor return a float instead of object dtype Series for all-NaT input, which also causes a spurious deprecation warning (:issue:`45858`) | |||
- :class:`Period` now raises a warning when created with data that contains timezone information. This is necessary because :class:`Period`, :class:`PeriodArray` and :class:`PeriodIndex` do not support timezones and hence drop any timezone information used when creating them. (:issue:`47005`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 1.5 this is not a change we would backport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 1.5 this is not a change we would backport
I don't understand how the versioning works. Should this go into 1.5.3 or 1.6.0? I put it in v1.5.3.rst
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer the 'W'
changes to be done in a separate PR to keep this one tightly scoped
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
I'm still interested in working on this but I'm swamped with other things right now. I'll get to it later this year. |
Hi @joooeey - is this still active? Shall we close for now and can revisit when you have time? |
Hi @MarcoGorelli! I might have time to do all the changes from my side this weekend or on Tuesday. If I don't get to it, I'll close the PR next week (for the time being). |
I couldn't pin down why the tests think, the stacklevel is wrong. In the stock python interpreter, the warning message refers to the correct line.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
I think the "W" parts of this have recently been fixed separately, would be good to see this revived. @joooeey up for it? |
In principle, I'm up for it. I've kept this on my backlog all this time. But I can't give a timeline. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Note on the newDateTimeWarning
I added theDateTimeWarning
for this specific case and the warnings raised in #22549 because the commonly usedUserWarning
is often not appropriate. There are lots of conceivable situations where this warning may be raised due to design choices in Pandas as opposed to user error.DateTimeWarning
inherits fromUserWarning
to remain backwards compatible. This trail of thought applies to much of the datetime functionality in Pandas but that's beyond the scope of this PR.