Skip to content

Warn when creating Period with a string that includes timezone information #47990

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

joooeey
Copy link
Contributor

@joooeey joooeey commented Aug 6, 2022

Note on the new DateTimeWarning

I added the DateTimeWarning for this specific case and the warnings raised in #22549 because the commonly used UserWarning is often not appropriate. There are lots of conceivable situations where this warning may be raised due to design choices in Pandas as opposed to user error. DateTimeWarning inherits from UserWarning to remain backwards compatible. This trail of thought applies to much of the datetime functionality in Pandas but that's beyond the scope of this PR.

Comment on lines 23 to 25
@pytest.mark.parametrize("freq", ["D", "W", "M", "Q", "A"])
@pytest.mark.parametrize("freq", ["D", "M", "Q", "A"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "W" here tests for functionality that is not properly implemented in Pandas. Namely creating a Period from a string representing a week such as "2017-01-23/2017-01-29". This only works by chance and doesn't work for weeks in the 60s, 70s, 80s, 90s of any century or any year from 2360 onwards. This is because dateutil.parser.parse interprets the second year as hours and minutes.

This is a bit sad because the given string is Pandas' string representation of a week period and it would be nice to have bi-directionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you address this in a separate PR just to keep this PR tightly scoped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could but doing so would stall this PR until the "W" PR is merged. The two test cases I changed raise the warning introduced in this PR. How should we proceed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I fully understand where and how a "W" frequency in particular is impacted by this new warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two tests that I propose to change create a Period, convert it to a str, and then try to create a Period again from the string while also telling the constructor what the freq is. Such a functionality isn't deliberately implemented anywhere for "W", the code path goes to dateutil.parser.parse which accepts these "W" period strings only if the year can be interpreted as a time. I've explained this in the comment above and also in #48000 - there's a section on weeks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this appears to be an issue, but I don't understand why these "W" changes need to made in the same PR as adding the warning for Periods with timezones since they seem independent. Having them as separate PRs makes tracking issues easier in the long run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only applies to "W" because it has to do with what the string representation of a week period is and how it's parsed. Some part of that week string is interpreted as a timezone. If I remember correctly, it's the hyphen and the day (-29 in the example). dateutil.parser.parse reads that as a timezone offset of negative 29 hours. Converting back to a Period will throw the warning introduced in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay thanks. That explanation makes sense.

But it would be good to still keep testing "W" even if it works by chance. (but maybe include a comment why it works by chance)

Copy link
Contributor Author

@joooeey joooeey Aug 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ignore the warning in the test or what? The CI requires that the test suite doesn't throw warnings, no? It sounds like implementing that would make the test a little complex. I'd have to special-case the "W", so the warning is only ignored there and not for the other frequencies.

In general, what's the point of testing for undocumented & broken functionality like this? Do we have reason to believe that library users are relying on parsing these kind of strings? To me the conversion from Period to String and back seems like a hack to write those tests quickly. Is this something we need to support going forward?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use pytest.param("W", marks=pytest.mark.filterwarnings(...)) in the parameterize.

Generally, we have seen all sorts of undocumented use cases and have gotten issues when those behaviors "broke". I imagine this should be supported eventually.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we
don't need a special warning here
just use the standard UserWarning

@joooeey
Copy link
Contributor Author

joooeey commented Aug 6, 2022

we don't need a special warning here just use the standard UserWarning

Done.

I was just thinking it's not good UX if we raise UserWarning for things that are not necessarily mistakes by the user which I've seen a few examples. But then there's not really a better Warning in the built-in Exceptions except perhaps Warning.

@joooeey joooeey marked this pull request as draft August 6, 2022 15:07
@joooeey joooeey marked this pull request as ready for review August 6, 2022 22:53
@joooeey joooeey requested a review from jreback August 7, 2022 08:08
@joooeey joooeey mentioned this pull request Aug 7, 2022
3 tasks
@mroeschke mroeschke added Period Period data type Warnings Warnings that appear or should be added to pandas labels Aug 8, 2022
@joooeey
Copy link
Contributor Author

joooeey commented Aug 23, 2022

@jreback can this be merged? I implemented your requested change the day I opened the merge request.

@@ -43,6 +43,7 @@ Bug fixes
- Bug in :meth:`DataFrameGroupBy.value_counts` where ``subset`` had no effect (:issue:`44267`)
- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`)
- Bug in the :meth:`Series.dt.strftime` accessor return a float instead of object dtype Series for all-NaT input, which also causes a spurious deprecation warning (:issue:`45858`)
- :class:`Period` now raises a warning when created with data that contains timezone information. This is necessary because :class:`Period`, :class:`PeriodArray` and :class:`PeriodIndex` do not support timezones and hence drop any timezone information used when creating them. (:issue:`47005`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 1.5 this is not a change we would backport

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 1.5 this is not a change we would backport

I don't understand how the versioning works. Should this go into 1.5.3 or 1.6.0? I put it in v1.5.3.rst for now.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer the 'W' changes to be done in a separate PR to keep this one tightly scoped

#47990 (comment)

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Sep 29, 2022
@joooeey
Copy link
Contributor Author

joooeey commented Sep 29, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

I'm still interested in working on this but I'm swamped with other things right now. I'll get to it later this year.

@MarcoGorelli
Copy link
Member

Hi @joooeey - is this still active? Shall we close for now and can revisit when you have time?

@joooeey
Copy link
Contributor Author

joooeey commented Dec 9, 2022

Hi @MarcoGorelli! I might have time to do all the changes from my side this weekend or on Tuesday. If I don't get to it, I'll close the PR next week (for the time being).

@MarcoGorelli MarcoGorelli self-requested a review December 13, 2022 21:16
I couldn't pin down why the tests think, the stacklevel is wrong. In the stock python interpreter, the warning message refers to the correct line.
@joooeey joooeey marked this pull request as draft December 18, 2022 20:45
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jan 16, 2023
@jbrockmendel
Copy link
Member

I think the "W" parts of this have recently been fixed separately, would be good to see this revived. @joooeey up for it?

@joooeey
Copy link
Contributor Author

joooeey commented Mar 6, 2023

In principle, I'm up for it. I've kept this on my backlog all this time. But I can't give a timeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Period Period data type Stale Warnings Warnings that appear or should be added to pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: PeriodIndex looses tz information without emitting a warning that the timezone is dropped.
5 participants