Skip to content

BUG: args offset in Timestamp.__new__ causes bugs #52344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

AlexKirko
Copy link
Member

@AlexKirko AlexKirko commented Apr 1, 2023

This is the promised refactor for the Timestamp constructor. Detailed comments are below.

ASV benchmark results for Timestamp construction:

       before           after         ratio
     [1c2ad16b]       [7378f54f]
     <main>           <gh-52343-timestamp-from-positional>
+        1.20±0μs      1.86±0.02μs     1.56  tslibs.timestamp.TimestampConstruction.time_from_positional
-     1.05±0.01μs         953±10ns     0.91  tslibs.timestamp.TimestampConstruction.time_parse_iso8601_no_tz
-         253±1ns          209±4ns     0.82  tslibs.timestamp.TimestampConstruction.time_from_pd_timestamp

Replaces #52221

@AlexKirko
Copy link
Member Author

WIP. The same tests as in #52221, different solution.

@@ -48,6 +48,9 @@ def time_from_datetime_aware(self):
def time_from_pd_timestamp(self):
Timestamp(self.ts)

def time_from_positional(self):
Timestamp(2020, 1, 1, 0, 0, 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional benchmark for future regressions might be helpful.

@AlexKirko AlexKirko marked this pull request as ready for review April 5, 2023 07:56
@AlexKirko
Copy link
Member Author

AlexKirko commented Apr 5, 2023

@mroeschke @jbrockmendel @jreback
Hello, everyone. Here is the Timestamp refactor needed to switch to using *args, **kwargs for Timestamp construction. This PR fixes both the fold and the positional vs keyword inconsistency bugs. There is some performance gain when building from a Timestamp. There is a performance loss when building from positional arguments (added a benchmark for this case to the asv benchmark suite).

Had to rewrite most of the constructor, because naively switching to *args, **kwargs without changing the rest of the code had resulted in a 50-60% performance hit for most of the use cases.

Please review. Any suggestions on improving the performance of the positional arguments case would be appreciated.

@jreback
Copy link
Contributor

jreback commented Apr 8, 2023

i know i suggested the args/kwargs pattern, but... is there an alternative way out of the current problem by being stricter in disallowing mixed positional/keyword?

+1 in this (or allow only fully qualified positionals - eg ymdhms required plus kwarg for example)

@AlexKirko
Copy link
Member Author

AlexKirko commented Apr 9, 2023

@jbrockmendel @jreback The way to disallow passing args as kwargs would be to use the Python>3.8 args-only syntax to make everything up to tzinfo positional-only. A couple of issues I see:

  • this breaks the behavior compatibility with datetime. Users will construct the way they do with datetime and get errors. Will need to make those clear enough that a user understands this is expected behavior.
  • this breaks old code. I personally built Timestamps like Timestamp(2020, 1, 1, hour=20, minute=30, second=0) in my applications, adding kwargs only where I needed to make it clear what the Timestamp would contain. This is a pretty standard practice in finance where I come from.
  • going through a deprecation cycle would keep the bug unfixed (unless we temporarily go with decoration or the current approach in the short-term and add deprecation warnings, and only then switch to positional-only)
  • The arg offset bug is present in version 1.5.x too. A fix that breaks old code would not be possible/available there if we wanted to apply it.
  • tzinfo would become positional-only, I assume. tz will remain kwarg-only. This would look weird to users and probably be reported as an issue.

Because of these points, I didn't think this option was on the table.

Let's come up with an approach that you will be able to accept if it's decently done, and I will implement it. But I don't want to redo this from scratch a fourth time (I'll be happy to go back to one of the previous options though if we don't like the third solution).

I believe that the current arg offset in Timestamp is a buggy PITA that we took advantage of to maintain performance and keep fragile code readable. The fix will either be a kludge, be harder to maintain, or break a bunch of code. I don't see a solution that does neither of those but would be happy to be wrong.

@AlexKirko
Copy link
Member Author

Unfortunately, I don't think there is a way of fixing this outside of using *args, **kwargs or enforcing the positional/kwargs division.

@AlexKirko
Copy link
Member Author

I'm back from a work trip. I will test out the args-only approach and report on the performance benchmarks here.

Any thoughts on any of the points I raised above?

@jbrockmendel
Copy link
Member

@AlexKirko you make goods points, thanks for thinking through the idea that i clearly didn't. I'll give this a close look. Thanks for being patient.

unit=None,
fold=None,
):
def __new__(cls, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this get rendered in the docs somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexKirko any idea about this?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 13, 2023
@AlexKirko
Copy link
Member Author

A bit sick at the moment, will come back to it in a couple days once I'm better.

@AlexKirko AlexKirko requested a review from MarcoGorelli as a code owner June 25, 2023 13:33
@AlexKirko
Copy link
Member Author

Back among the living. The illness and the job change have cost me some time.

@@ -368,6 +368,8 @@ Datetimelike
- Bug in :meth:`Timestamp.round` with values close to the implementation bounds returning incorrect results instead of raising ``OutOfBoundsDatetime`` (:issue:`51494`)
- Bug in :meth:`arrays.DatetimeArray.map` and :meth:`DatetimeIndex.map`, where the supplied callable operated array-wise instead of element-wise (:issue:`51977`)
- Bug in constructing a :class:`Series` or :class:`DataFrame` from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (:issue:`52212`)
- Bug in :class:`Timestamp` raising an error when passing fold when constructing from positional arguments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put "fold" in quotes/ticks

@jbrockmendel
Copy link
Member

Is there any viable path to splitting this into smaller pieces? This seems like a very difficult problem and is easy for me and other reviewers to be lazy about reviewing

@AlexKirko
Copy link
Member Author

Is there any viable path to splitting this into smaller pieces? This seems like a very difficult problem and is easy for me and other reviewers to be lazy about reviewing

The bigger problem, to be honest, is me having been either swamped with work or sick this past month.

In my opinion, splitting this into multiple PRs that make sense is possible, only if we stop caring about the performance of the initial implementation, and put the necessary performance improvements into separate PRs. It would make this easier to review but lead to an initial PR that is very slow in some use cases.

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Aug 11, 2023
@AlexKirko
Copy link
Member Author

AlexKirko commented Aug 11, 2023

I have merged in main, and I will try another approach next week, so I would like to reopen the request for now.

The problem here is that it's a "I will hate it when I see it" situation. There are three solutions that come to mind, and I've tried and submitted for review two. The third is terrible, in my opinion.

  1. decorate the constructor. Very slow, very easy to read and support. Deemed unacceptable because of performance regressions.
  2. fix the kwargs/args order inside the constructor. Minimal performance impact, but is ugly and makes the code harder to support. This is the current approach in the branch that everyone seems to also reject.
  3. Use the positional-only syntax. Breaks compatibility with datetime, breaks user code. I think it's much worse than the previous two approaches, but I can try it next week for illustrative purposes if nothing else.

No one has suggested another solution, and I'm out of ideas. I personally think that option 2 (the current one) is fine. Yes, it's ugly, but it doesn't slow things down too much or break old code and datetime compatibilty.

I'm well aware that all the approaches have significant downsides, but either someone needs to suggest an idea I haven't considered, or we need to choose. I'll try out the positional-only solution next week for illustration purposes, but I think it's the worst. Breaking API compatibility with current code to fix a bug is just wrong.

We can also leave the bug in if fixing it is worse than the current state of things, but that's also not great, to put it in mildly.

If I had the authority to take the responsibility for this decision, I'd just pick the current solution and move on.

I'd be happy to hear your thoughts, @mroeschke . You have more experience, so maybe you see a way out here.

@AlexKirko AlexKirko reopened this Aug 11, 2023
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

option 2 (current approach) looks like a good soln

the code complexity is not too bad - commented pretty well already

@AlexKirko
Copy link
Member Author

Okay, then I'll focus on cleaning up the current solution instead. Will try to make the changes as painless as possible to support.

@AlexKirko
Copy link
Member Author

I apologize for being glacially slow on this. Finally had an opportunity to review.

The MR is in decent shape except for the docs issue. We change the signature to *args, **kwargs and this gets reflected in the auto-generated docs. I suppose there is nothing to do here except implement an ad-hoc override for the docs.

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: args offset in Timestamp.__new__ causes bugs
4 participants