Skip to content

Add/reorganize scalar Timedelta tests #46936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from

Conversation

patrickmckenna
Copy link

@patrickmckenna patrickmckenna commented May 4, 2022

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This is a companion of sorts to #46854. It adds more thorough tests, and groups them by type, but doesn't change any behavior.

/cc @jbrockmendel @jreback

@patrickmckenna patrickmckenna marked this pull request as ready for review May 4, 2022 01:09
@patrickmckenna patrickmckenna marked this pull request as draft May 4, 2022 17:35
@pep8speaks
Copy link

pep8speaks commented May 4, 2022

Hello @patrickmckenna! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-05-17 01:55:27 UTC

@jreback jreback added Testing pandas testing functions or related to the test suite Timedelta Timedelta data type labels May 6, 2022
@jbrockmendel
Copy link
Member

it looks like this PR has evolved to involve a lot of stuff. i'd suggest breaking it into tightly-scoped pieces for easier (faster) review

@patrickmckenna
Copy link
Author

it looks like this PR has evolved to involve a lot of stuff. i'd suggest breaking it into tightly-scoped pieces for easier (faster) review

Yes, it's comically large at the moment, apologies. Fully understand and agree it needs to be smaller to make review not a PITA. (I'd been keeping it open, as a draft, to simplify running the full test suite as I iterated on a few ideas.)

I'll push a few more changes and add some high-level comments shortly, to figure out how best to split this up.

Copy link
Author

@patrickmckenna patrickmckenna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overwhelming majority of line changes come from edits to 2 test files:

  • pandas/tests/scalar/timedelta/test_arithmetic.py
  • pandas/tests/scalar/timedelta/test_timedelta.py

I've tried to keep all existing tests, especially those that map to specific PRs, albeit sometimes simplified, or updated to test a broader range of inputs. I also removed pandas/tests/scalar/timedelta/test_{constructors,formats.py}, as they contained a lot of duplicate code, and moved the unique stuff to the other 2 files mentioned above.

The other component of this PR is a few updates to Timedelta, simplifying/deduping the constructor logic and performing overflow checks. I can extract those bits into a standalone PR, though that would still leave a ton of changed lines to review.

@@ -892,6 +882,42 @@ cdef _timedelta_from_value_and_reso(int64_t value, NPY_DATETIMEUNIT reso):
return td_base


@cython.overflowcheck(True)
cdef object create_timedelta(object value, str in_unit, NPY_DATETIMEUNIT out_reso):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the main functional changes introduced by this PR as it stands. The motivations:

  • DRY up Timedelta creation logic, which was divided and duplicated over several functions
  • Do so outside Timdelta.__new__, since (AFAICT) there shouldn't yet be a public API for creating instances with non-nano resolution
  • Add overflow checks, and consistently raise OutOfBoundsTimedelta when its detected.

I can move these changes to a separate PR, to keep this one focused on tests alone, if that's desired.

if value is _no_input:
if not len(kwargs):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplifications here are of a piece with the introduction of create_timedelta above. Timedelta.__new__ had duplicated much of the logic in convert_to_timedelta64, so that's removed. It's now mostly a wrapper around create_timedelta, which wraps convert_to_timedelta64 and _timedelta_from_value_and_reso.

+ int(kwargs.get('microseconds', 0) * 1_000)
+ int(kwargs.get('milliseconds', 0) * 1_000_000)
+ seconds
kwargs["weeks"] * 7 * 24 * 3600 * 1_000_000_000,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be extracted into a separate function, but that's for later.

@@ -1,34 +1,159 @@
"""
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main changes, both here and in test_timedelta.py:

  • Use fixtures to extract, simplify repeated test setup
  • Split up tests with many asserts, and/or distinct sections of setup code, into separate tests.
  • Parameterize tests, wherever sensible, over all arithmetic ops of a similar kind, e.g. add and radd
  • Parameterize tests to run with Array/Index/Series/DataFrames.
  • Consistent grouping of like tests into classes
  • More descriptive test names

@@ -1,394 +0,0 @@
from datetime import timedelta
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file largely duplicated a subset of the tests in test_timedelta.py.

@@ -1,5 +1,19 @@
""" test the scalar Timedelta """
"""
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are always at risk of becoming outdated or misleading, but hopefully a few brief notes here are helpful.

Timedelta,
TimedeltaIndex,
offsets,
to_timedelta,
)
import pandas._testing as tm

TD_UNITS = (
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some, but not all, of these seem to be defined in other modules, though they're not always exposed for testing, e.g.

cdef dict timedelta_abbrevs = {

@patrickmckenna
Copy link
Author

It's a tale of two cities on the perf front. A local run just now:

       before           after         ratio
     [c22bc653]       [99f927ee]
     <td64-tests^2>       <td64-tests>
+      2.97±0.2μs      4.66±0.08μs     1.57  tslibs.timedelta.TimedeltaConstructor.time_from_missing
+     7.85±0.05μs         11.5±2μs     1.47  tslibs.timedelta.TimedeltaConstructor.time_from_string
+     3.59±0.06μs      5.18±0.06μs     1.44  tslibs.timedelta.TimedeltaConstructor.time_from_np_timedelta
-        8.78±1μs      7.17±0.09μs     0.82  tslibs.timedelta.TimedeltaConstructor.time_from_unit
-      8.54±0.8μs      5.66±0.03μs     0.66  tslibs.timedelta.TimedeltaConstructor.time_from_components
-      6.87±0.5μs      1.57±0.02μs     0.23  tslibs.timedelta.TimedeltaConstructor.time_from_int

Looking into the regressions.

@patrickmckenna
Copy link
Author

patrickmckenna commented May 11, 2022

Looking into the regressions.

Better now:

> asv continuous -f 1.1 upstream/main HEAD -b tslibs.timedelta.TimedeltaConstructor
...
       before           after         ratio
     [a853022e]       [d0967da6]
                      <tmp/td64-perf>
-        15.4±2μs       11.2±0.7μs     0.73  tslibs.timedelta.TimedeltaConstructor.time_from_iso_format
-      8.72±0.8μs       5.75±0.2μs     0.66  tslibs.timedelta.TimedeltaConstructor.time_from_components
-      8.20±0.4μs       5.06±0.3μs     0.62  tslibs.timedelta.TimedeltaConstructor.time_from_string
-      6.52±0.1μs       1.60±0.2μs     0.25  tslibs.timedelta.TimedeltaConstructor.time_from_int
-      3.24±0.3μs          744±6ns     0.23  tslibs.timedelta.TimedeltaConstructor.time_from_missing

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

patrickmckenna added a commit to patrickmckenna/pandas that referenced this pull request May 11, 2022
@patrickmckenna
Copy link
Author

it looks like this PR has evolved to involve a lot of stuff. i'd suggest breaking it into tightly-scoped pieces for easier (faster) review

Yes, it's comically large at the moment, apologies. Fully understand and agree it needs to be smaller to make review not a PITA. (I'd been keeping it open, as a draft, to simplify running the full test suite as I iterated on a few ideas.)

I'll push a few more changes and add some high-level comments shortly, to figure out how best to split this up.

I've extracted the functional updates, which touch many fewer lines, into #47004, so this PR can focus on de-duping/expanding test coverage.

@patrickmckenna
Copy link
Author

Once #46854 and #47004 merge, I can drastically cut the scope of this PR, and focus it on just consolidating/deduping scalar Timedelta tests. In particular, my thought is to move everything from timedelta/test_{constructors,formats}.py to timedelta/test_timedelta.py, to avoid having related yet incomplete test cases spread across three modules.

If that's not useful, or isn't worth prioritizing, please LMK.

@jbrockmendel
Copy link
Member

If that's not useful, or isn't worth prioritizing, please LMK.

honestly im having trouble motivating myself to carefully review this, but i'll give it another go once the other PRs are done. id advise against spending much time here in the interim

@patrickmckenna
Copy link
Author

Think it might be easier to just close this for now. If there are elements worth reviewing and merging, they can always be cherry-picked into a much more narrowly-scoped PR. Or this can be reopened.

(As mentioned, I left this open in its current form only to simplify checking CI results especially the Windows and 32-bit runs. But it's probably too big now to easily whittle down into something reasonable and useful.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants