ENH: Use find_stack_level in pandas.core #44358

rhshadrach · 2021-11-08T22:47:15Z

Ensure all linting tests pass, see here for how to run them

This handles most of the stacklevels in core. A few more will take a bit more effort, separating out into a separate PR. After all stacklevels are replaced, will be removing check_stacklevel=False from the tests.

jbrockmendel

LGTM pending green(ish)

jreback · 2021-11-11T00:40:34Z

hmm can you rebase again. nothing should be failing except the 3.10 sdist test (or there is an issue)

…acklevel

jreback · 2021-11-12T03:19:09Z

very nice @rhshadrach

TomAugspurger · 2021-12-02T02:54:47Z

FYI, this had a ~10-20% slowdown on a decent number of benchmarks (e.g. https://pandas.pydata.org/speed/pandas/#groupby.Categories.time_groupby_nosort?python=3.8&Cython=0.29.24&commits=01b86edb). Full list at https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc.

Is that a worthwhile tradeoff?

Also https://pandas.pydata.org/speed/pandas/#index_cached_properties.IndexCache.time_is_all_dates?python=3.8&Cython=0.29.24&p-index_type='RangeIndex'&commits=01b86edb is a lot larger, but I don't know if that's realted.

rhshadrach · 2021-12-02T02:58:22Z

Most likely not - I'll do some benchmarking and see where this is coming from. If it's really widespread, might need to revert. I should be able to check this out in the next few days.

rhshadrach · 2021-12-04T06:59:37Z

@TomAugspurger - Is there a way to get the table from the full list as tabular data?

rhshadrach · 2021-12-04T10:24:14Z

After digging into this a bit, it does seem worth it to incur the slow down in at least a lot of these benchmarks. It takes about 3ms on my machine to call find_stack_trace. In a benchmark that runs in 10ms like the first example posted above, this is regarded as significant. However, in this particular instance, find_stack_level is called only once. I think that 3ms overhead is worth it to get a better error message to the user and to lessen maintenance (moving code around, calling from multiple places, etc).

@TomAugspurger @jreback - Is there a consensus that if find_stack_level is called O(1) times (with a small constant), then this is worth the overhead?

What would be concerning is if there is code where a warning is generated multiple times, making the overhead significantly greater. Indeed this is happening in at least two cases, and it should definitely be reverted. To track this, I've run the test suite while outputting (to a file) any test that hit find_stack_trace along with the number of times it was called. If the test is parameterized, I'm also storing the run id from pytest. The value_counts for calling find_stack_level from a single line of code in a test run more than once are:

calls
2        1378
3         320
4          70
5           2
6           6
7           2
9           2
10          8
12          5
14          2
24          1
28          1
38          1
50          1
56          3
102         1
103         1
300         2
dtype: int64

There can be lots of false positives here, e.g. a for loop in the test. Digging into the top offenders, below are the tests and on the right are comments from investigating the instance:

                                                                  calls
/tests/indexes/multi/test_indexing.py:879:none                      300  # Remove find_stack_level
/tests/indexes/multi/test_indexing.py:883:none                      300  # Remove find_stack_level
/tests/io/parser/conftest.py:29:c_low                               103  # Called many times
/tests/io/parser/conftest.py:29:c_high                              102  # Called many times
/tests/frame/test_constructors.py:1156:none                          56  # Called many times
/tests/frame/test_constructors.py:1163:none                          56  # Called many times
/tests/frame/test_constructors.py:1170:none                          56  # Called many times
/tests/indexes/multi/test_indexing.py:890:none                       50  # Remove find_stack_level
/tests/plotting/common.py:658:none                                   38  # Called many times
/tests/indexes/datetimes/test_partial_slicing.py:301:none            28  # Called many times
/tests/groupby/test_function.py:975:none                             24  # Remove find_stack_level
/tests/indexes/multi/test_duplicates.py:213:none                     14  # Remove find_stack_level
/tests/io/pytables/test_put.py:314:none                              14  # Removed find_stack_level
/tests/indexes/interval/test_constructors.py:91:breaks4-int64        12  # Called many times
/tests/indexes/interval/test_constructors.py:91:breaks5-int64        12  # Called many times
/tests/indexing/common.py:161:none                                   12  # Called many times
/tests/frame/test_reductions.py:161:none                             12  # Called many times
/tests/extension/base/getitem.py:248:integer-array                   12  # Called many times

Called many times means there is e.g. a for loop in the test itself, giving rise to false positives.The two cases I mentioned above that should be reverted are related to MultiIndexing and DataFrame.describe.

TomAugspurger · 2021-12-04T14:04:05Z

@TomAugspurger @jreback - Is there a consensus that if find_stack_level is called O(1) times (with a small constant), then this is worth the overhead?

Yeah I think that's a worthwhile tradeoff.

rhshadrach · 2021-12-04T15:25:42Z

Thanks @TomAugspurger. I've gone through the list of regressions associated with this PR. All regressions are either (a) less than 4ms or (b) call find_stack_level once (a few of these are timed just slightly greater than 4ms) except for:

All three of these no longer call find_stack_level after doing the reverts mentioned in #44358 (comment)

rhshadrach added 2 commits November 8, 2021 17:34

ENH: Use find_stack_level in pandas.core

96d9d29

Reverts

51561df

jbrockmendel approved these changes Nov 11, 2021

View reviewed changes

jreback added the Enhancement label Nov 11, 2021

jreback added this to the 1.4 milestone Nov 11, 2021

Merge branch 'master' of https://github.com/pandas-dev/pandas into st…

84291c9

…acklevel

jreback merged commit 01b86ed into pandas-dev:master Nov 12, 2021

rhshadrach deleted the stacklevel branch November 12, 2021 03:56

nickleus27 pushed a commit to nickleus27/pandas that referenced this pull request Nov 28, 2021

ENH: Use find_stack_level in pandas.core (pandas-dev#44358)

a0a76a1

rhshadrach mentioned this pull request Dec 4, 2021

REGR: find_stack_level reverts #44753

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Use find_stack_level in pandas.core #44358

ENH: Use find_stack_level in pandas.core #44358

rhshadrach commented Nov 8, 2021

jbrockmendel left a comment

jreback commented Nov 11, 2021 •

edited

Loading

jreback commented Nov 12, 2021

TomAugspurger commented Dec 2, 2021

rhshadrach commented Dec 2, 2021 •

edited

Loading

rhshadrach commented Dec 4, 2021

rhshadrach commented Dec 4, 2021 •

edited

Loading

TomAugspurger commented Dec 4, 2021

rhshadrach commented Dec 4, 2021

ENH: Use find_stack_level in pandas.core #44358

ENH: Use find_stack_level in pandas.core #44358

Conversation

rhshadrach commented Nov 8, 2021

jbrockmendel left a comment

Choose a reason for hiding this comment

jreback commented Nov 11, 2021 • edited Loading

jreback commented Nov 12, 2021

TomAugspurger commented Dec 2, 2021

rhshadrach commented Dec 2, 2021 • edited Loading

rhshadrach commented Dec 4, 2021

rhshadrach commented Dec 4, 2021 • edited Loading

TomAugspurger commented Dec 4, 2021

rhshadrach commented Dec 4, 2021

jreback commented Nov 11, 2021 •

edited

Loading

rhshadrach commented Dec 2, 2021 •

edited

Loading

rhshadrach commented Dec 4, 2021 •

edited

Loading