API: Introduce optional (and partial) NEP 50 weak scalar logic #21626

seberg · 2022-05-28T20:56:04Z

This is still a work-in-progress, but I also do not mean it to be quite complete (for PR size reasons). There is a reason this is all opt-in for now.

The current state introduces the new NEP 50 "weak scalar" logic to replace value-based casting. There may still be holes in the logic at this point.

The main points are:

Introducing the environment variable NPY_PROMOTION_STATE and functions np._set_promotion_state(state) and np._get_promotion_state() with the following options:
- NPY_PROMOTION_STATE=legacy no change in behavior (default)
- NPY_PROMOTION_STATE=weak new NEP 50 behavior
- NPY_PROMOTION_STATE=weak_and_warn additionally gives a warning in many cases where a change occurs (see details below).
Introducing np._no_nep50_warning() as a context manager to locally suppress the optional warning. Unlike NPY_PROMOTION_STATE this is thread- and context-safe.
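
To make the distinction between the process-wide state and the context-safe warning suppression concrete, here is a minimal pure-Python sketch. It does not use NumPy and is not how NumPy implements this; the function names merely mirror the ones above, and `should_warn` is a hypothetical helper added for illustration.

```python
import contextvars
import os
from contextlib import contextmanager

# Illustrative stand-ins only; these are not NumPy's internals.
VALID_STATES = ("legacy", "weak", "weak_and_warn")

# Process-wide state, seeded from the environment variable (default: legacy).
# A plain module global like this is shared by all threads.
_promotion_state = os.environ.get("NPY_PROMOTION_STATE", "legacy")

# Context-local flag: every thread and every asyncio task sees its own value.
_suppress_warning = contextvars.ContextVar("suppress_warning", default=False)


def set_promotion_state(state):
    """Set the global promotion state (affects the whole process)."""
    global _promotion_state
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    _promotion_state = state


def get_promotion_state():
    """Return the current global promotion state."""
    return _promotion_state


@contextmanager
def no_nep50_warning():
    """Suppress the optional warning in the current context only."""
    token = _suppress_warning.set(True)
    try:
        yield
    finally:
        _suppress_warning.reset(token)


def should_warn():
    """Hypothetical helper: would a promotion change warn right now?"""
    return _promotion_state == "weak_and_warn" and not _suppress_warning.get()
```

In this sketch, entering `no_nep50_warning()` in one thread leaves `should_warn()` unchanged in every other thread, whereas calling `set_promotion_state` changes the answer everywhere at once.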

Missing things are:

It does not yet introduce new warnings/errors for casts. I.e. some integer cases should raise an error, but will not. Some float ones should overflow (different open PR) and other integer cases should error out.
Scalar paths should use correct promotion, but they take the "long" route, which makes them slow and may lose integer overflow warnings.
np.can_cast does not take into account that technically np.can_cast(1, np.uint8) should probably return True!

Note on warnings:

Warnings will be given for changes in the result-dtype. This is subtle but important, because it means that the fact that np.float32(3.1) < 3.1 changes behavior is ignored by the warnings: The behavior only changes with respect to floating point precision! Similarly, I ignore the warnings that would be triggered inside np.isclose.
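
The `np.float32(3.1) < 3.1` case can be reproduced without NumPy. The sketch below uses the standard `struct` module to round to float32; `to_float32` is a hypothetical helper, and the two comparisons only emulate what the legacy and weak scalar paths are described as doing.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


x = to_float32(3.1)  # stands in for np.float32(3.1)

# Legacy scalar behavior: the comparison happens in float64, where the
# float32-rounded value is slightly smaller than the float64 value 3.1.
legacy_result = x < 3.1

# Weak scalar behavior: the Python float adopts float32, so both sides
# are rounded to the same float32 value before comparing.
weak_result = x < to_float32(3.1)

print(legacy_result, weak_result)  # True False
```

Both results are booleans, so the result dtype is the same in both modes and no warning is given even though the value differs.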

Painpoints:

np.arange juggles Python integers internally, which leads to warnings (even if it rarely changes things). Similarly np.linspace runs into warnings easily, although it probably also doesn't change things there (or only very little).
In many cases functions have shady promotions and just need some thoughts to be clear which way they should go. Is it right that np.quantile(float16_arr, 0.5) will return a float16?
Just the mass of changes is a bit tricky to deal with...

This is hopefully test-runnable on other projects though. The missing integer handling should hopefully break loudly on many good test-suites and you actually will get the transition warning for those cases.

seberg · 2022-05-31T20:45:22Z

I think this should actually be usable now (not final, but usable for first tests). SciPy only shows 244 failures (most of them are clustered in either sparse or KDE tests; the KDE tests go to longdouble when they did not before and then fail because they use linalg, which barfs on longdouble).

With warnings enabled, there are ~2400 failures, but even those should be very heavily clustered on parametrized tests. It may be that some of those should just be ignored (even in NumPy itself).

There are only a few serious ones (e.g. due to low-precision integers being used where they probably should not be used), and I suspect the serious ones are easy to fix.
None of that is particularly surprising by now.

(EDIT: I am not even sure the 2400 warnings are even a lot of churn, they seem very clustered to a few heavily parametrized tests.)

BvB93

Could you add annotations for the new functions to the main __init__.pyi file?
I do believe that contextmanager still has to be imported from contextlib.

@contextmanager
def no_nep50_warning() -> Generator[None, None, None]: ...
def get_promotion_state() -> str: ...
def set_promotion_state(state: str, /) -> None: ...

seberg · 2022-06-09T23:30:09Z

I think this should be reaching a merge-able state now. I would like to still add brief docs (either in global state, or the NEP itself).

There are a few changes, so to summarize:

In result_type, CanCastArrayTo, and ufuncs, arrays are now tagged if they were originally a Python int, float, or complex.
Both ResultType and ufunc dispatching will run both old and new code paths (if necessary for warnings), or otherwise basically switch between the two.
The following new API functions exist:
- np._get_promotion_state(), which can be legacy, weak, or weak_and_warn
- np._set_promotion_state()
- with np._no_nep50_warning():
Further, there is a new testing fixture:
- weak_promotion, which is True or False and runs the test with the promotion state set to legacy and weak_and_warn

A bit more "indirect" changes are that I fixed up a few tests to at least include np._no_nep50_warning, but this is not complete. The test-suite cannot be run successfully with the new promotion state set.

The promotion of rational, which is odd/broken, is changed slightly. I tried to make the new promotion path more robust (which also affects the "legacy" version in some cases). I think that is correct (could be expanded), in principle maybe even a fix, but until an issue is opened, I wouldn't plan on backporting it.
(There may be weird corner cases currently when a user-dtype has spotty casting rules defined, but I think they existed for multiple versions now.)

Even the new promotion has to use the min-scalar logic to avoid picking up a float16 loop for `np.int8(3) * 3.`.

We need to be able to query the state for testing, probably should be renamed before the end, but need to have something for now.

Promotion in percentile will now more aggressively preserve the input dtype for floating point types (rather than upgrading the type to at least float64).

This ensures that the precision is not downcast, which could make a small value zero (for float16 mostly). This lets tests pass that check whether `np.float16(0)` is almost equal to 0, which otherwise fail (because `float16(0.00000001)` will evaluate to 0 exactly).
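
The underflow mentioned here can be checked with the standard library alone. This is a small sketch using the half-precision `"e"` format of `struct`; `to_float16` is a hypothetical helper and not part of NumPy.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


tiny = 0.00000001

# The smallest positive float16 (a subnormal) is 2**-24, about 5.96e-08.
smallest_subnormal = 2.0 ** -24

# 1e-08 is less than half of that, so it rounds to exactly zero.
print(to_float16(tiny))                # 0.0
print(to_float16(smallest_subnormal))  # 5.960464477539063e-08
```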

Also make the warning message sane :)

The issue is that bools are subclasses of ints, which triggers the more general problem that we cannot use the `int` abstract dtype if the input has a concrete dtype associated (e.g. bools, but also general user defined DTypes in principle)

Forcing the output dtype does not work here, since it actually can be integral (not that this is usually a good idea). In practice, the change we are doing here is forcing 64bit (or 32bit depending on platform) of precision for the calculation. This means that the change will only ever increase precision mildly compared to the current situation.

If all are scalars, then legacy promotion is not forced but we would use weak promotion internally suddenly (which we must not!).

Co-authored-by: Bas van Beek <[email protected]>

This follows the tests (and actually goes hand in hand with them). There are still some apparent issues here though; I suspect (but am not sure) that the legacy promotion may need to kick in more aggressively when errors occur. Also, surprisingly this fixes things that maybe should fail in legacy promotion, and I am not yet sure why...

…re robust It seems that the (weird and probably non-existing in practice) case of uint8 vs. int8 promotion when the input is a single integer was broken at some point and this fixes it again. This is only really relevant for rational, which defines only a very selective number of integer promotions. This fixes up the previous chunk that relaxes promotion fallbacks a lot for legacy dtypes.

this also effectively fixes some corner cases in np.result_type

rgommers · 2022-06-17T13:55:45Z

I'm testing this with SciPy and pre-emptively fixing up some things as needed. For one test failure (scipy/sparse/tests/test_sputils.py::TestSparseUtils::test_get_index_dtype ), I do see some unexpected behavior though:

>>> np.uint32(1) < np.iinfo(np.int32).min
True
>>> type(np.iinfo(np.int32).min)
<class 'int'>
>>> np.uint32(1) < -214783648
True

seberg · 2022-06-17T14:10:50Z

> I'm testing this with SciPy and pre-emptively fixing up some things as needed.

Very cool, thanks! That result should really not be the final result there. Right now, my expectation for the final result is that it will raise an error.

I.e. raising errors (and many RuntimeWarning("overflow ...") for scalar integers) is still missing in this first step.
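
The `True` result above is consistent with the Python integer being converted to uint32 without a range check. The following is a pure-Python sketch of that reading next to a checked conversion that raises instead; both helper names are hypothetical and this is not NumPy's actual code path.

```python
UINT32_MODULUS = 2 ** 32


def wrap_to_uint32(value):
    """Emulate an unchecked C-style cast of a Python int to uint32."""
    return value % UINT32_MODULUS


def checked_to_uint32(value):
    """Convert to uint32, raising if the value does not fit."""
    if not 0 <= value < UINT32_MODULUS:
        raise OverflowError(f"Python integer {value} out of bounds for uint32")
    return value


int32_min = -2 ** 31

# Unchecked: the negative value wraps around to a large positive one,
# so the comparison 1 < int32_min is evaluated as 1 < 2147483648.
wrapped = wrap_to_uint32(int32_min)
print(wrapped)      # 2147483648
print(1 < wrapped)  # True

# Checked: the out-of-range value is rejected instead of wrapping.
try:
    checked_to_uint32(int32_min)
except OverflowError as exc:
    print("error:", exc)
```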

We could try to be smarter (for comparisons), but I have to think about it more. Maybe comparisons could always use int64 for the Python integer, but at least in the simple version that might mean:

Casting the uint32 (which may add "unnecessary" casting).
Promotion problems with uint64 and Python integer (now int64). Although for those it might be good to have all comparison loops for uint64 < int64 and int64 < uint64, etc. since the promotion to float64 leads to fairly crazy stuff anyways in those cases.
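
As a rough illustration of why comparing uint64 and int64 values through float64 is problematic, the sketch below uses plain Python integers and floats (Python's `float` is a float64). It only shows the loss of exactness near 2**63 and makes no claim about how NumPy's comparison loops are written.

```python
int64_max = 2 ** 63 - 1  # largest int64 value
uint64_value = 2 ** 63   # fits in uint64 but not in int64

# Exact integer comparison: the two values differ.
exact = uint64_value > int64_max
print(exact)  # True

# After conversion to float64 both round to 2.0**63, so the
# difference disappears and the comparison gives the wrong answer.
via_float64 = float(uint64_value) > float(int64_max)
print(via_float64)  # False
```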

rgommers · 2022-06-17T14:20:44Z

> Right now, my expectation for the final result is that it will raise an error.

Thanks. That makes sense to me I think - and it's certainly better than the current result.

> Maybe comparisons could always use int64 for the Python integer, but at least in the simple version that might mean:

It sounds logical, but perhaps it'll turn out later that special-casing casting behavior for comparisons was a mistake - hard to predict. So perhaps starting with an exception is better?

rgommers · 2022-06-20T09:35:51Z

One more in the same vein:

>>> np.uint32(10) % 2**32  # expected answer: 10 (or an exception)
<ipython-input-5-e66c79a8614e>:1: RuntimeWarning: divide by zero encountered in remainder
  np.uint32(10) % 2**32
0
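
The divide-by-zero warning suggests that `2**32` ends up as 0 once it is forced into uint32. A short sketch of that wraparound using `ctypes`, whose integer types do no overflow checking; the variable names are illustrative only.

```python
import ctypes

# 2**32 is one past the largest uint32 value, so an unchecked
# conversion wraps it around to 0.
wrapped_divisor = ctypes.c_uint32(2 ** 32).value
print(wrapped_divisor)  # 0

# With exact Python integers the remainder is simply the dividend.
exact_remainder = 10 % 2 ** 32
print(exact_remainder)  # 10
```

With the divisor wrapped to 0, the remainder becomes a division by zero, which matches the warning and the result of 0 shown above.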

After fixing pretty much all of the failures in SciPy that show up with NPY_PROMOTION_STATE=weak I'd say that:

the issues are pretty minor and a cost worth paying to get the improved casting behavior
finding the issues is currently a real pain, and it will be valuable to have those added warnings/errors for where behavior has changed, to more easily pinpoint the source of each problem.

seberg · 2022-06-20T12:45:48Z

> finding the issues is currently a real pain, and it will be valuable to have those added warnings/errors for where behavior has changed, to more easily pinpoint the source of each problem.

The warnings exist as NPY_PROMOTION_STATE=weak_and_warn? The problem is they are noisy. So yes, probably useful to find where the change happened for a failed test, but likely not useful for the whole test run.

rgommers · 2022-06-20T18:59:49Z

> The warnings exist as NPY_PROMOTION_STATE=weak_and_warn? The problem is they are noisy. So yes, probably useful to find where the change happened for a failed test, but likely not useful for the whole test run.

Ah yes, thanks. That does help, when combined with exact test selection. Typically I run all tests in a file or submodule, but that's too noisy here. I just tried on test_filter_design.py, which had 1 failure. Running the file triggers 16 warnings (which get auto-upgraded to test errors). Making it specific enough with

python dev.py test -t scipy.signal.tests.test_filter_design::TestIIRFilter

gives a root cause for the failure of interest:

scipy/signal/_filter_design.py:4853: in besselap
    a_last = _falling_factorial(2*N, N) // 2**N
E   UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from object to int64.
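
The same idea of upgrading the warning to an error to locate its source can be used outside a test runner with the standard `warnings` module. A minimal sketch follows; `changed_promotion` is a hypothetical stand-in for any call that emits the promotion warning, and `find_change` is an illustrative helper.

```python
import warnings


def changed_promotion():
    """Hypothetical stand-in for an operation whose result dtype changes."""
    warnings.warn(
        "result dtype changed due to the removal of value-based promotion",
        UserWarning,
    )
    return 0


def find_change(func):
    """Run func with UserWarning turned into an error; return its message."""
    with warnings.catch_warnings():
        # Inside this block every UserWarning is raised as an exception,
        # so the traceback points at the line that triggered it.
        warnings.simplefilter("error", UserWarning)
        try:
            func()
        except UserWarning as exc:
            return str(exc)
    return None


print(find_change(changed_promotion))
print(find_change(lambda: 0))  # None
```

Restricting the filter to a small block keeps the noise down, in the same spirit as selecting a single test class.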

charris · 2022-06-23T16:54:44Z

@seberg does this affect the array_api work?

seberg · 2022-06-23T17:15:23Z

This may be their biggest blocker. But it's not set up to be usable with e.g. a context manager, so it is important but doesn't affect them right now (as it doesn't add any new API to use the new behavior).
That may well be possible, but it probably needs some thought to make sure it is always fast enough. My priority right now is to allow initial tests to move NEP 50 forward, not making np.array_api easier to implement.

charris · 2022-06-23T17:30:46Z

Could you add a release note for this so that folks know how to test against it?

seberg · 2022-06-23T19:07:18Z

Good point, I added a brief release note pointing to the NEP. The NEP has the actual note on how to test (seems more discoverable to me, and better to update).

charris · 2022-06-26T18:53:07Z

Thanks Sebastian.

github-actions bot added the 30 - API label May 28, 2022

BvB93 reviewed Jun 2, 2022

seberg force-pushed the weak-scalars branch 2 times, most recently from 2186a2b to f89aa56 Compare June 9, 2022 19:52

seberg marked this pull request as ready for review June 9, 2022 23:30

seberg mentioned this pull request Jun 14, 2022

BUGs: Tracking issue for type promotion related bugs. #13754

Closed

seberg force-pushed the weak-scalars branch from d599162 to 205949b Compare June 14, 2022 18:52

seberg and others added 21 commits June 15, 2022 11:42

WIP: Implement weak scalar logic

edb369d

WIP: Restore dual behaviour

2a6a393

WIP: Add warning context manager and fix min_scalar for new promotion

baaeb9a

Even the new promotion has to use the min-scalar logic to avoid picking up a float16 loop for `np.int8(3) * 3.`.

MAINT: Allow subclasses of pyscalars and do not warn if out casting

d9cefc8

MAINT: Fortify methods (in-place division) against promotion changes

09d407a

API: Expose get_promotion_state and set_promotion_state

c855cec

We need to be able to query the state for testing, probably should be renamed before the end, but need to have something for now.

TST: Make test compatible with new promotion or mark for no-warnings

ffab4c4

TST: Adapt percentile test to changed promotion

9a5c5e8

Promotion in percentile will now more aggressively preserve the input dtype for floating point types (rather than upgrading the type to at least float64).

TST: More promotion change adaptions and warning filtering

b2731a5

MAINT: Ensure sanity check in __init__ does not trigger NEP 50 warning

af93162

FIXUP set_promotion_state

53bc32d

MAINT: Remove incorrect check from concat and improve warning

0ec8c0f

TST: Ignore promotion warning in linalg test calculating atol

f8fc8d2

TST: Promotion test (and warning filter) fixup

3d711be

API: Fix legacy promotion option and expose as env variable

419bec8

Also make the warning message sane :)

MAINT: Add stricter subclass check

b148b0f

The issue is that bools are subclasses of ints, which triggers the more general problem that we cannot use the `int` abstract dtype if the input has a concrete dtype associated (e.g. bools, but also general user defined DTypes in principle)

TODO: Add a todo note ;)

0cd7cb3

BUG: Need to keep the skipping-if-legacy logic for now at least

1b83283

If all are scalars, then legacy promotion is not forced but we would use weak promotion internally suddenly (which we must not!).

TYP: Add types for new symbols

dc541a8

Co-authored-by: Bas van Beek <[email protected]>

seberg added 8 commits June 15, 2022 11:42

REV: Revert casting part of the old changes

0af4c44

MAINT: Put array tagging into a static inline function

4853f1c

this also effectively fixes some corner cases in np.result_type

REV: Revert some minor style changes that smuggled their way in

9a8dac0

STY: Minor tweaks/rewording

5f8f18c

MAINT: Fixup bad rebase

b96e43f

DOC,TST: Add uint8(100) + 200) example and fixup float overflow test

2c5f407

seberg force-pushed the weak-scalars branch from 205949b to 2c5f407 Compare June 15, 2022 19:02

Micky774 mentioned this pull request Jun 15, 2022

FIX Adopted more direct dtype restraint in preperation for NEP50 scikit-learn/scikit-learn#23644

Draft

rgommers mentioned this pull request Jun 17, 2022

MAINT: future-proof stats.kde for changes in numpy casting rules scipy/scipy#16424

Merged

rgommers mentioned this pull request Jun 17, 2022

MAINT: fix up _sputils.get_index_dtype for NEP 50 casting rules scipy/scipy#16428

Merged

rgommers mentioned this pull request Jun 20, 2022

MAINT: fix issues with Python scalar related casting behavior (NEP 50) scipy/scipy#16442

Merged

DOC: Add a brief release note and a note in the NEP on how to test

7dcfaaf

charris merged commit b65f0b7 into numpy:main Jun 26, 2022

seberg deleted the weak-scalars branch June 26, 2022 21:26

asmeurer mentioned this pull request Sep 27, 2022

ENH: Getting NEP 50 behavior in the array API compat library #22341

Closed

thomasjpfan mentioned this pull request Mar 14, 2023

RFC: Breaking Changes for Version 2 scikit-learn/scikit-learn#25776

Closed

rgommers mentioned this pull request Apr 24, 2023

Type promotion Quansight-Labs/numpy_pytorch_interop#110

Closed
