Genextreme #84


Merged: 15 commits into pymc-devs:main, Sep 28, 2022

Conversation

@ccaprani (Contributor) commented Sep 26, 2022:

This is a follow-on from #4, which was closed after rebasing. It would be good to merge and close this out, as it has been a long time. @ricardoV94, please advise if anything is outstanding.

@ricardoV94 (Member) left a comment:

Thanks for coming back to this. Looks great.

The tests in the main repo have recently been refactored to follow the same path as the code. I suggest you update this PR to match, so it's already pymc-compatible. Otherwise, I only left a small suggestion for a direct test of the scipy kwarg and a small logp/logcdf tweak.

# bnd = mu - sigma/xi
return check_parameters(
    logp_expression,
    1 + xi * (value - mu) / sigma > 0,
@ricardoV94 (Member) commented:

This constraint depends on the value as well as mu, so it should probably be in a switch(constraint, logp, -inf). Usually value-dependent constraints are treated elemwise, while parameter-based constraints are treated blockwise with check_parameters (if one fails, everything fails), and we raise an error outside of sampling.

Suggested change
1 + xi * (value - mu) / sigma > 0,
1 + xi * (value - mu) / sigma > 0,
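To make the split concrete, here is a plain-NumPy sketch of the pattern, not the PyMC implementation: the function name gev_logp is hypothetical, and the xi != 0 branch of the GEV log-density is written out directly for illustration.

```python
import numpy as np

def gev_logp(value, mu, sigma, xi):
    """Sketch of the switch/check_parameters split for a GEV logp (xi != 0)."""
    # Blockwise parameter constraint: like check_parameters, an invalid
    # parameter makes the whole logp error out.
    if not np.all(np.asarray(sigma) > 0):
        raise ValueError("sigma must be positive")
    value, mu, sigma, xi = np.broadcast_arrays(value, mu, sigma, xi)
    t = 1 + xi * (value - mu) / sigma
    with np.errstate(divide="ignore", invalid="ignore"):
        logp = -np.log(sigma) - (1 + 1 / xi) * np.log(t) - t ** (-1 / xi)
    # Elemwise value constraint: like switch(constraint, logp, -inf),
    # values outside the support get zero probability rather than an error.
    return np.where(t > 0, logp, -np.inf)
```

So gev_logp(-3.0, 0.0, 1.0, 0.5) evaluates to -inf (value outside the support), while a negative sigma raises, mirroring the PyMC distinction described above.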

@ccaprani (Contributor, author) commented:

I've implemented this now, but I'm not precisely sure of the purpose: check_parameters is value-based, while switch(constraint, ...) is tensor-based, right? Anyway, please have a look.

@ricardoV94 (Member) commented Sep 27, 2022:

The point is that the logp is usually defined for every possible value (although it can be zero for some combinations of value and parameters, such as a halfnormal evaluated at -1), so it should never "error out". Logps can, however, be undefined for some parameter values, such as a halfnormal with a negative sigma, in which case we do error out. We make that distinction in PyMC by using a switch for zero-probability values and check_parameters for parameter constraints:

import pymc as pm

print(pm.logp(pm.HalfNormal.dist(), -1.0).eval())  # -np.inf
pm.logp(pm.HalfNormal.dist(sigma=-5), 1).eval()  # ValueError: sigma must be positive

Scipy is similar, but yields nan instead of raising:

import scipy.stats as st

print(st.halfnorm().logpdf(-1.0))  # -inf
print(st.halfnorm(scale=-5).logpdf(1.0))  # nan

@ccaprani (Contributor, author) commented Sep 27, 2022:

Thanks @ricardoV94, I can see this alright. Nevertheless, the tests are failing only in the pymc test environment. Here are two cases.

First, allowing xi to be in the R domain (mathematically true, but not in practice). npt.assert_almost_equal is an absolute test of equality defaulting to 6 decimal places, so we end up with silly outcomes like the one below, where the relative error is 61 - 47 = 14 orders of magnitude down! Surely this should be accepted?

AssertionError:
Arrays are not almost equal to 6 decimals
{'mu': array(1.), 'sigma': array(0.01), 'xi': array(-0.01), 'value': array(-2.1)}
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 2.39777612e+47
Max relative difference: 1.26305703e-14
 x: array(-1.898391e+61)
 y: array(-1.898391e+61)

You can see the check in numpy here. It's worth noting that the numpy docs don't recommend using assert_almost_equal. For example, assert_allclose allows a relative tolerance to be specified rather than an absolute one (in fact, both or either).

Second, restricting the domain of xi to the practical range [-1, +1]. In this case, running check_logp gives:

File ~/anaconda3/envs/pymc4-dev/lib/python3.10/site-packages/_pytest/outcomes.py:
196, in fail(reason, pytrace, msg)
    194 __tracebackhide__ = True
    195 reason = _resolve_msg_to_reason("fail", reason, msg)
--> 196 raise Failed(msg=reason, pytrace=pytrace)

Failed: test_params={'mu': array(-2.1), 'sigma': array(0.01), 'xi': TensorConstant{-2.0}}, valid_value=-2.1

where it is checking for an invalid value for xi (-2). But note that it passes when called directly:

In [15]: μ,σ,ξ,x = -2.1,0.01,-2.0,-2.1  # fails check_parameters with xi in R domain
    ...: logp = pm.logp(GenExtreme.dist(mu=μ,sigma=σ,xi=ξ), x).eval()
    ...: print(logp)
3.605170185988091

and calling check_parameters passes

In [16]: mu = at.as_tensor_variable(floatX(μ))
    ...: sigma = at.as_tensor_variable(floatX(σ))
    ...: xi = at.as_tensor_variable(floatX(ξ))
In [17]: sigma > 0
Out[17]: Elemwise{gt,no_inplace}.0

In [18]: 1 + xi * (x-mu)/sigma > 0,
Out[18]: (Elemwise{gt,no_inplace}.0,)

In [19]: check_parameters(logp,
    ...:     sigma > 0,
    ...:     1 + xi * (x-mu)/sigma > 0,
    ...:     msg="sigma <= 0 or 1+xi*(x-mu)/sigma <= 0")
Out[19]: Check{sigma <= 0 or 1+xi*(x-mu)/sigma <= 0}.0

I did have a look at adding a -1 < xi < 1 check in check_parameters, but I couldn't get it to work with either

  • xi > -1 & xi < 1 as a single check, due to the tensors (is there an elementwise and? I couldn't find it in the aesara docs), or
  • two checks, ..., xi > -1, xi < 1, ...
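For reference, in plain NumPy the elemwise conjunction is spelled with & (or np.logical_and); aesara's at.and_ (equivalently at.bitwise_and) plays the same role for tensors. A quick NumPy illustration of the combined bound:

```python
import numpy as np

xi = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
# `&` is the elemwise "and" on boolean arrays; the aesara analogue is
# at.and_(xi > -1, xi < 1).
in_domain = (xi > -1) & (xi < 1)
print(in_domain)  # [False  True  True  True False]
```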

It would be great to have your input on this, since everything seems to work fine until it enters the pymc test environment. Hopefully it's just a tweak to one of the conditions in the logp/logcdf functions?

@ricardoV94 (Member) commented Sep 27, 2022:

You can increase the test tolerance in check_logp/check_logcdf with the kwarg decimal. Do note that you may only be checking a subset of parameter combinations when testing locally (the default is a random subset of up to 100 combinations), so you can temporarily set n_samples=-1 to make sure all parameter combinations are tested.

@ccaprani (Contributor, author) commented:

@ricardoV94 good news: Using at.and_ in check_parameters did the trick for the restricted (but practical) domain of $\xi \in [-1,1]$. All checks now pass!

I can expand to the full domain of $\xi \in \mathbb{R}$ if the logp check is made relative instead of absolute as mentioned above.

In any case, thank you so much for your help on this. I think it's been 11 months in the pipeline, so hopefully we can get it merged?!

@ricardoV94 (Member) commented Sep 27, 2022:

A relative check like the one I mentioned would instead compare the ratio of the two numbers to a relative accuracy of 1e-6, which makes more sense.

I thought about this some time ago but ran into issues. Logps vary widely (and non-linearly), between 1e-700s and 1e+xs. More annoyingly, it's not uncommon to have logps around 0, in which case relative comparisons (or, on the other hand, looser absolute comparisons) become pretty meaningless.

If you have a more flexible solution that works in general, that would be great. Certainly there are more principled ways of doing these accuracy checks.
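The tension can be seen directly with numpy.testing (a small illustration, not code from the PR): an absolute 6-decimal check is hopeless for huge log-probabilities, a pure relative check is hopeless near zero, and an atol floor is what rescues the latter:

```python
import numpy as np
import numpy.testing as npt

big = -1.898391e61
big_off = big * (1 + 1e-14)  # relative error 1e-14, absolute error ~1.9e47

# An absolute 6-decimal check fails badly at this magnitude...
try:
    npt.assert_almost_equal(big, big_off, decimal=6)
    absolute_check_failed = False
except AssertionError:
    absolute_check_failed = True
print(absolute_check_failed)  # True

# ...while a relative check accepts it comfortably.
npt.assert_allclose(big, big_off, rtol=1e-6)

# Near zero, rtol alone is too strict; an atol floor makes the check meaningful.
npt.assert_allclose(1e-12, 2e-12, rtol=1e-6, atol=1e-10)
```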

@ccaprani (Contributor, author) commented:

@ricardoV94 npt.assert_allclose here allows both a relative tolerance, rtol, and an absolute tolerance, atol, to be set. Example:

>>> import numpy as np
>>> import numpy.testing as npt
>>> x = np.pi*np.array([1e2,1e-1,1e-6,1e-10,1e-100,1e-700])
>>> y = (1 + 1e-6)*x
>>> npt.assert_allclose(x,y)  # raises AssertionError
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 5 / 6 (83.3%)
Max absolute difference: 0.00031416
Max relative difference: 9.99999e-07
 x: array([3.141593e+002, 3.141593e-001, 3.141593e-006, 3.141593e-010,
       3.141593e-100, 0.000000e+000])
 y: array([3.141596e+002, 3.141596e-001, 3.141596e-006, 3.141596e-010,
       3.141596e-100, 0.000000e+000])

but this does not assert:

>>> npt.assert_allclose(x,y,rtol=1e-6,atol=1e-10)

@ricardoV94 (Member) commented:

Do you want to try to open a PR on the main repo to change that logic, then?


@ricardoV94 (Member) left a comment:

Two more small things

@ccaprani (Contributor, author) commented:

@ricardoV94 it fails the tests for float32 due to differences between scipy and pymc underflows. Shall I add:

    @pytest.mark.skipif(
        condition=(aesara.config.floatX == "float32"),
        reason="PyMC underflows earlier than scipy on float32",
    )

to the tests for logp and logcdf, as per other distributions?

@ricardoV94 (Member) commented:

> Shall I add the skipif to the tests for logp and logcdf, as per other distributions?

Sure.

@ccaprani (Contributor, author) commented:

The test on the windows machine is failing with message:

E           AssertionError: 
E           Arrays are not almost equal to 6 decimals
E           {'mu': array(0.01), 'sigma': array(0.01), 'xi': array(0.), 'value': array(-1.)}
E           Mismatched elements: 1 / 1 (100%)
E           Max absolute difference: 9.90352031e+27
E           Max relative difference: 1.35533585e-16
E            x: array(-7.30706e+43)
E            y: array(-7.30706e+43)

You can see why I reckon a relative tolerance would be good (pymc-devs/pymc#6159)! Can this be accepted without this test passing as is?

@ricardoV94 (Member) commented:

> You can see why I reckon a relative tolerance would be good! Can this be accepted without this test passing as is?

Let's just extend check_logp/check_logcdf to allow rtol. Your PR on PyMC does not allow it yet, right?

@ricardoV94 (Member) commented Sep 28, 2022:

Those are also very extreme values (pun not intended); tweaking the grid of mu and sigma would certainly avoid it, so if you want to wrap this up quicker, that's another option.

@ccaprani (Contributor, author) commented:

@ricardoV94 that's a fantastic idea - thank you. I've just implemented that now.

@ricardoV94 (Member) commented:

Looks good. I pushed a commit that separates the logp and logcdf tests and adds an xfail only for float32. I tested with n_samples=-1, so it should be stable from now on.

@ricardoV94 ricardoV94 merged commit 78b7f2f into pymc-devs:main Sep 28, 2022
@ccaprani (Contributor, author) commented:

Wonderful to finally get this done, @ricardoV94; the first PR on this was a year ago! The corresponding example was also just merged, so many thanks for all your help indeed!

@ricardoV94 (Member) commented:

Hopefully the next one will be smoother. Thanks for sticking around.

@ccaprani (Contributor, author) commented:

No worries - a big learning curve with long gaps between efforts! Thanks for your patience!
