Reintroduce Bernoulli logitp parametrization #4620


Closed

Conversation

@ricardoV94 (Member) commented Apr 7, 2021

This is a proposal for how we might reintroduce the logit_p parametrization to the Bernoulli distribution.

I parametrized everything (sampling, logp, logcdf) in terms of logit_p, as this should be numerically more stable. This required changing the Bernoulli Op from Aesara. I didn't run any benchmarks to check whether this incurs a significant loss of speed relative to the p parametrization.

Also, we get a divide-by-zero RuntimeWarning when creating a Bernoulli with p=1.

I am happy to listen if you think there might be a better approach.
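
For context, a minimal sketch (hypothetical code, not the PR's actual implementation; the function name is made up) of a Bernoulli logp written directly on the logit scale, using the identities log(p) = -softplus(-logit_p) and log(1 - p) = -softplus(logit_p):

import aesara.tensor as at

def bernoulli_logp_from_logit(value, logit_p):
    # softplus keeps both branches finite for large |logit_p|, where p
    # itself would round to exactly 0.0 or 1.0 and log(p) or log(1 - p)
    # would become -inf
    return at.switch(
        value,
        -at.nnet.softplus(-logit_p),  # log(p) for value == 1
        -at.nnet.softplus(logit_p),   # log(1 - p) for value == 0
    )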



@@ -332,6 +339,19 @@ def logcdf(self, value):

class BernoulliLogitRV(BernoulliRV):

@ricardoV94 (Member, Author) commented on the diff:

Should it inherit from BernoulliRV or RandomVariable directly?

Are the name and _print_name changes just confusing and unnecessary?

def test_bernoulli(self):
    pymc3_random_discrete(
-       pm.Bernoulli, {"p": Unit}, ref_rand=lambda size, p=None: st.bernoulli.rvs(p, size=size)
+       pm.Bernoulli, {"p": Unit}, ref_rand=lambda size, p: st.bernoulli.rvs(p, size=size)

@ricardoV94 (Member, Author):

This will be replaced following #4608

    value <= 1,
-   p >= 0,
-   p <= 1,
+   ~at.isnan(logit_p),

@ricardoV94 (Member, Author):

Invalid p gets converted to nan, so this is how we can identify it in bound.
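
For illustration (NumPy here, but Aesara behaves the same way): logit maps any p outside [0, 1] to nan, so a single ~at.isnan(logit_p) condition covers both of the old p >= 0 and p <= 1 bounds:

import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    for p in np.array([-0.1, 0.0, 0.5, 1.0, 1.1]):
        print(p, np.log(p / (1 - p)))

# -0.1 nan
# 0.0 -inf
# 0.5 0.0
# 1.0 inf
# 1.1 nan

Note that the boundary values map to -inf/inf rather than nan, which is also where the divide-by-zero RuntimeWarning at p=1 mentioned in the description comes from.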

@brandonwillard (Contributor) left a comment:

In general, we don't want to create new versions of the basic RandomVariables unless we absolutely must; otherwise, putting the parameterization inside the RandomVariable Op removes the possibility of optimization and/or stabilization, and necessarily incurs the cost of transforming—even when it's unnecessary.

The real question is: what's the advantage of doing it this way, instead of applying the parameterization as an input to the existing RandomVariable?

@ricardoV94 (Member, Author):

The issue is that if we use p instead of logitp as the input to the logp and logcdf, we necessarily lose numerical precision. This was not really the case for the other alternative parametrizations that we already refactored, but it is relevant here.

@ricardoV94 (Member, Author) commented Apr 8, 2021

Alternatively, if we could rely on Aesara always simplifying the graph logit(invlogit(x)) = x, we could do p = invlogit(logit_p) during initialization and logit_p = logit(p) inside the logp and logcdf calls, achieving the same end result without having to alter the Bernoulli Op.

I have to check whether Aesara can actually do this. If it can (or could), would this be a viable strategy?

Taking one step back, is there an alternative where we could simply provide different parametrizations to the random Op and the logp / logcdf methods directly? It feels inefficient to have to compute x -> f(x) -> f^-1(f(x)) so often just because the random Op is (logically) parametrized differently than the logp method.

@ricardoV94 (Member, Author):

I wrote a small gist that enables rewrites of logit(invlogit(x)) = x and invlogit(logit(x)) = x.
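
For illustration, a rough sketch of what such rewrites might look like (hypothetical code, not the actual gist; the names are made up, Aesara's 2021-era PatternSub API is assumed, logit is spelled out as log(p / (1 - p)), and registration into the rewrite database is omitted):

import aesara.tensor as at
from aesara.graph.opt import PatternSub
from aesara.tensor.nnet import sigmoid

# logit(invlogit(x)) -> x, with logit written out as log(s / (1 - s))
local_logit_invlogit = PatternSub(
    (at.log, (at.true_div, (sigmoid, "x"), (at.sub, 1.0, (sigmoid, "x")))),
    "x",
)

# invlogit(logit(p)) -> p
local_invlogit_logit = PatternSub(
    (sigmoid, (at.log, (at.true_div, "p", (at.sub, 1.0, "p")))),
    "p",
)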

Whether or not we make use of this for this PR, it could be a nice feature going forward. Users would benefit from increased precision in logistic regression models without having to be aware of the alternative logit_p parametrization (assuming we decide to stick with Bernoulli logp / logcdf methods written on the logit scale).

We probably could also get rid of the "hackish" invlogit function in pymc3.math and revert to using the Aesara sigmoid:
https://github.com/pymc-devs/pymc3/blob/2dee9dafb3541043ae23b66fe9262e70fbd2769a/pymc3/math.py#L212-L214

@brandonwillard (Contributor):

> I wrote a small gist that enables rewrites of logit(invlogit(x)) = x and invlogit(logit(x)) = x.

Sounds good; create a PR in Aesara for that.

> We probably could also get rid of the "hackish" invlogit function in pymc3.math and revert to using the Aesara sigmoid

Yeah, we should make this change, too.

@ricardoV94 (Member, Author) commented Apr 12, 2021

@brandonwillard I will check the sigmoid in more detail and open a PR on the Aesara side.

Besides that, is there no way we could specify a mapping of which parameters should go into a RandomOp vs. the logp or logcdf methods when creating a PyMC distribution?

It feels rather restrictive to force everything into the RandomOp parametrization when different methods have more natural / stable parametrizations. For example:

class Bernoulli(Discrete):
    def __init__(self, p=None, logitp=None, *args, **kwargs):
        p, logitp = self.get_p_logitp(p=p, logitp=logitp)
        param_map = {
            "random": (p,),
            "logp": (logitp,),
            "logcdf": (logitp,),
        }
        super().__init__(param_map=param_map, **kwargs)

By default this would do nothing and use the same arguments for all three methods.

@brandonwillard (Contributor):

> It feels rather restrictive to force everything into the RandomOp parametrization, when different methods have more natural / stable parametrizations.

If you're talking about developer "restrictions", then it's considerably more restrictive to have to deal with duplicate classes that vary only in how they parameterize something. The downstream complexity introduced by such an outgrowth is considerably larger than you realize.

If you're talking about the parameterizations used during sampling, we can always consider changing the internal parameterizations used in the Aesara NumPy RandomVariables when there's a demonstrable improvement to be had. Even so, we definitely can't remove the NumPy parameterizations from the interface.

Aside from those, I'm not sure what you mean, because any transforms one may want to apply in a log-likelihood or CDF graph can be applied quite easily.

The change you proposed in this PR doesn't seem to line up with your example, though, because the code you posted implies that the sampling (i.e. random) should be performed in the existing parameterization (i.e. p) and not the logit-based parameterization change in this PR. Again, it's very straightforward to make an arbitrary parameterization change within a logp implementation, and there is absolutely no reason to change the parameterization used by the underlying RandomVariable—unless it provides a substantial improvement to random sampling, of course.

@ricardoV94 (Member, Author) commented Apr 12, 2021

> If you're talking about developer "restrictions", then it's considerably more restrictive to have to deal with duplicate classes that vary only in how they parameterize something. The downstream complexity introduced by such an outgrowth is considerably larger than you realize.

Definitely, I wouldn't want to do that. To be clear, I don't like the current PR proposal.

> If you're talking about the parameterizations used during sampling, we can always consider changing the internal parameterizations used in the Aesara NumPy RandomVariables when there's a demonstrable improvement to be had. Even so, we definitely can't remove the NumPy parameterizations from the interface.

Again, that sounds suboptimal. Ideally we could keep the RandomVariable parametrization aligned with NumPy's / SciPy's, while allowing different parametrizations for the logp / logcdf, transforming between them only where needed. If a user defines p, we would use it directly in the RandomOp and convert to logitp for the logp / logcdf; if the user defines logitp, we would do the reverse: use it directly in logp / logcdf and only convert to p for the RandomOp.
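
For concreteness, a rough sketch of that dispatch (a hypothetical standalone helper, not existing PyMC3 code):

import aesara.tensor as at

def get_p_and_logit_p(p=None, logit_p=None):
    """Convert only in the direction that is actually needed."""
    if (p is None) == (logit_p is None):
        raise ValueError("Specify exactly one of p or logit_p")
    if p is None:
        # user supplied logit_p: use it as-is in logp/logcdf,
        # convert once to p for the RandomOp
        p = at.nnet.sigmoid(logit_p)
    else:
        # user supplied p: use it as-is in the RandomOp,
        # convert once to logit_p for logp/logcdf
        logit_p = at.log(p / (1 - p))
    return p, logit_p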

> Aside from those, I'm not sure what you mean, because any transforms one may want to apply in a log-likelihood or CDF graph can be applied quite easily.

> The change you proposed in this PR doesn't seem to line up with your example, though, because the code you posted implies that the sampling (i.e. random) should be performed in the existing parameterization (i.e. p) and not the logit-based parameterization change in this PR. Again, it's very straightforward to make an arbitrary parameterization change within a logp implementation, and there is absolutely no reason to change the parameterization used by the underlying RandomVariable—unless it provides a substantial improvement to random sampling, of course.

The problem is one of numerical stability / efficiency. If the user computes logitp upstream of the Bernoulli, as is usual in a logistic regression, we are forcing it to go through logit -> invlogit -> logit to arrive at the logp part, which feels wasteful both in terms of computation and numerical precision. It's nice that we can add automatic rewrites in Aesara to optimize away this kind of inverse transformation, but ideally we would not have to do this for every alternative parametrization where there might be a loss of precision in f^-1(f(x)).

To make it clear: if we converted to p during initialization and then back to logit_p inside the logp, this is what would happen now:

import aesara
import aesara.tensor as at

invlogit = at.nnet.sigmoid
logit = lambda p: at.log(p / (1 - p))

x = at.vector("x")
f = aesara.function([x], logit(invlogit(x)))
print(f([-750, -50, -5, 5, 50, 750]))

# [-inf -50.  -5.   5.  inf  inf]

@brandonwillard (Contributor) commented Apr 15, 2021

> It's nice that we can add automatic rewrites in Aesara to optimize away this kind of inverse transformation...

That's one of the important reasons for having the kind of rewrites that we do; it focuses such concerns into a place and time that's more universally manageable. It reduces code complexity by removing the need for a complicated patchwork of numerical concerns implemented in random places throughout the codebase.

Your approach mixes the design of our distribution/random variable interfaces with numerical concerns that arise in different parts of the codebase (e.g. in only the log-likelihood calculations), and this is the exact kind of coupling and complexity I was talking about before.

The rewrites have very little overhead, especially since they do nothing other than check a few conditions/types and create new Apply nodes, both of which are dirt cheap.

Also, by not having those rewrites in place, we end up suffering the same numerical instabilities anyway (just in other places), so why would we not implement those and use them to simplify our classes at the same time?
