Reintroduce Bernoulli logitp parametrization #4620
Conversation
```python
@@ -332,6 +339,19 @@ def logcdf(self, value):
        )


class BernoulliLogitRV(BernoulliRV):
```
Should it inherit from `BernoulliRV` or `RandomVariable` directly? Is the `name` and `_print_name` change just confusing and not necessary?
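For reference, a minimal sketch of what such a subclass might look like, following Aesara's `RandomVariable` conventions; this is illustrative only (not the PR's actual code), and the `name` / `_print_name` values are placeholders:

```python
from aesara.tensor.random.basic import BernoulliRV
from scipy.special import expit


class BernoulliLogitRV(BernoulliRV):
    """Bernoulli variable parametrized by log-odds instead of probability (sketch)."""

    name = "bernoulli_logit"
    _print_name = ("BernoulliLogit", "\\operatorname{BernoulliLogit}")

    @classmethod
    def rng_fn(cls, rng, logit_p, size):
        # Map log-odds back to a probability before drawing samples;
        # expit is the numerically stable inverse-logit.
        return rng.binomial(1, expit(logit_p), size=size)
```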
```python
     def test_bernoulli(self):
         pymc3_random_discrete(
-            pm.Bernoulli, {"p": Unit}, ref_rand=lambda size, p=None: st.bernoulli.rvs(p, size=size)
+            pm.Bernoulli, {"p": Unit}, ref_rand=lambda size, p: st.bernoulli.rvs(p, size=size)
```
This will be replaced following #4608
```python
            value <= 1,
            p >= 0,
            p <= 1,
            ~at.isnan(logit_p),
```
Invalid `p` gets converted to `nan`, so this is how we can identify it in `bound`.
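For illustration, a minimal sketch of the kind of logp this enables, assuming pymc3's `bound` helper from `pymc3.distributions.dist_math`; the identities log σ(x) = -softplus(-x) and log(1 - σ(x)) = -softplus(x) keep both branches finite:

```python
import aesara.tensor as at
from pymc3.distributions.dist_math import bound


def bernoulli_logit_logp(value, logit_p):
    # log P(y=1) = -softplus(-logit_p); log P(y=0) = -softplus(logit_p)
    return bound(
        at.switch(value, -at.nnet.softplus(-logit_p), -at.nnet.softplus(logit_p)),
        value >= 0,
        value <= 1,
        ~at.isnan(logit_p),  # an invalid p becomes nan after the logit transform
    )
```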
In general, we don't want to create new versions of the basic `RandomVariable`s unless we absolutely must; otherwise, putting the parameterization inside the `RandomVariable` `Op` removes the possibility of optimization and/or stabilization, and necessarily incurs the cost of transforming, even when it's unnecessary.

The real question is: what's the advantage of doing it this way, instead of applying the parameterization as an input to the existing `RandomVariable`?
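For concreteness, "applying the parameterization as an input" would amount to something like the following hypothetical model, where the transform is an ordinary piece of the graph feeding the existing `p`-parametrized variable:

```python
import aesara.tensor as at
import pymc3 as pm

with pm.Model():
    logit_p = pm.Normal("logit_p", 0.0, 1.0)
    # The invlogit transform is just another node in the graph; no new
    # RandomVariable Op is needed for the logit parametrization.
    y = pm.Bernoulli("y", p=at.nnet.sigmoid(logit_p), observed=[0, 1, 1])
```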
The issue is that if we use …

Alternatively, if we could rely on Aesara always simplifying the graph … I have to check if Aesara can actually do this one. If it can / could, would this be a viable strategy?

Taking one step back, is there an alternative where we could simply provide different parametrizations to the random op and the logp / logcdf methods directly? Feels inefficient to have to do so often.
I wrote a small gist that enables rewrites of expressions like `logit(invlogit(x))`. Whether or not we make use of this for this PR, it could be a nice feature going into the future. Users would benefit from increased precision in logistic regression models without having to be aware of the alternative parametrization. We probably could also get rid of the "hackish" ….
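The gist isn't reproduced here, but the idea could look roughly like this, using Aesara's pattern-rewrite machinery to collapse `logit(sigmoid(x))` back to `x`; module paths and the `PatternSub` signature are from memory of Aesara ~2.0 and may differ between releases:

```python
import aesara.tensor as at
from aesara.graph.opt import PatternSub
from aesara.tensor.basic_opt import register_stabilize

# Rewrite log(sigmoid(x) / (1 - sigmoid(x))) -> x during stabilization,
# so the logit/invlogit round trip never reaches the compiled function.
logit_of_sigmoid = PatternSub(
    (
        at.log,
        (
            at.true_div,
            (at.nnet.sigmoid, "x"),
            (at.sub, 1.0, (at.nnet.sigmoid, "x")),
        ),
    ),
    "x",
)
register_stabilize(logit_of_sigmoid, name="logit_of_sigmoid")
```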
Sounds good; create a PR in Aesara for that.
Yeah, we should make this change, too.
@brandonwillard I will check the sigmoid in more detail and open a PR on the Aesara side.

Besides that, is there no way we could specify a mapping of which parameters should go into a RandomOp vs. the logp or logcdf when creating a PyMC distribution? It feels rather restrictive to force everything into the RandomOp parametrization when different methods have more natural / stable parametrizations.

```python
class Bernoulli(Discrete):
    def __init__(self, p=None, logitp=None, *args, **kwargs):
        p, logitp = self.get_p_logitp(p=p, logitp=logitp)
        param_map = {
            "random": (p,),
            "logp": (logitp,),
            "logcdf": (logitp,),
        }
        super().__init__(param_map=param_map, **kwargs)
```

Which by default would obviously do nothing and use the same arguments for all three methods.
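Under this hypothetical interface (nothing here exists; `logitp`, `param_map`, and `get_p_logitp` are the proposal above), usage might look like:

```python
import numpy as np
import pymc3 as pm
import aesara.tensor as at

X = np.random.normal(size=(100, 2))
y_obs = np.random.binomial(1, 0.5, size=100)

with pm.Model():
    beta = pm.Normal("beta", 0.0, 1.0, shape=2)
    eta = at.dot(X, beta)  # linear predictor, already on the logit scale
    # logp/logcdf would consume eta directly; only random() would pay
    # for the invlogit transform.
    y = pm.Bernoulli("y", logitp=eta, observed=y_obs)
```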
If you're talking about developer "restrictions", then it's considerably more restrictive to have to deal with duplicate classes that vary only in how they parameterize something. The downstream complexity introduced by such an outgrowth is considerably larger than you realize.

If you're talking about the parameterizations used during sampling, we can always consider changing the internal parameterizations used in the Aesara … NumPy ….

Aside from those, I'm not sure what you mean, because any transforms one may want to apply in a log-likelihood or CDF graph can be applied quite easily.

The change you proposed in this PR doesn't seem to line up with your example, though, because the code you posted implies that the sampling (i.e. …
Definitely, I wouldn't want to do that. To be clear, I don't like the current PR proposal.

Again, that sounds suboptimal. Ideally we could keep the `RandomVariable` parametrization aligned to NumPy's / SciPy's, while allowing for different parametrizations for the logp / logcdf... but only transforming between the different parametrizations where needed. If a user defines …
The problem is one of numerical stability / efficiency. If the user computes `logitp` upstream of the Bernoulli, as is usual in a logistic regression, we are forcing it to go through logit -> invlogit -> logit to arrive at the ….

To make it clear, if we converted to …:

```python
import aesara
import aesara.tensor as at

invlogit = at.nnet.sigmoid
logit = lambda p: at.log(p / (1 - p))

x = at.vector("x")
f = aesara.function([x], logit(invlogit(x)))
# Extreme log-odds saturate after the round trip through [0, 1].
print(f([-750, -50, -5, 5, 50, 750]))
# [-inf -50. -5. 5. inf inf]
```
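Continuing the snippet above: the quantity the logp actually needs, log σ(x), can be computed without the round trip via the identity log σ(x) = -softplus(-x), which stays finite at the same extreme inputs:

```python
# Stable log P(y=1): -softplus(-x) instead of log(invlogit(x)).
logp1 = aesara.function([x], -at.nnet.softplus(-x))
print(logp1([-750, -50, -5, 5, 50, 750]))
# approx. [-750, -50, -5.0067, -0.0067, -1.9e-22, 0.]
```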
That's one of the important reasons for having the kind of rewrites that we do; it focuses such concerns into a place and time that's more universally manageable. It reduces code complexity by removing the need for a complicated patchwork of numerical concerns implemented in random places throughout the codebase. Your approach mixes the design of our distribution/random variable interfaces with numerical concerns that arise in different parts of the codebase (e.g. in only the log-likelihood calculations), and this is the exact kind of coupling and complexity I was talking about before.

The rewrites have very little overhead, especially since they do nothing other than check a few conditions/types and create new nodes.

Also, by not having those rewrites in place, we end up suffering the same numerical instabilities anyway (just in other places), so why would we not implement those and use them to simplify our classes at the same time?
This is a proposal of how we might reintroduce the `logit_p` parametrization to the `Bernoulli` distribution.

I parametrized everything (sampling, logp, logcdf) in terms of `logit_p`, as this should be numerically more stable. This required changing the Bernoulli Op from Aesara. I didn't run any benchmarks to see if this would incur a significant loss in speed relative to the `p` parametrization.

Also, we get a divide-by-zero `RuntimeWarning` when creating a Bernoulli with `p=1`.

I am happy to listen if you think there might be a better approach.