inferencedata.log_likelihood is summing observations #5236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

ricardoV94 opened this issue Dec 3, 2021 · 2 comments · Fixed by #5245

Comments


ricardoV94 commented Dec 3, 2021

When talking with @lucianopaz I realized we completely broke log_likelihood computation in V4.

```python
import pymc as pm

with pm.Model() as m:
    y = pm.Normal("y")
    x = pm.Normal("x", y, 1, observed=[5, 2])
    idata = pm.sample(tune=5, draws=5, chains=2)

print(idata.log_likelihood['x'].values.shape)
# (2, 5, 1)
```

Whereas in V3:

```python
import pymc3 as pm

with pm.Model() as m:
    y = pm.Normal("y")
    x = pm.Normal("x", y, 1, observed=[5, 2])
    idata = pm.sample(tune=5, draws=5, chains=2, return_inferencedata=True)

print(idata.log_likelihood['x'].values.shape)
# (2, 5, 2)
```
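As a plain NumPy illustration of the difference between the two shapes (not PyMC code): the per-observation log-likelihood has shape `(chains, draws, observations) = (2, 5, 2)`, and summing across the observation axis with `keepdims` collapses it to the buggy `(2, 5, 1)`:

```python
import numpy as np

# Stand-in for the elemwise log-likelihood: (chains, draws, observations)
elemwise = np.zeros((2, 5, 2))

# Summing over observations (keepdims=True) yields the reported bad shape
summed = elemwise.sum(axis=-1, keepdims=True)
print(summed.shape)  # (2, 5, 1)
```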

This happened because model.logpt now returns the summed logp by default, whereas before it returned the vectorized (elemwise) logp by default. The change was made in 0a172c8.

Although that is a saner default, we have to reintroduce an easy helper, logp_elemwiset (which I think is pretty much broken right now as well), that calls logpt with sum=False.

Also, in this case we might want to just return the logprob terms as the dictionary items returned by aeppl.factorized_joint_logprob and let the end user decide how they want to combine them. That dictionary maps {value variable: logp term}. The default of calling at.add on all variables when sum=False is seldom useful (that's why we switched the default), due to potential unwanted broadcasting across variables with different dimensions.
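A toy NumPy illustration (not PyMC code) of why adding elemwise logp terms of different shapes is dangerous: broadcasting silently succeeds and produces a combined array of an unintended shape instead of raising an error.

```python
import numpy as np

logp_a = np.zeros(2)        # elemwise logp for a variable with 2 observations
logp_b = np.zeros((3, 1))   # elemwise logp for a variable with shape (3, 1)

combined = logp_a + logp_b  # broadcasts instead of raising
print(combined.shape)       # (3, 2) -- not a meaningful joint logp
```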

One extra advantage of returning the dictionary items is that we don't need to create nearly duplicated graphs for each observed variable when computing the log-likelihood here:

```python
cached = [(var, self.model.fn(logpt(var))) for var in self.model.observed_RVs]
```

We can request the logp terms for any number of observed variables at the same time, and then compile a single function that has each variable's logp term as an output but otherwise shares the common nodes, saving on compilation, computation, and memory footprint when a model has more than one observed variable.
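A rough sketch of the idea, with plain Python/NumPy standing in for compiled Aesara functions (all names here are hypothetical): one function returns every observed variable's elemwise logp, and the shared parent term is evaluated once rather than once per duplicated graph.

```python
import numpy as np

def normal_logp(value, mu, sigma):
    """Elemwise log-density of a Normal, written out by hand."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((value - mu) / sigma) ** 2

def make_loglike_fn(obs_x, obs_z):
    """Hypothetical single function with one output per observed variable.

    The shared node (`mu`, i.e. the value of `y`) is computed once and
    reused by both logp outputs, instead of being rebuilt in two
    nearly duplicated graphs.
    """
    def loglike(point):
        mu = point["y"]  # common node shared by both logp terms
        return {
            "x": normal_logp(obs_x, mu, 1.0),
            "z": normal_logp(obs_z, mu, 1.0),
        }
    return loglike

fn = make_loglike_fn(np.array([5.0, 2.0]), np.array([0.0, 1.0, 2.0]))
out = fn({"y": 0.3})
print(out["x"].shape, out["z"].shape)  # (2,) (3,)
```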

For instance, this nested loop would no longer be needed:

pymc/pymc/backends/arviz.py, lines 276 to 282 at fe2d101:

```python
for var, log_like_fun in cached:
    for k, chain in enumerate(trace.chains):
        log_like_chain = [
            self.log_likelihood_vals_point(point, var, log_like_fun)
            for point in trace.points([chain])
        ]
        log_likelihood_dict.insert(var.name, np.stack(log_like_chain), k)
```

CC @OriolAbril

@ricardoV94 ricardoV94 added the bug label Dec 3, 2021
@ricardoV94 ricardoV94 added this to the v4.0.0-beta1 (vNext) milestone Dec 3, 2021
@ricardoV94 ricardoV94 added v4 trace-backend Traces and ArviZ stuff labels Dec 3, 2021
@ricardoV94 ricardoV94 changed the title inferencedata.log_likelihood is accumulated across observations inferencedata.log_likelihood is summing observations Dec 3, 2021

OriolAbril commented Dec 4, 2021

> this nested loop would no longer be needed

That was originally the goal in #4489. My Aesara knowledge was (and still is) very limited, so after a while of being stuck @brandonwillard took over and kept the nested loops. It seems like the description already outlines a clear path forward, but he might also have extra insight on this.

@brandonwillard (Contributor) commented:

> That was originally the goal in #4489. My Aesara knowledge was (and still is) very limited so after a while of being stuck @brandonwillard took over and kept the nested loops. It seems like the description already outlines a clear path forward but he might also have extra insight on this.

A lot has changed since my original v4 branch, so I can't imagine that many/any considerations based on #4489 will be relevant now. Regardless, if the description above is correct, it would appear as though the problem is due to other changes that now require an update to this logic.

Aside from that, @ricardoV94 seems to be proposing some potential paths for improvement. If so, they involve design decisions that need to be considered carefully by the people responsible for making them.

If there are any questions about how the basic machinery works, don't hesitate to ask; otherwise, I don't know how else I can help here.
