Add `step` dimension in sampler stats for compound steps #4602

OriolAbril · 2021-04-02T16:48:30Z

When using a compound step, the acceptance rate of each step function is reported separately. It would be nice to check for this at some point and pass the extra coordinate and dims to ArviZ via `idata_kwargs`, so users can then use `idata.sample_stats.accept.sel(step="Metropolis")`.
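To make the target concrete, here is a minimal sketch of what a `step` dimension on the acceptance rate could look like on the xarray side. It builds the dataset by hand with random numbers and does not call PyMC or ArviZ; the coordinate values `"Metropolis"` and `"NUTS"` and the array sizes are assumptions chosen for illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_chains, n_draws = 2, 100
step_names = ["Metropolis", "NUTS"]  # hypothetical coordinate values

# Fake acceptance rates with an extra "step" dimension (illustrative data only).
accept = xr.DataArray(
    rng.uniform(size=(n_chains, n_draws, len(step_names))),
    dims=("chain", "draw", "step"),
    coords={
        "chain": np.arange(n_chains),
        "draw": np.arange(n_draws),
        "step": step_names,
    },
)
sample_stats = xr.Dataset({"accept": accept})

# Label-based selection along the new dimension.
metropolis_accept = sample_stats.accept.sel(step="Metropolis")
```

Selecting a single label drops `step` from the dims, so `metropolis_accept` is indexed by `chain` and `draw` only.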

Caveats:

- This is probably very specific and may not be used much (if at all), so I'd consider this far from a priority.
- Before doing that we need to ensure that ArviZ always interprets the acceptance rate correctly and gives it the right name (the one we use in the dims dict), otherwise the info would be ignored. I think right now different things happen depending on whether the sampler is HMC or non-HMC.
- I have never used a compound step, so it's probably better if someone with some knowledge and vision on that can help.

This came up in: pymc-devs/pymc-examples#94


mjhajharia · 2021-04-02T18:31:10Z

This came up in: pymc-devs/pymc-examples#94
the new PR is: pymc-devs/pymc-examples#95

michaelosthege · 2022-01-02T15:33:57Z

I just rediscovered this bug too.
A CompoundStep is used as soon as there are discrete variables, or Metropolis etc. involved.
Already in this simple model, there are >1 steppers ~~and the emitted sampler stats are lost for all but one step method~~:

import pymc as pm

with pm.Model():
    pm.Normal("a")
    pm.Normal("b")
    pm.sample(step=pm.Metropolis())

+1 for treating it as a dimension. But I don't think it will be a required dimension, because some stats like logp at an iteration are not sampler-specific.

I'm not sure how CompoundStep emits/groups sampler stats of the step methods within, but in BaseTrace.record they arrive as a list of dicts.
Any ideas for a naming convention for the coordinate values? Simplest would be the number in that list.
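A minimal sketch of the "number in that list" convention, assuming the per-draw stats really do arrive as a list with one dict per step method. The helper name `group_stats_by_name` and the example values are hypothetical and not part of PyMC.

```python
from collections import defaultdict


def group_stats_by_name(stats_per_step):
    """Map each stat name to {step_index: value} for a single draw.

    `stats_per_step` is assumed to be a list with one dict per step method,
    in the order the compound step holds them.
    """
    grouped = defaultdict(dict)
    for step_index, stats in enumerate(stats_per_step):
        for name, value in stats.items():
            grouped[name][step_index] = value
    return dict(grouped)


# Illustrative stats for one draw from two step methods.
draw_stats = [
    {"accept": 0.25, "tune": True},
    {"accept": 0.75, "tune": True},
]
grouped = group_stats_by_name(draw_stats)
```

Here the list position doubles as the coordinate value, so `grouped["accept"]` keeps both acceptance rates apart instead of one overwriting the other.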

michaelosthege · 2022-01-02T16:03:07Z

An alternative to a sampler dimension would be to automatically rename the stats that appear multiple times.

Note that BaseTrace.stat_names already drops stats that appear multiple times:

pymc/pymc/backends/base.py, lines 226 to 234 in 98cc942:

@property
def stat_names(self):
    if self.supports_sampler_stats:
        names = set()
        for vars in self.sampler_vars or []:
            names.update(vars.keys())
        return names
    else:
        return set()
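The effect can be reproduced outside of PyMC. The sketch below copies the set-union logic into a free function and contrasts it with a renaming scheme. The `sampler_{i}__{name}` pattern, the helper `renamed_stat_names`, and the example stat names are assumptions for illustration, not an existing convention in this code base.

```python
def stat_names(sampler_vars):
    """Same set-union logic as the property above, as a free function."""
    names = set()
    for stats in sampler_vars or []:
        names.update(stats.keys())
    return names


def renamed_stat_names(sampler_vars):
    """Hypothetical alternative: prefix each stat with its sampler index."""
    return [
        f"sampler_{i}__{name}"
        for i, stats in enumerate(sampler_vars or [])
        for name in stats
    ]


# Two step methods that emit identically named stats (illustrative dtypes).
sampler_vars = [
    {"accept": float, "accepted": bool},
    {"accept": float, "accepted": bool},
]

collapsed = stat_names(sampler_vars)
renamed = renamed_stat_names(sampler_vars)
```

With the set union, four emitted stats collapse into two names; with index prefixes, all four stay distinguishable.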

With this example I don't think we can easily introduce a stepper dimension, because stats are not necessarily available for all samplers:

with pm.Model():
    pm.Poisson("a", mu=1)
    pm.Poisson("b", mu=2)
    pm.Normal("c")
    idata = pm.sample()

# CompoundStep
# >CompoundStep
# >>Metropolis: [a]
# >>Metropolis: [b]
# >NUTS: [c]

The resulting stats:

michaelosthege · 2022-01-02T20:35:10Z

🐛 : The ArviZ converter incorrectly drops the chain dimension when nchains==1 and more than one sampler emits a stat:

import pymc as pm

with pm.Model() as pmodel:
    pm.Normal("a")
    pm.Uniform("b")
    pm.Uniform("c")
    idata = pm.sample(step=pm.Metropolis(), chains=1)

idata.sample_stats.dims["chain"] == 1  # False, expected True
idata.sample_stats.dims["draw"] == 1000  # False, expected True
idata.sample_stats.dims["accepted_dim_0"]  # KeyError

OriolAbril · 2022-01-03T01:26:17Z

> With this example I don't think we can easily introduce a stepper dimension, because stats are not necessarily available for all samplers

What are the options on this? Having different dimensions for different variables within the same dataset is not a problem, i.e. lp having only chain and draw but accept having chain, draw, sampler is no different from having chain, draw, accept_dim_0.

What would be problematic for the coordinate values would be the possibility of "combinatorics" with sample stats, e.g. out of 4 steppers, 3 use accept, 3 (but not the same 3) use accepted, and only 2 use scaling.

michaelosthege · 2022-01-03T01:51:30Z

In the PyMC world we should only stack sampler stats coming from the same type of step method. Otherwise we could end up trying to stack two stats of the same name, but with different shapes/dtypes.
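A minimal sketch of stacking only within the same type of step method, assuming the type name of each step method is known alongside its stats. The helper `group_by_step_type` and all values are hypothetical.

```python
from collections import defaultdict


def group_by_step_type(step_types, stats_per_step):
    """Collect stats per step-method type so that only like is stacked with like."""
    grouped = defaultdict(lambda: defaultdict(list))
    for step_type, stats in zip(step_types, stats_per_step):
        for name, value in stats.items():
            grouped[step_type][name].append(value)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {step_type: dict(stats) for step_type, stats in grouped.items()}


# Illustrative input: two Metropolis steppers and one NUTS stepper.
step_types = ["Metropolis", "Metropolis", "NUTS"]
stats_per_step = [
    {"accept": 0.3, "scaling": 1.0},
    {"accept": 0.6, "scaling": 0.5},
    {"step_size": 0.1},
]
stacked = group_by_step_type(step_types, stats_per_step)
```

Stats from the two Metropolis steppers end up side by side, while the NUTS stats stay in their own group and never meet a same-named stat of a different shape or dtype.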

Generally one could also argue to not stack them at all: They are diagnostics of separate samplers, and will be investigated separately.

If we want to add meaningful dims, we should be able to specify them ahead of time: Step methods should report not only the name and dtype, but also the shape and optionally the dims.
That would also allow for the use of storage backends that preallocate memory--something we're not doing for stats yet.
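As a rough sketch of that idea, a step method could declare each stat up front with a small spec object, and a backend could preallocate from it. `StatSpec`, `preallocate`, and the example specs are hypothetical names made up for this illustration and do not exist in PyMC.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass(frozen=True)
class StatSpec:
    """Hypothetical up-front declaration of one sampler stat."""

    name: str
    dtype: type
    shape: Tuple[int, ...] = ()
    dims: Optional[Tuple[str, ...]] = None


def preallocate(specs, n_draws):
    """Allocate one (n_draws, *shape) buffer per declared stat."""
    return {
        spec.name: np.empty((n_draws, *spec.shape), dtype=spec.dtype)
        for spec in specs
    }


# Illustrative specs: a scalar stat and a vector-valued one with a named dim.
specs = [
    StatSpec("accept", float),
    StatSpec("scaling", float, shape=(3,), dims=("variable",)),
]
buffers = preallocate(specs, n_draws=1000)
```

Because shape and dtype are known before sampling starts, the buffers can be sized once, and the optional `dims` could later be handed to the converter instead of an auto-generated `*_dim_0` name.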

We should still fix that bug where the dims are mixed up in conversion from nchains=1, nsteppers>1 models.

OriolAbril added the request discussion label Apr 2, 2021

michaelosthege mentioned this issue May 27, 2022

Question/Feature request: Figuring out how often the model/gradient was evaluated by NUTS #5809

Open

michaelosthege mentioned this issue Jan 8, 2023

Refactoring and addition of helpers to handle flat stats #6443

Merged
