[WIP] logp-caching in Metropolis #2882


Closed
wants to merge 8 commits

Conversation

@michaelosthege (Member)

#2881

I'm experimenting with caching the logp(q) from one iteration to the next.
There are probably better ways to do the following two things:

  • caching
  • logp-evaluation (maybe via fastlogp? see the sketch at the end of this comment)

A notebook that demonstrates the effect of caching on sampling noisy target densities:
https://gist.github.com/michaelosthege/656215deb662b23b6bcaa1f4d6f24758
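
For context, a quick sketch of how fastlogp could be used for the evaluation part (assuming the usual PyMC3 Model.fastlogp; the toy model is just for illustration):

```python
import pymc3 as pm

with pm.Model() as model:
    x = pm.Normal('x', mu=0, sd=1)

# fastlogp is fastfn-based: it skips the input validation that the
# regular compiled logp does, so each call is cheaper
print(model.fastlogp({'x': 0.5}))
```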

@junpenglao (Member)

Yeah, that is more or less how I would do it: i.e., evaluate self.logp_q0 = logp(q0) at init, and then update the state of self.logp_q0 in astep depending on whether the proposal was accepted or not.
In that case, maybe there is no need for a delta_logp function anymore?
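
A minimal sketch of that idea (illustrative only, not the actual Metropolis code; `logp` stands in for the compiled model logp and the proposal is assumed symmetric):

```python
import numpy as np

class CachingMetropolis:
    """Illustrative stripped-down Metropolis with a cached logp(q0)."""

    def __init__(self, logp, q0):
        self.logp = logp
        self.logp_q0 = logp(np.asarray(q0))  # evaluate once at init

    def astep(self, q0):
        q = q0 + np.random.normal(size=q0.shape)  # symmetric proposal
        logp_q = self.logp(q)
        # accept/reject against the cached value instead of re-evaluating logp(q0)
        if np.log(np.random.uniform()) < logp_q - self.logp_q0:
            self.logp_q0 = logp_q  # accepted: the cache moves with the chain
            return q
        return q0  # rejected: the cached logp(q0) stays valid
```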

@ColCarroll (Member)

This looks like a good idea, but it increases the complexity of the code a bit. I have two suggestions below for making it less "ad hoc", though a third option is to add a comment above the code block, preferably with a reference.

  1. If the ideal acceptance rate for Metropolis-Hastings is ~0.23 (ref), does it make sense to always cache? I do not have an intuition for whether the cache check is expensive compared to evaluating the theano function.

  2. We might also do something clever with an lru_cache with maxsize=2 (since it is a Markov chain!). Specifically, I guess we'd decorate self.logp; see the sketch after this comment. I am not sure how that interacts with theano, or with multiprocessing, though I think in both cases it won't do the wrong thing.

Also: The delta_logp function was intended as an optimization, so I would at least do some benchmarking to make sure removing it doesn't slow things down.
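
For concreteness, the lru_cache idea could look something like this, assuming the point is first converted to a hashable key (the helper name and dtype are illustrative; as discussed below, an ndarray cannot be passed to the cache directly):

```python
from functools import lru_cache
import numpy as np

def make_cached_logp(logp, dtype=np.float64):
    # maxsize=2 is enough: a Metropolis step only ever compares the
    # current point and one proposal
    @lru_cache(maxsize=2)
    def _cached(key):
        # lru_cache requires hashable arguments, so the array travels as
        # bytes; np.frombuffer yields a read-only view, which is fine for
        # a pure logp evaluation
        return logp(np.frombuffer(key, dtype=dtype))

    def cached_logp(q):
        return _cached(np.ascontiguousarray(q, dtype=dtype).tobytes())

    return cached_logp
```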

@junpenglao (Member)

> The delta_logp function was intended as an optimization

That's what I thought as well: it should be optimized so that it effectively evaluates the logp only once. But if the optimization does not work (e.g., when it calls an outside function), caching should indeed improve the speed.
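
As a toy illustration of what the fused graph buys (simplified; the real delta_logp also joins shared and non-input variables):

```python
import theano
import theano.tensor as tt

q = tt.dvector('q')
q0 = tt.dvector('q0')

def toy_logp(v):
    # stand-in for the model logp graph
    return -0.5 * tt.dot(v, v)

# both evaluations live in one compiled function, so there is a single
# call overhead and Theano can optimize the combined expression
delta_logp = theano.function([q, q0], toy_logp(q) - toy_logp(q0))
```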

@michaelosthege (Member, Author)

The lru_cache decorator looks cool, but we can't use it here: it uses a dict and requires the arguments to be hashable. I tried hashing before, but q0 is a non-hashable numpy array... 🙄
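
For reference, the failure mode looks like this:

```python
>>> from functools import lru_cache
>>> import numpy as np
>>> cached = lru_cache(maxsize=2)(lambda q: q.sum())
>>> cached(np.zeros(3))
TypeError: unhashable type: 'numpy.ndarray'
```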

If one compares one call to delta_logp against two calls to logp, delta_logp is indeed faster. But yeah, Markov chain...

However, I think there might be a problem with compound sampling: q0 and q cover just the dimensions that the sampler is assigned to, right? So what happens to the comparability of q and q0 if there's a change in another dimension? 😐

@junpenglao (Member)

> However, I think there might be a problem with compound sampling: q0 and q cover just the dimensions that the sampler is assigned to, right? So what happens to the comparability of q and q0 if there's a change in another dimension?

The astep of each sampler in CompoundStep modifies only the elements of q0 that it concerns, but the logp being evaluated is the whole model logp; CompoundStep is essentially a Gibbs sampler wrapper.
That's why some samplers are incorrect (e.g. #2879) if they only evaluate part of the logp.
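
Which is exactly why a per-sampler cache can go stale; schematically (toy logp, two step methods each owning one coordinate of the point):

```python
import numpy as np

def model_logp(point):
    # toy whole-model logp over the full point
    return -0.5 * np.sum(point ** 2)

point = np.zeros(2)
cached = model_logp(point)  # sampler A caches logp of the full point
point[1] += 1.0             # sampler B then moves a coordinate A does not own
assert cached != model_logp(point)  # A's cached value is now stale
```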

@michaelosthege (Member, Author)

I'm closing this for now - see #2881 for the reason
tl;dr: too dangerous
