Kronecker GP #2731
Conversation
pymc3/distributions/multivariate.py
Outdated
__all__ = ['MvNormal', 'MvStudentT', 'Dirichlet',
           'Multinomial', 'Wishart', 'WishartBartlett',
-          'LKJCorr', 'LKJCholeskyCov', 'MatrixNormal']
+          'LKJCorr', 'LKJCholeskyCov', 'MatrixNormal', 'KroneckerNormal']
new-line
pymc3/distributions/dist_math.py
Outdated
@@ -346,3 +349,134 @@ def grad(self, inputs, grads):
    x_grad, = grads
    return [x_grad * self.grad_op(x)]


class Eigh(tt.nlinalg.Eig):
Is this class here to fix the `njobs` issue mentioned on the discourse thread?
Yes, it seems to be working.
Is there a theano PR for this? Would be good to link to it.
I just started one here.
pymc3/gp/gp.py
Outdated
Returns the marginal likelihood distribution, given the input
locations `X` and the data `y`.
"""
# if not isinstance(noise, Covariance):
Might be nice to include some additional basic argument checking here, like whether the Xs are in a list or tuple, and that the length is the same as the list of `cov_funcs`.
Does list vs tuple matter? And yes, checking lengths is a good idea.
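For concreteness, a minimal sketch of the kind of checks being discussed (a standalone function for illustration; in the PR these would live inside `marginal_likelihood`, and the message wording is made up):

def check_inputs(Xs, cov_funcs):
    # Xs: one input array per grid dimension; cov_funcs: one kernel each
    if not isinstance(Xs, (list, tuple)):
        raise TypeError('Xs must be a list or tuple of input arrays')
    if len(Xs) != len(cov_funcs):
        raise ValueError('Xs and cov_funcs must have the same length: '
                         'got %d vs %d' % (len(Xs), len(cov_funcs)))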
pymc3/gp/gp.py
Outdated
    return pm.KroneckerNormal(name, mu=mu, covs=covs, noise=noise,
                              observed=y, **kwargs)
else:
    # shape = infer_shape(Xs, kwargs.pop("shape", None))
may need to have shape be a list of shapes for each X?
I think the product of the shapes would do, what do you think?
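That is, the flattened variable's length would be the product of the per-dimension input sizes; a quick illustration (array sizes are made up):

import numpy as np

Xs = [np.zeros((20, 1)), np.zeros((30, 1))]     # per-dimension inputs
shape = int(np.prod([X.shape[0] for X in Xs]))  # 600 = 20 * 30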
pymc3/gp/gp.py
Outdated
mean_total = self.mean_func
if all(val in given for val in ['X', 'y', 'noise']):
    X, y, noise = given['X'], given['y'], given['noise']
    # if not isinstance(noise, Covariance):
Since `noise` for the Kronecker GP marginal model with the eigenvalue decomp is only the white noise term on the whole diagonal, it should probably follow the same syntax as the `gp.*Sparse` implementations.
Sure, I can change it to sigma to remain consistent.
Should I use `sigma` in `KroneckerNormal` as well?
return np.stack(np.meshgrid(*arrays, indexing='ij'), -1).reshape(-1, N)


def kron_matrix_op(krons, m, op):
I think this is really nice
Thanks! Getting it working in Theano was much trickier than in NumPy. If there are more efficient ways than `scan` and the nested functions, that would be great to see.
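For readers following the thread, this is the idea behind the helper in plain NumPy, assuming square factors (a sketch of the standard Kronecker matrix-vector algorithm, not the PR's Theano code; the function name is illustrative):

import numpy as np

def kron_mvprod(As, b):
    # Computes (A_1 kron A_2 kron ... kron A_D) @ b without ever
    # forming the full Kronecker product.
    x = np.asarray(b, dtype=float).ravel()
    for A in reversed(As):                 # right-most factor acts first
        G = A.shape[1]
        X = x.reshape(G, -1, order='F')    # fold out the axis A acts on
        x = (A @ X).T.ravel(order='F')     # apply A, then rotate axes
    return x

A, B = np.random.randn(3, 3), np.random.randn(4, 4)
v = np.random.randn(12)
assert np.allclose(kron_mvprod([A, B], v), np.kron(A, B) @ v)

The per-factor reshape is what keeps the cost at roughly the number of grid points times the sum of the factor sizes, rather than quadratic in the total grid size.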
I profiled `model.logp` and the gradient, and the scan here is the slowest part (~50% of computation time). I'm ok with saving this optimization for another PR though.
Oh thanks for looking into that! Would certainly like to check that out in some future PR.
Here's where I was looking at that, down towards the bottom. You can run the Theano profiler on PyMC3 code; you can see that scan at the top.
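For anyone who wants to reproduce this kind of measurement, a minimal sketch using PyMC3's built-in profiling hook (the toy model is a stand-in for the Kronecker GP):

import numpy as np
import pymc3 as pm

# any model works the same way; a small stand-in with one free RV
with pm.Model() as model:
    x = pm.Normal('x', mu=0, sd=1, shape=100)
    pm.Normal('y', mu=x, sd=1, observed=np.random.randn(100))

prof = model.profile(model.logpt, n=1000)  # compile logp with profiling on
prof.summary()                             # per-op timings, slowest first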
Hi, I am trying to use KroneckerNormal directly (not through the GP interface), but unfortunately it seems prohibitively slow for my application (though it could be something I'm doing wrong; I'm not a Theano expert by any means). My profiling also implicated scan for ~50% of the computation time, but overall it was much slower than @bwengals' example (I think ~267 seconds for 1000 evaluations of the likelihood). I tried it with data that is 1000 by 2, flattened; I define one covariance matrix (1000 by 1000) using a squared exponential covariance function that operates on a 5-dimensional input, and the second (2 by 2) is unconstrained and given an LKJChol prior. I'm trying to do MAP estimation and it takes a very long time.

I was doing it through MatrixNormal before and it was pretty fast, but I wanted to be able to put in additive noise (and was happy to find this is implemented here!). Unfortunately I can't share my code right now, but I could concoct a toy example later to share if that is helpful. Thanks for any input you all might have; I'm simply curious whether this is inherent to the current implementation due to Theano, whether I might be doing something wrong, or whether the nature of what I'm doing is sufficiently different from the single-input GP examples that it's slowing things down.
@natalieklein the easiest way to get some support on this is probably to post it on our discourse site along with a complete example that demonstrates the issue.
Thanks! If anyone is interested, here's the discourse link: https://discourse.pymc.io/t/kroneckernormal-speed/1497
This is awesome! And everything looks really good. Have you used this in your research yet? How is it?
pymc3/distributions/multivariate.py
Outdated
# elif isinstance(shape, tuple):
#     shape = *shape, self.N.eval()
# else:
#     shape = self.N.eval()
Can all the above be deleted?
Yeah, I'll get rid of it.
I haven't used this in research yet; I'm still working out tests and bugs. Right now
----------
Ks: 2D array-like
"""
return reduce(tt.slinalg.kron, Ks)
Isn't `reduce` only available in Python 3?
It looks like it's available in Python 2 as well.
Does it return a list or a generator in Python 3?
Neither. It returns a Theano tensor object that is the result of the nested evaluations `kron(kron(...kron(kron(Ks[0], Ks[1]), Ks[2]), ...), Ks[d])`, that is, the Kronecker product of all the arrays in order.
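The same folding pattern in plain NumPy, for anyone unfamiliar with `reduce` (`np_kronecker` is an illustrative name, not the PR's helper):

from functools import reduce  # works in both Python 2 (2.6+) and Python 3

import numpy as np

def np_kronecker(Ks):
    # Kronecker product of all matrices in Ks, folded left to right
    return reduce(np.kron, Ks)

A, B, C = np.eye(2), np.arange(4.).reshape(2, 2), np.ones((3, 3))
print(np_kronecker([A, B, C]).shape)  # (12, 12): factor sizes multiply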
pymc3/gp/gp.py
Outdated
@@ -793,7 +794,7 @@ def conditional(self, name, Xnew, pred_noise=False, given=None, **kwargs):
        return pm.MvNormal(name, mu=mu, chol=chol, shape=shape, **kwargs)


-@conditioned_vars(["Xs", "y", "noise"])
+@conditioned_vars(["Xs", "y", "sigma"])
 class MarginalKron:
Any reason you didn't subclass the base class (even though it doesn't do much)?
Only because at one point I had not implemented a `cov_func`, but instead only had a list of `cov_funcs`. Now that I've made one by multiplying the separate ones together, I guess there wouldn't be any harm in subclassing.
Sounds good to me
pymc3/gp/gp.py
Outdated
else:
    Asq = tt.dot(A.T, A)
    cov = Km - Asq
if pred_noise:
-   cov += noise*np.eye(cov.shape)
+   cov += sigma * np.eye(cov.shape)
return mu, cov

def conditional(self, name, Xnew, pred_noise=False, given=None, **kwargs):
What do you think of adding a `predict` method, similar to the `Marginal` class? The predict method is there as a convenience when using the MAP point to do predictions. It should be pretty trivial to add since `conditional` is done.
Yes, it's on its way!
nice!
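For reference, such a method might look roughly like this, modeled on `Marginal.predict`; it assumes a `_build_conditional(Xnew, diag, pred_noise)` helper as mentioned elsewhere in the thread, so it's a sketch rather than the final code:

from pymc3.distributions.distribution import draw_values

def predict(self, Xnew, point=None, diag=False, pred_noise=False):
    # evaluate the conditional mean and covariance at a fixed point
    # (e.g. the MAP estimate) instead of adding a random variable
    mu, cov = self._build_conditional(Xnew, diag, pred_noise)
    return draw_values([mu, cov], point=point)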
pymc3/gp/gp.py
Outdated
# New points
-Km = self.total_cov_func(Xnew, diag)
-Knm = self.total_cov_func(cartesian(*Xs), Xnew)
+Km = cov_total(Xnew, diag)
`cov_total` is here to support making additive models, so that a component GP can be used in isolation to make predictions (described here). It may not be straightforward to support this for Kronecker GPs, in which case it would be good to implement an `__add__` method that raises an error, like in the `TP` class.
You're right, the sum of Kronecker-structured covariance matrices does not in general have a Kronecker structure, so this class wouldn't be closed under addition. In fact, the `conditional` method doesn't yield a `KroneckerNormal` either, so technically that could be inherited from `Marginal` if we didn't care about the speedups from `_build_conditional`.
Good call. I've seen this structure once in a paper, but it's definitely out of the scope of this PR.
That's an interesting paper! It's too bad that the speedups only hold for the sum of two terms.
This is a huge contribution. Let us know when you're ready for review!
I can't think of anything else to add at the moment, so review away!
There are some test failures.
@@ -1292,3 +1295,221 @@ def logp(self, value):
    n = self.n
    norm = - 0.5 * m * n * pm.floatX(np.log(2 * np.pi))
    return norm - 0.5*trquaddist - m*half_collogdet - n*half_rowlogdet


class KroneckerNormal(Continuous):
This supersedes MatrixNormal, right? If so, it might be good to change MatrixNormal to just be an alias for KroneckerNormal, to reduce code duplication. If it happens it should probably be a separate PR.
It does in a sense. MatrixNormal RVs are matrices, whereas KroneckerNormal is flattened as in MvNormal. I'm open to reusing KroneckerNormal code but either reshaping the output or changing the way MatrixNormal works. Depends what you all think is best.
Ah yes that is true. I'm perfectly OK with leaving things as is.
    index2 = index.T
else:
    index2 = tt.cast(Xs, 'int32').T
return self.B[index, index2]
Requires that `X` has `input_dim=1`? If so, should specify in the docstring.
Yeah, or at least that `len(active_dims) == 1`. I added this to the notes in the docstring. Thanks!
pymc3/gp/gp.py
Outdated
super(MarginalKron, self).__init__(mean_func, cov_func)

def __add__(self, other):
    raise TypeError("Kronecker-structured processes aren't additive")
Maybe something like "Efficient implementation of additive, Kronecker-structured processes not implemented". It's a bit pedantic... you can do additive modeling with them in pymc3, but without exploiting Kronecker properties.
)


def pymc3_random(dist, paramdomains, ref_rand, valuedomain=Domain([0]),
-                size=10000, alpha=0.05, fails=10, extra_args=None):
+                size=10000, alpha=0.05, fails=10, extra_args=None,
+                model_args={}):
Having `model_args={}` is making Travis unhappy, because of this: https://stackoverflow.com/questions/1367883/python-methods-default-parameter-values-are-evaluated-once. Better to do `model_args=None` and give it an empty dict if it is None.
Fixing occurrences of this will get rid of the little (!) that shows up next to the Travis builds.
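To spell out the suggested idiom (a self-contained toy, not the test-suite code):

# With a None sentinel, each call gets a fresh dict instead of sharing
# one dict created once at function-definition time.
def f(x, model_args=None):
    if model_args is None:
        model_args = {}
    model_args.setdefault('seen', []).append(x)
    return model_args

print(f(1))  # {'seen': [1]}
print(f(2))  # {'seen': [2]}, no state leaks between calls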
pymc3/tests/test_gp.py
Outdated
X = cartesian(X1, X21, X22)
M = np.array([[1, 2, 3], [2, 1, 2], [3, 2, 1]])
with pm.Model() as model:
    # cov1 = 3 + pm.gp.cov.ExpQuad(1, 0.1) + M * pm.gp.cov.ExpQuad(1, 0.1) * M * pm.gp.cov.ExpQuad(1, 0.1)
Should remove this and `M` if unused.
pymc3/gp/gp.py
Outdated
    fcond = gp.conditional("fcond", Xnew=Xnew)
"""

def __init__(self, mean_func=Zero(), cov_funcs=[Constant(0.0)]):
I think you can change the default to `(Constant(0.0))` to get rid of potential issues with mutable defaults.
pymc3/tests/test_distributions.py
Outdated
class TestMatchesScipy(SeededTest):
    def pymc3_matches_scipy(self, pymc3_dist, domain, paramdomains, scipy_dist,
-                           decimal=None, extra_args=None):
+                           decimal=None, extra_args=None, scipy_args={}):
occurrence of mutable default
I don't know what the error in some of the Travis builds is about; it doesn't look related to your PR, and it isn't occurring for me locally or in other branches. I think it is OK to just up the tolerance; changing this line should do the trick. The other Travis errors look like they are the linter complaining about mutable default arguments. I left a comment where these are in your code.
Tests are passing and coverage looks great! Do you happen to have a runnable example of
Force-pushed from 673f10b to 8dc9b36.
Rebase needed...
If it's OK with the other devs, I'd recommend just opening a new PR from the current master with your changes and a long commit message, and we can merge that. But my git-fu is like 3/10.
Sorry everyone... this all stemmed from rebasing my own commits into a more coherent fashion, which did seem to work but affected other commits in the process. I could rebase again and remove all commits that are not mine, if that would fix the problem. I'm just wary of doing anything for fear of breaking more stuff.
Would you like help making a clean branch? I would literally clone a new version of pymc3 somewhere else on the file system, create a new branch, then
Force-pushed from 8dc9b36 to ece6ee8.
Looks like we're back in business 😎
@junpenglao Yes, I think merging this first is the best approach. Shouldn't be much trouble applying the changes to this.
It would be nice to see a short example of usage in one of the GP notebooks.
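Something along these lines, perhaps; a minimal sketch using the API added in this PR, where the grid sizes, lengthscales, and data are placeholders:

import numpy as np
import pymc3 as pm

# the full input set is the Cartesian product of two 1-D grids, so the
# covariance over all 20 * 30 = 600 points is a Kronecker product
X1 = np.linspace(0, 1, 20)[:, None]
X2 = np.linspace(0, 2, 30)[:, None]
y = np.random.randn(600)  # stand-in observations on the grid

with pm.Model() as model:
    cov1 = pm.gp.cov.ExpQuad(1, 0.1)  # kernel for the first dimension
    cov2 = pm.gp.cov.ExpQuad(1, 0.3)  # kernel for the second
    gp = pm.gp.MarginalKron(cov_funcs=[cov1, cov2])
    sigma = pm.HalfNormal('sigma', sd=1)
    y_ = gp.marginal_likelihood('y', Xs=[X1, X2], y=y, sigma=sigma)
    mp = pm.find_MAP()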
Thanks @jordan-melendez!
Yes, thanks @jordan-melendez, this is awesome!
Nice work @jordan-melendez!
Added the required classes for building a GP that takes advantage of a Kronecker-structured covariance matrix. See this discussion. This PR includes
Tests have not been implemented yet, but are coming.