Kronecker GP #2731

Merged · 15 commits · Feb 15, 2018
Conversation

@jordan-melendez (Contributor) commented Nov 23, 2017

Added the required classes for building a GP that takes advantage of a Kronecker-structured covariance matrix. See this discussion. This PR includes

  • Efficient Kronecker operations
  • A KroneckerNormal distribution
  • A MarginalKron GP (LatentKron should be easy to add soon too)
  • A Coregion kernel

Tests have not been implemented yet, but are coming.
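For readers new to the idea, the payoff of Kronecker structure can be sketched in NumPy (toy matrices standing in for evaluated covariance functions; an illustration, not the PR's code):

```python
import numpy as np

# Toy stand-ins for two covariance matrices evaluated on separate
# input grids (e.g. space and time).
K1 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
K2 = np.array([[1.0, 0.3, 0.0],
               [0.3, 1.5, 0.2],
               [0.0, 0.2, 0.8]])

# The joint covariance over the 2*3 = 6 grid points is K1 kron K2,
# but we never need to factorize the 6x6 matrix directly:
w1, _ = np.linalg.eigh(K1)
w2, _ = np.linalg.eigh(K2)

# Eigenvalues of a Kronecker product are all pairwise products, so
# e.g. log|K1 kron K2| comes from the small factors alone.
logdet_kron = np.log(np.outer(w1, w2)).sum()
_, logdet_full = np.linalg.slogdet(np.kron(K1, K2))
assert np.allclose(logdet_kron, logdet_full)
```

The same factorization carries over to solves and matrix-vector products, which is what makes the marginal likelihood tractable on large grids.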



 __all__ = ['MvNormal', 'MvStudentT', 'Dirichlet',
            'Multinomial', 'Wishart', 'WishartBartlett',
-           'LKJCorr', 'LKJCholeskyCov', 'MatrixNormal']
+           'LKJCorr', 'LKJCholeskyCov', 'MatrixNormal', 'KroneckerNormal']
Member

new-line

@@ -346,3 +349,134 @@ def grad(self, inputs, grads):
x_grad, = grads

return [x_grad * self.grad_op(x)]


class Eigh(tt.nlinalg.Eig):
Contributor

Is this class here to fix the njobs issue mentioned on the discourse thread?

Contributor Author

Yes, it seems to be working.

Member

Is there a theano PR for this? Would be good to link to it.

Contributor Author

I just started one here.

pymc3/gp/gp.py Outdated
Returns the marginal likelihood distribution, given the input
locations `X` and the data `y`.
"""
# if not isinstance(noise, Covariance):
Contributor

Might be nice to include some additional basic argument checking here, like whether the Xs are in a list or tuple, and that the length is the same as the list of cov_funcs

Contributor Author

Does list vs tuple matter? And yes, checking lengths is a good idea.

pymc3/gp/gp.py Outdated
return pm.KroneckerNormal(name, mu=mu, covs=covs, noise=noise,
observed=y, **kwargs)
else:
# shape = infer_shape(Xs, kwargs.pop("shape", None))
Contributor

may need to have shape be a list of shapes for each X?

Contributor Author

I think the product of the shapes would do, what do you think?

pymc3/gp/gp.py Outdated
mean_total = self.mean_func
if all(val in given for val in ['X', 'y', 'noise']):
X, y, noise = given['X'], given['y'], given['noise']
# if not isinstance(noise, Covariance):
@bwengals (Contributor) Nov 25, 2017

Since noise for the Kronecker GP marginal model with Eigenvalue decomp is only the white noise term on the whole diagonal, it should probably follow the same syntax as gp.*Sparse implementations
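As a sketch of why a scalar white-noise term is the easy case (illustrative NumPy, not the PR's code): adding sigma^2 * I shifts the eigenvalues of K1 kron K2 without changing its eigenvectors, so the factor-wise eigendecompositions still suffice:

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((2, 2)); K1 = A1 @ A1.T + 2 * np.eye(2)
A2 = rng.standard_normal((3, 3)); K2 = A2 @ A2.T + 2 * np.eye(3)
sigma = 0.1

# Eigendecompose only the small factors.
w1, Q1 = np.linalg.eigh(K1)
w2, Q2 = np.linalg.eigh(K2)

# K1 kron K2 + sigma^2 I has eigenvectors Q1 kron Q2 and
# eigenvalues kron(w1, w2) + sigma^2: the noise is a pure shift.
lam = np.kron(w1, w2) + sigma**2

logdet_fast = np.log(lam).sum()
K_noisy = np.kron(K1, K2) + sigma**2 * np.eye(6)
assert np.allclose(logdet_fast, np.linalg.slogdet(K_noisy)[1])
```

A structured (non-scalar) noise covariance would generally change the eigenvectors and break this shortcut, which is one reason to keep the scalar-sigma convention of the gp.*Sparse implementations.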

Contributor Author

Sure, I can change it to sigma to remain consistent.

Contributor Author

Should I use sigma in KroneckerNormal as well?

return np.stack(np.meshgrid(*arrays, indexing='ij'), -1).reshape(-1, N)


def kron_matrix_op(krons, m, op):
Contributor

I think this is really nice

Contributor Author

Thanks! Getting it working in Theano was much trickier than in NumPy. If there are more efficient ways than scan and the nested functions, that would be great to see.
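For reference, the NumPy version of a matrix-free Kronecker matvec for two factors needs only reshapes; the scan-based Theano version generalizes this to a list of factors (a sketch, not the PR's implementation):

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product.

    Uses the row-major vec identity: (A kron B) vec(X) = vec(A X B^T),
    where X is x reshaped to (A.shape[1], B.shape[1]).
    """
    m, n = A.shape[1], B.shape[1]
    X = x.reshape(m, n)
    return (A @ X @ B.T).ravel()

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))
x = rng.standard_normal(15)

# Agrees with the explicit (8 x 15) Kronecker product.
assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
```

The cost drops from O((mn)^2) for the dense matvec to two small matrix products, which is the whole point of kron_matrix_op.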

Contributor

I profiled model.logp and the gradient, and the scan here is the slowest part (~50% of computation time). I'm ok with saving this optimization for another PR though.

Contributor Author

Oh thanks for looking into that! Would certainly like to check that out in some future PR.

Contributor

Here's where I was looking at that, down towards the bottom. You can run the theano profiler on pymc3 code. You can see that scan at the top.


Hi, I am trying to use KroneckerNormal directly (not through the GP interface), but unfortunately it seems prohibitively slow for my application (though it could be something I'm doing wrong; I'm not a theano expert by any means). My profiling also implicated scan for ~50% of the computation time, but overall it was much slower than @bwengals's example (I think ~267 seconds for 1000 evaluations of the likelihood).

I tried it with data that is 1000 by 2, flattened. I define one covariance matrix (1000 by 1000) using a squared exponential covariance function that operates on a 5-dimensional input, and the second (2 by 2) is unconstrained and given an LKJChol prior. I'm trying to do MAP estimation and it takes a very long time. I was doing it through MatrixNormal before and it was pretty fast, but I wanted to be able to put additive noise in (and was happy to find this is implemented here!).

Unfortunately I can't share my code right now, but I could concoct a toy example later to share if that would be helpful. Thanks for any input you all might have. I'm simply curious whether this is inherent to the current implementation due to theano, whether I might be doing something wrong, or whether the nature of what I'm doing is sufficiently different from the single-input GP examples that it's slowing things down.

@fonnesbeck (Member) Jul 11, 2018

@natalieklein the easiest way to get some support on this is probably to post it on our discourse site along with a complete example that demonstrates the issue.


Thanks! If anyone is interested, here's the discourse link: https://discourse.pymc.io/t/kroneckernormal-speed/1497

@bwengals (Contributor)

This is awesome! And everything looks really good. Have you used this in your research yet? How is it?

# elif isinstance(shape, tuple):
# shape = *shape, self.N.eval()
# else:
# shape = self.N.eval()
Member

Can all the above be deleted?

Contributor Author

Yeah, I'll get rid of it.

@jordan-melendez (Contributor Author) commented Nov 26, 2017

I haven't used this in research yet, I'm still working out tests and bugs. Right now KroneckerNormal is tricky to test with the current build_model and pymc3_matches_scipy in test_distributions.py due to the list of (differently shaped) covariance matrices, so I had to hack that part together. Thus the tests in test_distributions_random.py will have to be similarly customized. Any ideas on how to proceed there?

----------
Ks: 2D array-like
"""
return reduce(tt.slinalg.kron, Ks)
Member

Isn't reduce only available in Python 3?

Contributor Author

It looks like it's available in Python 2 as well.

Member

Does it return list or generator in Python 3?

@jordan-melendez (Contributor Author) Nov 30, 2017

Neither. It returns a theano tensor object that is the result of the nested evaluations kron(kron(...kron(kron(Ks[0], Ks[1]), Ks[2]), ...), Ks[d]), that is, the Kronecker product of all the arrays in order.
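A quick NumPy illustration of the same reduction (for the record: reduce is a builtin in Python 2 and lives in functools in both Python 2.6+ and 3):

```python
from functools import reduce
import numpy as np

Ks = [np.eye(2),
      np.array([[1.0, 2.0],
                [3.0, 4.0]]),
      np.eye(3)]

# Equivalent to np.kron(np.kron(Ks[0], Ks[1]), Ks[2]):
# the Kronecker product of all the arrays, in order.
K = reduce(np.kron, Ks)
assert K.shape == (2 * 2 * 3, 2 * 2 * 3)
```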

pymc3/gp/gp.py Outdated
@@ -793,7 +794,7 @@ def conditional(self, name, Xnew, pred_noise=False, given=None, **kwargs):
return pm.MvNormal(name, mu=mu, chol=chol, shape=shape, **kwargs)


-@conditioned_vars(["Xs", "y", "noise"])
+@conditioned_vars(["Xs", "y", "sigma"])
class MarginalKron:
Contributor

Any reason you didn't subclass the base class (even though it doesn't do much)?

Contributor Author

Only because at one point I had not implemented a cov_func, but instead only had a list of cov_funcs. Now that I've made one by multiplying the separate ones together, I guess there wouldn't be any harm in subclassing.

Contributor

Sounds good to me

pymc3/gp/gp.py Outdated
 else:
     Asq = tt.dot(A.T, A)
 cov = Km - Asq
 if pred_noise:
-    cov += noise*np.eye(cov.shape)
+    cov += sigma * np.eye(cov.shape)
return mu, cov

def conditional(self, name, Xnew, pred_noise=False, given=None, **kwargs):
@bwengals (Contributor) Nov 29, 2017

What do you think of adding a predict method? Similar to the Marginal class. The predict method is there as a convenience when using the MAP point to do predictions. It should be pretty trivial to add since conditional is done.

Contributor Author

Yes, it's on its way!

Contributor

nice!

pymc3/gp/gp.py Outdated

# New points
-Km = self.total_cov_func(Xnew, diag)
-Knm = self.total_cov_func(cartesian(*Xs), Xnew)
+Km = cov_total(Xnew, diag)
Contributor

cov_total is here to support making additive models, so that a component GP can be used in isolation to make predictions (described here). It may not be straightforward to support this for Kronecker GPs, in which case it would be good to implement an __add__ method that raises an error, like in the TP class.

Contributor Author

You're right, the sum of Kronecker-structured covariance matrices does not in general have a Kronecker structure, so this class wouldn't be closed under addition. In fact, the conditional method doesn't yield a KroneckerNormal either, so technically that could be inherited from Marginal if we didn't care about the speedups from _build_conditional.
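A two-line numerical check of that non-closure (made-up matrices): the sum of two Kronecker products is not the Kronecker product of the summed factors, because the cross terms survive:

```python
import numpy as np

A1, B1 = np.eye(2), np.eye(2)
A2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
B2 = np.array([[1.0, 0.0],
               [0.0, 2.0]])

# Sum of two Kronecker products...
K_sum = np.kron(A1, B1) + np.kron(A2, B2)
# ...differs from the Kronecker product of the sums, which expands to
# four terms including kron(A1, B2) and kron(A2, B1).
assert not np.allclose(K_sum, np.kron(A1 + A2, B1 + B2))
```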

Contributor

Good call. I've seen this structure once in a paper, but definitely out of the scope of this PR

Contributor Author

That's an interesting paper! It's too bad that the speedups only hold for the sum of two terms.

@bwengals (Contributor)

This is a huge contribution. Let us know when you're ready for review!

@jordan-melendez (Contributor Author)

I can't think of anything else to add at the moment, so review away!

@junpenglao junpenglao modified the milestones: s, 3.4 Jan 22, 2018
@junpenglao junpenglao changed the title [WIP] Kronecker GP Kronecker GP Jan 25, 2018
@junpenglao (Member)

There are some test failures.

@@ -1292,3 +1295,221 @@ def logp(self, value):
n = self.n
norm = - 0.5 * m * n * pm.floatX(np.log(2 * np.pi))
return norm - 0.5*trquaddist - m*half_collogdet - n*half_rowlogdet


class KroneckerNormal(Continuous):
Contributor

This supersedes MatrixNormal right? If so, it might be good to change MatrixNormal to just be an alias for KroneckerNormal, to reduce code duplication. If it happens it should probably be a separate PR.

Contributor Author

It does in a sense. MatrixNormal RVs are matrices, whereas KroneckerNormal is flattened as in MvNormal. I'm open to reusing KroneckerNormal code but either reshaping the output or changing the way MatrixNormal works. Depends what you all think is best.
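The relationship can be made concrete (illustrative NumPy; Lr and Lc are hypothetical Cholesky factors of the row and column covariances): flattening a MatrixNormal draw row-major gives exactly a Kronecker-covariance draw.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical Cholesky factors of the row covariance U = Lr Lr^T
# and column covariance V = Lc Lc^T of a MatrixNormal.
Lr = np.tril(rng.standard_normal((3, 3))) + 3 * np.eye(3)
Lc = np.tril(rng.standard_normal((2, 2))) + 3 * np.eye(2)
Z = rng.standard_normal((3, 2))  # iid standard normal draws

# A MatrixNormal draw X = Lr Z Lc^T, flattened row-major...
x = (Lr @ Z @ Lc.T).ravel()

# ...equals a flattened draw with covariance factor Lr kron Lc,
# i.e. total covariance U kron V, as KroneckerNormal parameterizes it.
assert np.allclose(x, np.kron(Lr, Lc) @ Z.ravel())
```

So the two distributions coincide up to the reshape, which is why deduplication was tempting but also why the output shapes differ.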

Contributor

Ah yes that is true. I'm perfectly OK with leaving things as is.

index2 = index.T
else:
index2 = tt.cast(Xs, 'int32').T
return self.B[index, index2]
Contributor

Requires that X has input_dim=1? If so, should specify in docstring

Contributor Author

Yeah, or at least that len(active_dims) == 1. I added this to the notes in the docstring. Thanks!

pymc3/gp/gp.py Outdated
super(MarginalKron, self).__init__(mean_func, cov_func)

def __add__(self, other):
raise TypeError("Kronecker-structured processes aren't additive")
Contributor

Maybe something like "Efficient implementation of additive, Kronecker-structured processes not implemented". It's a bit pedantic... you can do additive modeling with them in pymc3, but without exploiting Kronecker properties.

)


def pymc3_random(dist, paramdomains, ref_rand, valuedomain=Domain([0]),
-                 size=10000, alpha=0.05, fails=10, extra_args=None):
+                 size=10000, alpha=0.05, fails=10, extra_args=None,
+                 model_args={}):
@bwengals (Contributor) Jan 29, 2018

Having model_args={} is making travis unhappy, because of this: https://stackoverflow.com/questions/1367883/python-methods-default-parameter-values-are-evaluated-once

Better to do model_args=None and give it an empty dict if it is None.
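The pitfall being pointed at here can be shown with a minimal standalone example (hypothetical function names, not the test-suite code):

```python
def bad(model_args={}):
    # The default dict is created once, at function definition time,
    # so every call that mutates it sees earlier calls' changes.
    model_args.setdefault('count', 0)
    model_args['count'] += 1
    return model_args

a, b = bad(), bad()
print(a is b, a['count'])  # True 2 -- the same dict was reused

def good(model_args=None):
    # The recommended fix: a fresh dict per call.
    if model_args is None:
        model_args = {}
    model_args.setdefault('count', 0)
    model_args['count'] += 1
    return model_args

c, d = good(), good()
print(c is d, c['count'])  # False 1
```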

@bwengals (Contributor) Jan 29, 2018

Fixing occurrences of this will get rid of the little (!) that shows up next to the travis builds

X = cartesian(X1, X21, X22)
M = np.array([[1, 2, 3], [2, 1, 2], [3, 2, 1]])
with pm.Model() as model:
# cov1 = 3 + pm.gp.cov.ExpQuad(1, 0.1) + M * pm.gp.cov.ExpQuad(1, 0.1) * M * pm.gp.cov.ExpQuad(1, 0.1)
Contributor

should remove this and M if unused

pymc3/gp/gp.py Outdated
fcond = gp.conditional("fcond", Xnew=Xnew)
"""

def __init__(self, mean_func=Zero(), cov_funcs=[Constant(0.0)]):
Contributor

I think you can change the default to the tuple (Constant(0.0),) to get rid of potential issues with mutable defaults

class TestMatchesScipy(SeededTest):
def pymc3_matches_scipy(self, pymc3_dist, domain, paramdomains, scipy_dist,
-                        decimal=None, extra_args=None):
+                        decimal=None, extra_args=None, scipy_args={}):
Contributor

occurrence of mutable default

@bwengals (Contributor)

I don't know what the error in some of the travis builds is about; it doesn't look related to your PR, and it isn't occurring for me locally or in other branches.

I think it is OK to just up the tolerance. Changing this line to

npt.assert_allclose(latent_logp, self.logp, atol=5)

should do the trick. The other travis errors look like they are the linter complaining about mutable default arguments. I left a comment where these are in your code.

@bwengals (Contributor)

Tests are passing and coverage looks great! Do you happen to have a runnable example of Coregion in action?

@junpenglao (Member)

Rebase needed...

@bwengals (Contributor) commented Feb 11, 2018

If it's ok with the other devs, I'd recommend just opening a new PR from the current master with your changes and a long commit message and we can merge that. But my git-fu is like 3/10

@jordan-melendez (Contributor Author)

Sorry, everyone: this all stemmed from rebasing my own commits into a more coherent shape, which did seem to work, but affected other commits in the process. I could rebase again and remove all commits that are not mine if that would fix the problem. I'm just wary of doing anything for fear of breaking more stuff.

@ColCarroll (Member) commented Feb 13, 2018

Would you like help making a clean branch? I would literally clone a new version of pymc3 somewhere else on the file system, create a new branch, then mv all the changed files into the new version.

@jordan-melendez (Contributor Author)

Looks like we're back in business 😎 a git pull --rebase upstream master cleared things up.

@junpenglao (Member)

There will be some conflict with #2847 and I think we should coordinate the merge. I suggest merging this first.
@gBokiau it would mean more work on your end, what do you think? You can also split your PR into multiple ones as you suggest.

@gBokiau (Contributor) commented Feb 14, 2018

@junpenglao Yes I think merging this first is the best approach. Shouldn't be much trouble applying the changes to this.

@fonnesbeck (Member)

It would be nice to see a short example of usage in one of the GP notebooks.

@junpenglao junpenglao merged commit a3af00d into pymc-devs:master Feb 15, 2018
@junpenglao (Member)

thanks @jordan-melendez!

@junpenglao junpenglao mentioned this pull request Feb 15, 2018
@bwengals (Contributor)

yes thanks @jordan-melendez, this is awesome!

@usptact (Contributor) commented Feb 20, 2018

Nice work @jordan-melendez !
