
Commit e153121

Authored by Rowan Schaefer (rowan-schaefer), with Oriol Abril-Pla (OriolAbril) and Michael Osthege (michaelosthege)
Update docstrings for sample_smc and smc.py (#6114)
* fixed docstrings and a reference
* trailing whitespace
* fixed typo
* Update pymc/smc/sample_smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update pymc/smc/sample_smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update pymc/smc/sample_smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update pymc/smc/sample_smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update pymc/smc/sample_smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update pymc/smc/sample_smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update pymc/smc/smc.py
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Made changes to smc.py and sample_smc.py for pr#6114
* Fix pre-commit

Co-authored-by: Rowan Schaefer <[email protected]>
Co-authored-by: Oriol Abril-Pla <[email protected]>
Co-authored-by: Michael Osthege <[email protected]>
1 parent 02f1836 commit e153121

File tree: 4 files changed (+112, −95 lines)


docs/source/api/smc.rst

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@ Sequential Monte Carlo
 
    sample_smc
 
-(smc_kernels)=
+.. _smc_kernels:
+
 SMC kernels
 -----------
 

pymc/smc/kernels.py

Lines changed: 67 additions & 53 deletions
@@ -41,7 +41,7 @@
 
 
 class SMC_KERNEL(ABC):
-    """Base class for the Sequential Monte Carlo kernels
+    """Base class for the Sequential Monte Carlo kernels.
 
     To create a new SMC kernel you should subclass from this.
@@ -53,73 +53,73 @@ class SMC_KERNEL(ABC):
         to sampling from the prior distribution. This method is only called
         if `start` is not specified.
 
-    _initialize_kernel: default
+    _initialize_kernel : default
         Creates initial population of particles in the variable
         `self.tempered_posterior` and populates the `self.var_info` dictionary
         with information about model variables shape and size as
-        {var.name : (var.shape, var.size)
+        {var.name : (var.shape, var.size)}.
 
-        The functions self.prior_logp_func and self.likelihood_logp_func are
+        The functions `self.prior_logp_func` and `self.likelihood_logp_func` are
         created in this step. These expect a 1D numpy array with the summed
         sizes of each raveled model variable (in the order specified in
-        model.inial_point).
+        :meth:`pymc.Model.initial_point`).
 
         Finally, this method computes the log prior and log likelihood for
-        the initial particles, and saves them in self.prior_logp and
-        self.likelihood_logp.
+        the initial particles, and saves them in `self.prior_logp` and
+        `self.likelihood_logp`.
 
         This method should not be modified.
 
-    setup_kernel: optional
+    setup_kernel : optional
         May include any logic that should be performed before sampling
         starts.
 
     During each sampling stage the following methods are called in order:
 
-    update_beta_and_weights: default
-        The inverse temperature self.beta is updated based on the self.likelihood_logp
-        and `threshold` parameter
+    update_beta_and_weights : default
+        The inverse temperature self.beta is updated based on the `self.likelihood_logp`
+        and `threshold` parameter.
 
-        The importance self.weights of each particle are computed from the old and newly
-        selected inverse temperature
+        The importance `self.weights` of each particle are computed from the old and newly
+        selected inverse temperature.
 
         The iteration number stored in `self.iteration` is updated by this method.
 
-        Finally the model log_marginal_likelihood of the tempered posterior
-        is updated from these weights
+        Finally the model `log_marginal_likelihood` of the tempered posterior
+        is updated from these weights.
 
-    resample: default
-        The particles in self.posterior are sampled with replacement based
-        on self.weights, and the used resampling indexes are saved in
+    resample : default
+        The particles in `self.posterior` are sampled with replacement based
+        on `self.weights`, and the used resampling indexes are saved in
         `self.resampling_indexes`.
 
-        The arrays self.prior_logp, self.likelihood_logp are rearranged according
-        to the order of the resampled particles. self.tempered_posterior_logp
-        is computed from these and the current self.beta
+        The arrays `self.prior_logp` and `self.likelihood_logp` are rearranged according
+        to the order of the resampled particles. `self.tempered_posterior_logp`
+        is computed from these and the current `self.beta`.
 
-    tune: optional
-        May include logic that should be performed before every mutation step
+    tune : optional
+        May include logic that should be performed before every mutation step.
 
-    mutate: REQUIRED
-        Mutate particles in self.tempered_posterior
+    mutate : REQUIRED
+        Mutate particles in `self.tempered_posterior`.
 
-        This method is further responsible to update the self.prior_logp,
-        self.likelihod_logp and self.tempered_posterior_logp, corresponding
-        to each mutated particle
+        This method is further responsible to update the `self.prior_logp`,
+        `self.likelihod_logp` and `self.tempered_posterior_logp`, corresponding
+        to each mutated particle.
 
-    sample_stats: default
+    sample_stats : default
         Returns important sampling_stats at the end of each stage in a dictionary
-        format. This will be saved in the final InferenceData objcet under `sample_stats`.
+        format. This will be saved in the final InferenceData object under `sample_stats`.
 
     Finally, at the end of sampling the following methods are called:
 
-    _posterior_to_trace: default
+    _posterior_to_trace : default
         Convert final population of particles to a posterior trace object.
         This method should not be modified.
 
-    sample_settings: default:
+    sample_settings : default
         Returns important sample_settings at the end of sampling in a dictionary
-        format. This will be saved in the final InferenceData objcet under `sample_stats`.
+        format. This will be saved in the final InferenceData object under `sample_stats`.
 
     """
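The lifecycle documented above (initialize once, then per stage: update_beta_and_weights → resample → tune → mutate) can be sketched as a self-contained toy outside PyMC. Everything below is illustrative, not the real base class: a one-parameter conjugate model (prior mu ~ N(0, 1), one observation y = 1, exact posterior N(0.5, 0.5)), a fixed beta increment standing in for the ESS-based search, and a single random-walk Metropolis step as the mutation.

```python
import numpy as np

rng = np.random.default_rng(42)

class ToyKernel:
    """Toy kernel mirroring the documented SMC_KERNEL lifecycle (illustrative only)."""

    def __init__(self, draws=1000, threshold=0.5):
        self.draws = draws
        self.threshold = threshold
        self.beta = 0.0

    def _initialize_kernel(self):
        # Initial population of particles, drawn from the prior mu ~ N(0, 1).
        self.tempered_posterior = rng.normal(size=self.draws)
        self.prior_logp = -0.5 * self.tempered_posterior**2
        self.likelihood_logp = -0.5 * (1.0 - self.tempered_posterior) ** 2

    def update_beta_and_weights(self):
        # Fixed increment standing in for the real ESS-based search for beta.
        old_beta, self.beta = self.beta, min(1.0, self.beta + self.threshold)
        log_w = (self.beta - old_beta) * self.likelihood_logp
        self.weights = np.exp(log_w - log_w.max())
        self.weights /= self.weights.sum()

    def resample(self):
        # Sample particles with replacement according to the importance weights.
        self.resampling_indexes = rng.choice(self.draws, self.draws, p=self.weights)
        self.tempered_posterior = self.tempered_posterior[self.resampling_indexes]
        self.prior_logp = self.prior_logp[self.resampling_indexes]
        self.likelihood_logp = self.likelihood_logp[self.resampling_indexes]

    def mutate(self):
        # One random-walk Metropolis step targeting the tempered posterior;
        # updates prior_logp and likelihood_logp for each mutated particle.
        prop = self.tempered_posterior + rng.normal(scale=0.5, size=self.draws)
        prop_prior = -0.5 * prop**2
        prop_lik = -0.5 * (1.0 - prop) ** 2
        delta = (prop_prior + self.beta * prop_lik) - (
            self.prior_logp + self.beta * self.likelihood_logp
        )
        accept = np.log(rng.uniform(size=self.draws)) < delta
        self.tempered_posterior = np.where(accept, prop, self.tempered_posterior)
        self.prior_logp = np.where(accept, prop_prior, self.prior_logp)
        self.likelihood_logp = np.where(accept, prop_lik, self.likelihood_logp)

kernel = ToyKernel()
kernel._initialize_kernel()
while kernel.beta < 1.0:
    kernel.update_beta_and_weights()
    kernel.resample()
    kernel.mutate()
```

After the loop, `kernel.beta` is exactly 1 and the particle mean sits near the exact posterior mean of 0.5.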

@@ -132,23 +132,29 @@ def __init__(
         threshold=0.5,
     ):
         """
+        Initialize the SMC_kernel class.
 
         Parameters
         ----------
-        draws: int
-            The number of samples to draw from the posterior (i.e. last stage). And also the number of
+        draws : int, default 2000
+            The number of samples to draw from the posterior (i.e. last stage). Also the number of
             independent chains. Defaults to 2000.
-        start: dict, or array of dict
+        start : dict, or array of dict, default None
             Starting point in parameter space. It should be a list of dict with length `chains`.
             When None (default) the starting point is sampled from the prior distribution.
-        model: Model (optional if in ``with`` context)).
-        random_seed: int
+        model : Model (optional if in ``with`` context).
+        random_seed : int, array_like of int, RandomState or Generator, optional
             Value used to initialize the random number generator.
-        threshold: float
+        threshold : float, default 0.5
             Determines the change of beta from stage to stage, i.e.indirectly the number of stages,
             the higher the value of `threshold` the higher the number of stages. Defaults to 0.5.
             It should be between 0 and 1.
 
+        Attributes
+        ----------
+        self.var_info : dict
+            Dictionary that contains information about model variables shape and size.
+
         """
 
         self.draws = draws
@@ -199,7 +205,7 @@ def initialize_population(self) -> Dict[str, np.ndarray]:
         return cast(Dict[str, np.ndarray], dict_prior)
 
     def _initialize_kernel(self):
-        """Create variables and logp function necessary to run kernel
+        """Create variables and logp function necessary to run SMC kernel
 
         This method should not be overwritten. If needed, use `setup_kernel`
         instead.
@@ -301,17 +307,17 @@ def mutate(self):
     def sample_stats(self) -> Dict:
         """Stats to be saved at the end of each stage
 
-        These stats will be saved under `sample_stats` in the final InferenceData.
+        These stats will be saved under `sample_stats` in the final InferenceData object.
         """
         return {
             "log_marginal_likelihood": self.log_marginal_likelihood if self.beta == 1 else np.nan,
             "beta": self.beta,
         }
 
     def sample_settings(self) -> Dict:
-        """Kernel settings to be saved once at the end of sampling
+        """SMC_kernel settings to be saved once at the end of sampling.
 
-        These stats will be saved under `sample_stats` in the final InferenceData.
+        These stats will be saved under `sample_stats` in the final InferenceData object.
 
         """
         return {
@@ -347,15 +353,19 @@ def _posterior_to_trace(self, chain=0) -> NDArray:
 
 
 class IMH(SMC_KERNEL):
-    """Independent Metropolis-Hastings SMC kernel"""
+    """Independent Metropolis-Hastings SMC_kernel"""
 
     def __init__(self, *args, correlation_threshold=0.01, **kwargs):
         """
         Parameters
         ----------
-        correlation_threshold: float
-            The lower the value the higher the number of IMH steps computed automatically.
+        correlation_threshold : float, default 0.01
+            The lower the value, the higher the number of IMH steps computed automatically.
             Defaults to 0.01. It should be between 0 and 1.
+        **kwargs : dict, optional
+            Keyword arguments passed to the SMC_kernel. Refer to SMC_kernel documentation for a
+            list of all possible arguments.
+
         """
         super().__init__(*args, **kwargs)
         self.correlation_threshold = correlation_threshold
@@ -449,15 +459,19 @@ def get(self, b):
 
 
 class MH(SMC_KERNEL):
-    """Metropolis-Hastings SMC kernel"""
+    """Metropolis-Hastings SMC_kernel"""
 
     def __init__(self, *args, correlation_threshold=0.01, **kwargs):
         """
         Parameters
         ----------
-        correlation_threshold: float
-            The lower the value the higher the number of MH steps computed automatically.
+        correlation_threshold : float, default 0.01
+            The lower the value, the higher the number of MH steps computed automatically.
             Defaults to 0.01. It should be between 0 and 1.
+        **kwargs : dict, optional
+            Keyword arguments passed to the SMC_kernel. Refer to SMC_kernel documentation for a
+            list of all possible arguments.
+
         """
         super().__init__(*args, **kwargs)
         self.correlation_threshold = correlation_threshold
@@ -468,7 +482,7 @@ def __init__(self, *args, correlation_threshold=0.01, **kwargs):
 
     def setup_kernel(self):
         """Proposal dist is just a Multivariate Normal with unit identity covariance.
-        Dimension specific scaling is provided by self.proposal_scales and set in self.tune()
+        Dimension specific scaling is provided by `self.proposal_scales` and set in `self.tune()`
         """
         ndim = self.tempered_posterior.shape[1]
         self.proposal_scales = np.full(self.draws, min(1, 2.38**2 / ndim))
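The `min(1, 2.38**2 / ndim)` default in `setup_kernel` follows the classic optimal scaling result for random-walk proposals (variance of roughly 2.38²/d in d dimensions), capped at 1. A quick numpy illustration; the particle count and dimensionality below are toy values, not taken from the kernel:

```python
import numpy as np

# Optimal random-walk scaling ~ 2.38**2 / ndim, capped at 1 as in setup_kernel.
for ndim in (1, 5, 50):
    print(ndim, min(1, 2.38**2 / ndim))

# Proposal step with unit identity covariance, scaled per particle
# (toy shapes: 4 particles in 50 dimensions).
rng = np.random.default_rng(1)
draws, ndim = 4, 50
proposal_scales = np.full(draws, min(1, 2.38**2 / ndim))
step = rng.normal(size=(draws, ndim)) * proposal_scales[:, None]
```

The cap matters in low dimensions, where 2.38²/d exceeds 1; in high dimensions the scale shrinks toward zero so that proposals stay acceptably close to the current particles.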
@@ -586,11 +600,11 @@ def _logp_forw(point, out_vars, in_vars, shared):
     Parameters
     ----------
     out_vars : list
-        containing :class:`pymc.Distribution` for the output variables
+        Containing Distribution for the output variables
     in_vars : list
-        containing :class:`pymc.Distribution` for the input variables
+        Containing Distribution for the input variables
     shared : list
-        containing :class:`aesara.tensor.Tensor` for depended shared data
+        Containing TensorVariable for depended shared data
     """
 
     # Replace integer inputs with rounded float inputs

pymc/smc/sampling.py

Lines changed: 33 additions & 31 deletions
@@ -56,52 +56,54 @@ def sample_smc(
 
     Parameters
     ----------
-    draws: int
+    draws : int, default 2000
         The number of samples to draw from the posterior (i.e. last stage). And also the number of
         independent chains. Defaults to 2000.
-    kernel: SMC Kernel used. Defaults to pm.smc.IMH (Independent Metropolis Hastings)
-    start: dict, or array of dict
+    kernel : SMC_kernel, optional
+        SMC kernel used. Defaults to :class:`pymc.smc.smc.IMH` (Independent Metropolis Hastings)
+    start : dict or array of dict, optional
         Starting point in parameter space. It should be a list of dict with length `chains`.
         When None (default) the starting point is sampled from the prior distribution.
-    model: Model (optional if in ``with`` context)).
-    random_seed : int, array-like of int, RandomState or Generator, optional
+    model : Model (optional if in ``with`` context).
+    random_seed : int, array_like of int, RandomState or numpy_Generator, optional
         Random seed(s) used by the sampling steps. If a list, tuple or array of ints
         is passed, each entry will be used to seed each chain. A ValueError will be
         raised if the length does not match the number of chains.
-    chains : int
+    chains : int, optional
         The number of chains to sample. Running independent chains is important for some
         convergence statistics. If ``None`` (default), then set to either ``cores`` or 2, whichever
         is larger.
-    cores : int
+    cores : int, default None
         The number of chains to run in parallel. If ``None``, set to the number of CPUs in the
         system.
-    compute_convergence_checks : bool
+    compute_convergence_checks : bool, default True
         Whether to compute sampler statistics like ``R hat`` and ``effective_n``.
         Defaults to ``True``.
-    return_inferencedata : bool, default=True
-        Whether to return the trace as an :class:`arviz:arviz.InferenceData` (True) object or a `MultiTrace` (False)
+    return_inferencedata : bool, default True
+        Whether to return the trace as an InferenceData (True) object or a MultiTrace (False).
         Defaults to ``True``.
     idata_kwargs : dict, optional
-        Keyword arguments for :func:`pymc.to_inference_data`
-    progressbar : bool, optional default=True
+        Keyword arguments for :func:`pymc.to_inference_data`.
+    progressbar : bool, optional, default True
         Whether or not to display a progress bar in the command line.
-    **kernel_kwargs: keyword arguments passed to the SMC kernel.
-        The default IMH kernel takes the following keywords:
-        threshold: float
-            Determines the change of beta from stage to stage, i.e. indirectly the number of stages,
-            the higher the value of `threshold` the higher the number of stages. Defaults to 0.5.
-            It should be between 0 and 1.
-        correlation_threshold: float
+    **kernel_kwargs : dict, optional
+        Keyword arguments passed to the SMC_kernel. The default IMH kernel takes the following keywords:
+
+        threshold : float, default 0.5
+            Determines the change of beta from stage to stage, i.e. indirectly the number of stages,
+            the higher the value of `threshold` the higher the number of stages. Defaults to 0.5.
+            It should be between 0 and 1.
+        correlation_threshold : float, default 0.01
             The lower the value the higher the number of MCMC steps computed automatically.
             Defaults to 0.01. It should be between 0 and 1.
-        Keyword arguments for other kernels should be checked in the respective docstrings
+        Keyword arguments for other kernels should be checked in the respective docstrings.
 
     Notes
     -----
     SMC works by moving through successive stages. At each stage the inverse temperature
     :math:`\beta` is increased a little bit (starting from 0 up to 1). When :math:`\beta` = 0
-    we have the prior distribution and when :math:`\beta` =1 we have the posterior distribution.
-    So in more general terms we are always computing samples from a tempered posterior that we can
+    we have the prior distribution and when :math:`\beta = 1` we have the posterior distribution.
+    So in more general terms, we are always computing samples from a tempered posterior that we can
     write as:
 
     .. math::
@@ -113,12 +115,12 @@ def sample_smc(
     1. Initialize :math:`\beta` at zero and stage at zero.
     2. Generate N samples :math:`S_{\beta}` from the prior (because when :math `\beta = 0` the
        tempered posterior is the prior).
-    3. Increase :math:`\beta` in order to make the effective sample size equals some predefined
+    3. Increase :math:`\beta` in order to make the effective sample size equal some predefined
        value (we use :math:`Nt`, where :math:`t` is 0.5 by default).
     4. Compute a set of N importance weights W. The weights are computed as the ratio of the
        likelihoods of a sample at stage i+1 and stage i.
     5. Obtain :math:`S_{w}` by re-sampling according to W.
-    6. Use W to compute the mean and covariance for the proposal distribution, a MVNormal.
+    6. Use W to compute the mean and covariance for the proposal distribution, a MvNormal.
     7. Run N independent MCMC chains, starting each one from a different sample
        in :math:`S_{w}`. For the IMH kernel, the mean of the proposal distribution is the
        mean of the previous posterior stage and not the current point in parameter space.
@@ -130,15 +132,15 @@ def sample_smc(
 
     References
     ----------
-    .. [Minson2013] Minson, S. E. and Simons, M. and Beck, J. L., (2013),
-        Bayesian inversion for finite fault earthquake source models I- Theory and algorithm.
-        Geophysical Journal International, 2013, 194(3), pp.1701-1726,
+    .. [Minson2013] Minson, S. E., Simons, M., and Beck, J. L. (2013).
+        "Bayesian inversion for finite fault earthquake source models I- Theory and algorithm."
+        Geophysical Journal International, 2013, 194(3), pp.1701-1726.
         `link <https://gji.oxfordjournals.org/content/194/3/1701.full>`__
 
-    .. [Ching2007] Ching, J. and Chen, Y. (2007).
-        Transitional Markov Chain Monte Carlo Method for Bayesian Model Updating, Model Class
-        Selection, and Model Averaging. J. Eng. Mech., 10.1061/(ASCE)0733-9399(2007)133:7(816),
-        816-832. `link <http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9399
+    .. [Ching2007] Ching, J., and Chen, Y. (2007).
+        "Transitional Markov Chain Monte Carlo Method for Bayesian Model Updating, Model Class
+        Selection, and Model Averaging." J. Eng. Mech., 2007, 133(7), pp. 816-832. doi:10.1061/(ASCE)0733-9399(2007)133:7(816).
+        `link <http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9399
         %282007%29133:7%28816%29>`__
     """
