
Commit 1202cb5

Bayesian ab testing for PyMC v4 (#351)

* Making all pm.sample return_inferencedata=False when needed
* pre commit correction
* Re-run bayesian_ab_testing.ipynb notebook

Co-authored-by: Sev <[email protected]>
1 parent f2409c3 commit 1202cb5
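The changes below all follow the same PyMC v4 migration pattern: `pm.sample` now returns an `arviz.InferenceData` by default, so the explicit `return_inferencedata=True` arguments are dropped, while `pm.sample_prior_predictive` is passed `return_inferencedata=False` so it keeps returning the plain dict of arrays that the notebook indexes by variable name. A minimal before/after sketch of the pattern (a toy model for illustration, not code from the notebook):

```python
import pymc as pm

with pm.Model():
    theta = pm.Beta("theta", alpha=100, beta=100)
    pm.Binomial("y", n=1000, p=theta, observed=230)

    # v3 style: pm.sample(draws=5000, return_inferencedata=True, ...)
    # v4: an arviz.InferenceData is the default return type
    trace = pm.sample(draws=5000)

    # v4: request the old dict-of-arrays return type explicitly
    prior_pred = pm.sample_prior_predictive(samples=10000, return_inferencedata=False)

print(float(trace.posterior["theta"].mean()))  # xarray-based access
print(prior_pred["theta"].mean())              # plain numpy array
```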

File tree: 2 files changed (+712, -485 lines)

examples/case_studies/bayesian_ab_testing.ipynb (+676, -448)

Large diffs are not rendered by default.

myst_nbs/case_studies/bayesian_ab_testing.myst.md (+36, -37)
@@ -19,12 +19,9 @@ import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
 import pandas as pd
-import pymc3 as pm
-import pymc3.math as pmm
+import pymc as pm
 
 from scipy.stats import bernoulli, expon
-
-print(f"Running on PyMC3 v{pm.__version__}")
 ```
 
 ```{code-cell} ipython3
@@ -33,6 +30,12 @@ rng = np.random.default_rng(RANDOM_SEED)
 
 %config InlineBackend.figure_format = 'retina'
 az.style.use("arviz-darkgrid")
+
+plotting_defaults = dict(
+    bins=50,
+    kind="hist",
+    textsize=10,
+)
 ```
 
 This notebook demonstrates how to implement a Bayesian analysis of an A/B test. We implement the models discussed in VWO's [Bayesian A/B Testing Whitepaper](https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf), and discuss the effect of different prior choices for these models. This notebook does _not_ discuss other related topics like how to choose a prior, early stopping, and power analysis.
@@ -76,13 +79,13 @@ $$y_A \sim \mathrm{Binomial}(n = N_A, p = \theta_A), y_B \sim \mathrm{Binomial}(n = N_B, p = \theta_B)$$
 
 With this, we can sample from the joint posterior of $\theta_A, \theta_B$.
 
-You may have noticed that the Beta distribution is the conjugate prior for the Binomial, so we don't need MCMC sampling to estimate the posterior (the exact solution can be found in the VWO paper). We'll still demonstrate how sampling can be done with PyMC3 though, and doing this makes it easier to extend the model with different priors, dependency assumptions, etc.
+You may have noticed that the Beta distribution is the conjugate prior for the Binomial, so we don't need MCMC sampling to estimate the posterior (the exact solution can be found in the VWO paper). We'll still demonstrate how sampling can be done with PyMC though, and doing this makes it easier to extend the model with different priors, dependency assumptions, etc.
 
 Finally, remember that our outcome of interest is whether B is better than A. A common measure in practice for whether B is better than A is the _relative uplift in conversion rates_, i.e. the percentage difference of $\theta_B$ over $\theta_A$:
 
 $$\mathrm{reluplift}_B = \theta_B / \theta_A - 1$$
 
-We'll implement this model setup in PyMC3 below.
+We'll implement this model setup in PyMC below.
 
 ```{code-cell} ipython3
 @dataclass
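As the surrounding prose notes, conjugacy makes MCMC optional here: a $\mathrm{Beta}(\alpha, \beta)$ prior with $y$ conversions out of $N$ visitors yields the exact posterior $\mathrm{Beta}(\alpha + y, \beta + N - y)$. A sketch of that closed-form route with scipy, using made-up counts (not part of this commit):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(42)

# Hypothetical data: y conversions out of N visitors for each variant
y_a, n_a = 230, 1000
y_b, n_b = 250, 1000
alpha, beta_prior = 100, 100  # the notebook's "weak" Beta(100, 100) prior

# Conjugate update: the posterior is exactly Beta(alpha + y, beta + N - y)
post_a = beta(alpha + y_a, beta_prior + n_a - y_a)
post_b = beta(alpha + y_b, beta_prior + n_b - y_b)

# Draws from the exact posteriors give the reluplift_B = theta_B / theta_A - 1 distribution
theta_a = post_a.rvs(100_000, random_state=rng)
theta_b = post_b.rvs(100_000, random_state=rng)
reluplift_b = theta_b / theta_a - 1
print(f"P(reluplift_B > 0) = {(reluplift_b > 0).mean():.1%}")
```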
@@ -127,7 +130,7 @@ We illustrate these points with prior predictive checks.
 
 +++
 
-Note that we can pass in arbitrary values for the observed data in these prior predictive checks. PyMC3 will not use that data when sampling from the prior predictive distribution.
+Note that we can pass in arbitrary values for the observed data in these prior predictive checks. PyMC will not use that data when sampling from the prior predictive distribution.
 
 ```{code-cell} ipython3
 weak_prior = ConversionModelTwoVariant(BetaPrior(alpha=100, beta=100))
@@ -139,22 +142,22 @@ strong_prior = ConversionModelTwoVariant(BetaPrior(alpha=10000, beta=10000))
 
 ```{code-cell} ipython3
 with weak_prior.create_model(data=[BinomialData(1, 1), BinomialData(1, 1)]):
-    weak_prior_predictive = pm.sample_prior_predictive(samples=10000)
+    weak_prior_predictive = pm.sample_prior_predictive(samples=10000, return_inferencedata=False)
 ```
 
 ```{code-cell} ipython3
 with strong_prior.create_model(data=[BinomialData(1, 1), BinomialData(1, 1)]):
-    strong_prior_predictive = pm.sample_prior_predictive(samples=10000)
+    strong_prior_predictive = pm.sample_prior_predictive(samples=10000, return_inferencedata=False)
 ```
 
 ```{code-cell} ipython3
 fig, axs = plt.subplots(2, 1, figsize=(7, 7), sharex=True)
-az.plot_posterior(weak_prior_predictive["reluplift_b"], textsize=10, ax=axs[0], kind="hist")
+az.plot_posterior(weak_prior_predictive["reluplift_b"], ax=axs[0], **plotting_defaults)
 axs[0].set_title(f"B vs. A Rel Uplift Prior Predictive, {weak_prior.priors}", fontsize=10)
 axs[0].axvline(x=0, color="red")
-az.plot_posterior(strong_prior_predictive["reluplift_b"], textsize=10, ax=axs[1], kind="hist")
+az.plot_posterior(strong_prior_predictive["reluplift_b"], ax=axs[1], **plotting_defaults)
 axs[1].set_title(f"B vs. A Rel Uplift Prior Predictive, {strong_prior.priors}", fontsize=10)
-axs[1].axvline(x=0, color="red")
+axs[1].axvline(x=0, color="red");
 ```
 
 With the weak prior our 94% HDI for the relative uplift for B over A is roughly [-20%, +20%], whereas it is roughly [-2%, +2%] with the strong prior. This is effectively the "starting point" for the relative uplift distribution, and will affect how the observed conversions translate to the posterior distribution.
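Those interval widths can be sanity-checked without MCMC. A sketch under the same priors (not code from the commit): draw $\theta_A, \theta_B$ independently from each prior and compute the central 94% interval of the implied relative uplift, which reproduces the roughly ±20% and ±2% ranges quoted above:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(4000)
for a in (100, 10_000):  # weak Beta(100, 100) vs. strong Beta(10000, 10000)
    theta_a = beta(a, a).rvs(100_000, random_state=rng)
    theta_b = beta(a, a).rvs(100_000, random_state=rng)
    reluplift = theta_b / theta_a - 1
    lo, hi = np.percentile(reluplift, [3, 97])  # central 94% interval
    print(f"Beta({a}, {a}): reluplift ~ [{lo:+.1%}, {hi:+.1%}]")
```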
@@ -204,26 +207,27 @@ def run_scenario_twovariant(
     generated = generate_binomial_data(variants, true_rates, samples_per_variant)
     data = [BinomialData(**generated[v].to_dict()) for v in variants]
     with ConversionModelTwoVariant(priors=weak_prior).create_model(data):
-        trace_weak = pm.sample(draws=5000, return_inferencedata=True, cores=1, chains=2)
+        trace_weak = pm.sample(draws=5000)
     with ConversionModelTwoVariant(priors=strong_prior).create_model(data):
-        trace_strong = pm.sample(draws=5000, return_inferencedata=True, cores=1, chains=2)
+        trace_strong = pm.sample(draws=5000)
 
     true_rel_uplift = true_rates[1] / true_rates[0] - 1
 
     fig, axs = plt.subplots(2, 1, figsize=(7, 7), sharex=True)
-    az.plot_posterior(trace_weak.posterior["reluplift_b"], textsize=10, ax=axs[0], kind="hist")
+    az.plot_posterior(trace_weak.posterior["reluplift_b"], ax=axs[0], **plotting_defaults)
     axs[0].set_title(f"True Rel Uplift = {true_rel_uplift:.1%}, {weak_prior}", fontsize=10)
     axs[0].axvline(x=0, color="red")
-    az.plot_posterior(trace_strong.posterior["reluplift_b"], textsize=10, ax=axs[1], kind="hist")
+    az.plot_posterior(trace_strong.posterior["reluplift_b"], ax=axs[1], **plotting_defaults)
     axs[1].set_title(f"True Rel Uplift = {true_rel_uplift:.1%}, {strong_prior}", fontsize=10)
     axs[1].axvline(x=0, color="red")
     fig.suptitle("B vs. A Rel Uplift")
+    return trace_weak, trace_strong
 ```
 
 #### Scenario 1 - same underlying conversion rates
 
 ```{code-cell} ipython3
-run_scenario_twovariant(
+trace_weak, trace_strong = run_scenario_twovariant(
     variants=["A", "B"],
     true_rates=[0.23, 0.23],
     samples_per_variant=100000,
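With `return_inferencedata=True` removed from the `pm.sample` calls, the function relies on v4's default `arviz.InferenceData` return type, which is why the `trace_weak.posterior[...]` accesses work unchanged and why returning the traces is now useful. A small sketch of what the returned object exposes (variable name as in the notebook):

```python
# trace_weak.posterior is an xarray.Dataset with (chain, draw) dimensions
reluplift = trace_weak.posterior["reluplift_b"]  # an xarray.DataArray
print(reluplift.dims)                            # e.g. ('chain', 'draw')
print(float(reluplift.mean()))                   # posterior mean relative uplift
```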
@@ -290,7 +294,7 @@ class ConversionModel:
         elif comparison_method == "best_of_rest":
             others = [p[j] for j in range(num_variants) if j != i]
             if len(others) > 1:
-                comparison = pmm.maximum(*others)
+                comparison = pm.math.maximum(*others)
             else:
                 comparison = others[0]
         else:
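The only change here is the import path: `pymc3.math` (aliased `pmm`) becomes `pm.math` in v4. For reference, `pm.math.maximum`, like `np.maximum`, takes two tensors and returns their elementwise maximum, so the `*others` unpacking above works as written when there are exactly two competing variants. A toy sketch of the "best_of_rest" comparison (not the notebook's class):

```python
import pymc as pm

with pm.Model():
    p = [pm.Beta(f"p_{i}", alpha=100, beta=100) for i in range(3)]
    # Best of the "rest" when comparing variant 0 against variants 1 and 2
    comparison = pm.math.maximum(p[1], p[2])
    pm.Deterministic("reluplift_0", p[0] / comparison - 1)
```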
@@ -310,17 +314,15 @@ def run_scenario_bernoulli(
     generated = generate_binomial_data(variants, true_rates, samples_per_variant)
     data = [BinomialData(**generated[v].to_dict()) for v in variants]
     with ConversionModel(priors).create_model(data=data, comparison_method=comparison_method):
-        trace = pm.sample(draws=5000, return_inferencedata=True, chains=2, cores=1)
+        trace = pm.sample(draws=5000)
 
     n_plots = len(variants)
     fig, axs = plt.subplots(nrows=n_plots, ncols=1, figsize=(3 * n_plots, 7), sharex=True)
     for i, variant in enumerate(variants):
         if i == 0 and comparison_method == "compare_to_control":
             axs[i].set_yticks([])
         else:
-            az.plot_posterior(
-                trace.posterior[f"reluplift_{i}"], textsize=10, ax=axs[i], kind="hist"
-            )
+            az.plot_posterior(trace.posterior[f"reluplift_{i}"], ax=axs[i], **plotting_defaults)
         axs[i].set_title(f"Rel Uplift {variant}, True Rate = {true_rates[i]:.2%}", fontsize=10)
         axs[i].axvline(x=0, color="red")
     fig.suptitle(f"Method {comparison_method}, {priors}")
@@ -451,9 +453,9 @@ class RevenueModel:
                 others_lam = [1 / lam[j] for j in range(num_variants) if j != i]
                 others_rpv = [revenue_per_visitor[j] for j in range(num_variants) if j != i]
                 if len(others_rpv) > 1:
-                    comparison_theta = pmm.maximum(*others_theta)
-                    comparison_lam = pmm.maximum(*others_lam)
-                    comparison_rpv = pmm.maximum(*others_rpv)
+                    comparison_theta = pm.math.maximum(*others_theta)
+                    comparison_lam = pm.math.maximum(*others_lam)
+                    comparison_rpv = pm.math.maximum(*others_rpv)
                 else:
                     comparison_theta = others_theta[0]
                     comparison_lam = others_lam[0]
@@ -492,14 +494,14 @@ data = [
 
 ```{code-cell} ipython3
 with RevenueModel(c_prior, mp_prior).create_model(data, "best_of_rest"):
-    revenue_prior_predictive = pm.sample_prior_predictive(samples=10000)
+    revenue_prior_predictive = pm.sample_prior_predictive(samples=10000, return_inferencedata=False)
 ```
 
 ```{code-cell} ipython3
 fig, ax = plt.subplots()
-az.plot_posterior(revenue_prior_predictive["reluplift_1"], textsize=10, ax=ax, kind="hist")
+az.plot_posterior(revenue_prior_predictive["reluplift_1"], ax=ax, **plotting_defaults)
 ax.set_title(f"Revenue Rel Uplift Prior Predictive, {c_prior}, {mp_prior}", fontsize=10)
-ax.axvline(x=0, color="red")
+ax.axvline(x=0, color="red");
 ```
 
 Similar to the model for Bernoulli conversions, the width of the prior predictive uplift distribution will depend on the strength of our priors. See the Bernoulli conversions section for a discussion of the benefits and disadvantages of using a weak vs. strong prior.
@@ -553,17 +555,15 @@ def run_scenario_value(
     with RevenueModel(conversion_rate_prior, mean_purchase_prior).create_model(
         data, comparison_method
     ):
-        trace = pm.sample(draws=5000, return_inferencedata=True, chains=2, cores=1)
+        trace = pm.sample(draws=5000, chains=2, cores=1)
 
     n_plots = len(variants)
     fig, axs = plt.subplots(nrows=n_plots, ncols=1, figsize=(3 * n_plots, 7), sharex=True)
     for i, variant in enumerate(variants):
         if i == 0 and comparison_method == "compare_to_control":
             axs[i].set_yticks([])
         else:
-            az.plot_posterior(
-                trace.posterior[f"reluplift_{i}"], textsize=10, ax=axs[i], kind="hist"
-            )
+            az.plot_posterior(trace.posterior[f"reluplift_{i}"], ax=axs[i], **plotting_defaults)
         true_rpv = true_conversion_rates[i] * true_mean_purchase[i]
         axs[i].set_title(f"Rel Uplift {variant}, True RPV = {true_rpv:.2f}", fontsize=10)
         axs[i].axvline(x=0, color="red")
@@ -611,8 +611,7 @@ scenario_value_2 = run_scenario_value(
 axs = az.plot_posterior(
     scenario_value_2,
     var_names=["theta_reluplift_1", "reciprocal_lam_reluplift_1"],
-    textsize=10,
-    kind="hist",
+    **plotting_defaults,
 )
 axs[0].set_title(f"Conversion Rate Uplift B, True Uplift = {(0.04 / 0.05 - 1):.2%}", fontsize=10)
 axs[0].axvline(x=0, color="red")
@@ -655,9 +654,9 @@ There are many other considerations to implementing a Bayesian framework to anal
 * How do we plan the length and size of A/B tests using power analysis, if we're using Bayesian models to analyse the results?
 * Outside of the conversion rates (Bernoulli random variables for each visitor), many value distributions in online software cannot be fit with nice densities like Normal, Gamma, etc. How do we model these?
 
-Various textbooks and online resources dive into these areas in more detail. [Doing Bayesian Data Analysis](http://doingbayesiandataanalysis.blogspot.com/) by John Kruschke is a great resource, and has been translated to PyMC3 here: https://github.com/JWarmenhoven/DBDA-python.
+Various textbooks and online resources dive into these areas in more detail. [Doing Bayesian Data Analysis](http://doingbayesiandataanalysis.blogspot.com/) by John Kruschke is a great resource, and has been translated to PyMC here: https://github.com/JWarmenhoven/DBDA-python.
 
-We also plan to create more PyMC3 tutorials on these topics, so stay tuned!
+We also plan to create more PyMC tutorials on these topics, so stay tuned!
 
 ---
 
@@ -669,5 +668,5 @@ Author: [Cuong Duong](https://github.com/tcuongd) (2021-05-23)
 
 ```{code-cell} ipython3
 %load_ext watermark
-%watermark -n -u -v -iv -w -p theano,xarray
+%watermark -n -u -v -iv -w -p aesara,xarray
 ```
