Skip to content

Ch 1: off by 1 error between pymc2 and 3 #338

Closed
@taylorterry3

Description

@taylorterry3

I absolutely love this project, and my team is now using it as a professional development module. We're using pymc3, and got a little puzzled by the last exercise in Ch 1 which asks you to "consider all instances where tau_samples < 45" as all of our values of tau are less than 45. Comparing the pymc2 and pymc3 versions the max value of tau is 45 in 2 and 44 in 3, with the distributions appearing the same other than being shifted by 1. Changing the >= to a > in the line lambda_ = pm.math.switch(tau >= idx, lambda_1, lambda_2) in the pymc3 version brings them back into alignment.

Both versions state tau like this:

tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data)

n_count_data is 74, so tau can have any integer value 0-74 inclusive. To reason about this it helps to think of the values of tau as representing the boundaries between the days and not the days themselves, so there is always going to be one more than there are days.

The pymc2 version sets up lambda_ like this:

@pm.deterministic
def lambda_(tau=tau, lambda_1=lambda_1, lambda_2=lambda_2):
    out = np.zeros(n_count_data)
    out[:tau] = lambda_1  # lambda before tau is lambda1
    out[tau:] = lambda_2  # lambda after (and including) tau is lambda2
    return out

While pymc3 does this:

with model:
    idx = np.arange(n_count_data) # Index
    lambda_ = pm.math.switch(tau >= idx, lambda_1, lambda_2)

So when tau is zero, the pymc2 version executes out[:0] which returns an empty array. In real world terms, this is the case where all events happened under the lambda_2 regime. pymc3 by contrast executes pm.math.switch(0 >= idx, lambda_1, lambda_2) and returns lambda_1 for the first element of idx because 0 >= 0 is True. The comparison operator needs to be > so that you can evaluate 0 > 0 and get False for that and all other elements of idx for the case where you're always in the lambda_2 regime.

I think I have all this right, but I'm a bit new to all of this so I wanted to lay out the reasoning before I put in a random PR to change a >= to a >. Thanks for reading, and let me know if this all makes sense.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions