Subtensor RV lift does not work correctly with Categorical RVs #230
Comments
I maybe spoke too soon. 'local_subtensor_rv_lift' is where the |
pm.Categorical is inferred from the data shape when observed has missing values
pm.Categorical are split into observed and missing when observed has missing values
As you first guessed, the issue is in local_subtensor_rv_lift, which naively tries to split the p parameter as if it had ndim_params=0 (i.e., one entry per independent dimension). The splitting should ignore the last dimension, so if you have p=[[.5, .5], [.25, .75]] you split into 2 RVs, one with p=[.5, .5] and another with p=[.25, .75]. Categorical is a bit unique here because it's one of the few (maybe the only?) univariate distributions with a vector parameter in the core case. By the way, fixing the bug may be related to #49.
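To make the intended behaviour concrete, here is a minimal pure-Python sketch of splitting p along the batch dimension only, so that each probability vector stays whole. It is not the PyTensor rewrite itself: the helper name split_core_vectors and the missing mask are hypothetical, introduced only for illustration.

```python
# Hypothetical illustration; this is NOT the code of local_subtensor_rv_lift.
p = [[0.5, 0.5], [0.25, 0.75]]   # one probability vector per independent draw
missing = [True, False]          # assumed mask: one flag per draw (batch dimension)


def split_core_vectors(p, missing):
    """Split p along the batch dimension only, never inside a probability vector."""
    p_missing = [row for row, is_missing in zip(p, missing) if is_missing]
    p_observed = [row for row, is_missing in zip(p, missing) if not is_missing]
    return p_missing, p_observed


p_missing, p_observed = split_core_vectors(p, missing)
print(p_missing)   # rows of p that belong to the unobserved RV
print(p_observed)  # rows of p that belong to the observed RV
```

Each resulting row is still a full probability vector, which is what a Categorical expects in its core (last) dimension.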
Moving this to Pytensor
Hi @ricardoV94 and @jessegrabowski, thanks for documenting this. I encountered this issue while trying to impute missing data for categorical variables. Is there a workaround available in the meantime while the bug is fixed?
@greenguy33 you can do the imputation yourself. The only thing that happens behind the scenes is that we create two Categorical variables, one fully observed and one unobserved (for the entries where the data are missing).
Simplest case:
    import numpy as np
    import pymc as pm

    p = np.array([[0.5, 0.5], [0.3, 0.7]])
    data = [np.nan, 1]
    with pm.Model():
        # Unobserved RV for the missing entry, observed RV for the rest
        cat_unobserved = pm.Categorical("cat_unobserved", p=p[0])
        cat_observed = pm.Categorical("cat_observed", p=p[1], observed=data[1])
        cat = pm.Deterministic("cat", pm.math.stack((cat_unobserved, cat_observed)))
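The stack in the simplest case works because the missing entry happens to come first. For longer data, one possible way to do the bookkeeping is sketched below in plain Python: find the missing and observed positions, then put the two groups back in the original order. The names missing_idx, observed_idx and recombine are hypothetical helpers, not part of PyMC, and the example data are made up for illustration.

```python
import math

# Made-up example data; NaN marks a missing entry.
data = [math.nan, 1, 0, math.nan]

# Positions that would go to the unobserved and the observed Categorical.
missing_idx = [i for i, x in enumerate(data) if math.isnan(x)]
observed_idx = [i for i, x in enumerate(data) if not math.isnan(x)]


def recombine(missing_values, observed_values):
    """Interleave the two groups back into the original data order."""
    out = [None] * (len(missing_idx) + len(observed_idx))
    for i, value in zip(missing_idx, missing_values):
        out[i] = value
    for i, value in zip(observed_idx, observed_values):
        out[i] = value
    return out


# Placeholders stand in for imputed draws of the unobserved variable.
print(recombine(["imputed_0", "imputed_1"], [1, 0]))
```

The same index lists could be used to select the matching rows of p for each of the two variables.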
@greenguy33 the fix is included in the last release, you should be able to use it in a couple of hours
@ricardoV94 thanks very much! Will give it a try!
Describe the issue:
Originally raised on discourse here.
It appears that when pm.Categorical has observed data with missing values, the variable is re-instantiated inside model.make_obs_var with the wrong number of categories. This example shows that the new number of categories is indeed controlled by the shape of the data:
If the data are longer than the number of classes, the code will error out, as shown below:
Reproducible code example:
Error message:
PyMC version information:
Context for the issue:
No response