
Subtensor RV lift does not work correctly with Categorical RVs #230

Closed
jessegrabowski opened this issue Jan 2, 2023 · 7 comments · Fixed by #281
Labels
bug Something isn't working

Comments

@jessegrabowski
Member

jessegrabowski commented Jan 2, 2023

Describe the issue:

Originally raised on discourse here.

It appears that when pm.Categorical has observed data with missing values, the variable is re-instantiated inside model.make_obs_var with the wrong number of categories. This example shows that the new number of categories is indeed controlled by the shape of the data:

import pymc as pm
import numpy as np

# No error
data = np.ma.masked_equal([1, -1, -1], -1)
with pm.Model():
    idx = pm.Categorical("hi_idx", p=[0.1, 0.2, 0.7], observed=data)
pm.draw(idx, 100).max()
>>> Out: 2.0

data = np.ma.masked_equal([1, -1], -1)
with pm.Model():
    idx = pm.Categorical("hi_idx", p=[0.1, 0.2, 0.7], observed=data)
pm.draw(idx, 100).max()
>>> Out: 1.0

If the data are longer than the number of classes, the code will error out, as shown below:

Reproducible code example:

import pymc as pm
import numpy as np

data = np.ma.masked_equal([1, 1, 0, 0, 2, -1, -1], -1)
with pm.Model():
    idx = pm.Categorical("hi_idx", p=[0.1, 0.2, 0.7], observed=data)
pm.draw(idx, 100).max()

Error message:

IndexError: index 5 is out of bounds for axis 0 with size 3
Apply node that caused the error: AdvancedSubtensor1(TensorConstant{[0.1 0.2 0.7]}, TensorConstant{[5 6]})
Toposort index: 1
Inputs types: [TensorType(float64, (3,)), TensorType(uint8, (2,))]
Inputs shapes: [(3,), (2,)]
Inputs strides: [(8,), (1,)]
Inputs values: [array([0.1, 0.2, 0.7]), array([5, 6], dtype=uint8)]
Outputs clients: [[categorical_rv{0, (1,), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x12AD72820>), TensorConstant{(1,) of 2}, TensorConstant{4}, AdvancedSubtensor1.0)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3194, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3373, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/7b/rzxy96cj0w751_6td3g2yss00000gn/T/ipykernel_96907/3148429013.py", line 3, in <module>
    idx = pm.Categorical(f"hi_idx", p=[0.1, 0.2, 0.7], observed=data)
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/pymc/distributions/distribution.py", line 457, in __new__
    return super().__new__(cls, name, *args, **kwargs)
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/pymc/distributions/distribution.py", line 310, in __new__
    rv_out = model.register_rv(
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/pymc/model.py", line 1348, in register_rv
    rv_var = self.make_obs_var(rv_var, observed, dims, transform)
  File "/Users/jessegrabowski/mambaforge/envs/econ/lib/python3.9/site-packages/pymc/model.py", line 1425, in make_obs_var
    (missing_rv_var,) = local_subtensor_rv_lift.transform(fgraph, fgraph.outputs[0].owner)

HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

PyMC version information:

PyMC version: 5.0.1
PyTensor version: 2.8.11

Context for the issue:

No response

@jessegrabowski jessegrabowski added the bug Something isn't working label Jan 2, 2023
@jessegrabowski
Member Author

jessegrabowski commented Jan 3, 2023

I believe this is actually a pytensor bug in pytensor.tensor.random.rewriting.basic.local_subtensor_rv_lift. I guess I'll open a separate issue over there.

I may have spoken too soon. 'local_subtensor_rv_lift' is where the Categorical's p parameters are masked away, but I'm not sure that function is doing anything wrong. The root of the problem seems to be that the parameter p of the Categorical distribution doesn't automatically carry shape information the way the parameters of other distributions do. It seems like a special case needs to be added somewhere to handle this, but I'm not positive where it should go.
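
A small sketch of what this means in practice (a hypothetical check, comparing against pm.Normal, whose vector parameter does add batch dimensions):

import pymc as pm

# a length-3 mu gives three independent Normals...
print(pm.draw(pm.Normal.dist(mu=[0.0, 1.0, 2.0])).shape)      # (3,)
# ...but a length-3 p is a single Categorical's class probabilities, not a batch
print(pm.draw(pm.Categorical.dist(p=[0.1, 0.2, 0.7])).shape)  # ()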

@jessegrabowski jessegrabowski changed the title BUG: Number of classes in pm.Categorical is inferred from the data shape when observed has missing values BUG: Class probabilities in pm.Categorical are split into observed and missing when observed has missing values Jan 3, 2023
@ricardoV94
Member

ricardoV94 commented Feb 28, 2023

As you first guessed, the issue is in local_subtensor_rv_lift, which naively tries to split the p parameter as if it had ndim_params=0 (i.e. one entry per independent dimension). The splitting should ignore the last dimension, so if you have p=[[.5, .5], [.25, .75]] you split into two RVs, one with p=[.5, .5] and another with p=[.25, .75].
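
For illustration, a minimal numpy sketch of the intended split (using the values above, not the rewrite's actual code):

import numpy as np

p = np.array([[0.5, 0.5], [0.25, 0.75]])  # one row of class probabilities per draw

# index only the batch (first) dimension, leaving the core (last) dimension intact
p_for_rv0 = p[0]  # [0.5, 0.5]
p_for_rv1 = p[1]  # [0.25, 0.75]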

Categorical is a bit unique here because it's one of the few (maybe the only?) univariate distributions with a vector parameter in the core case.
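
For reference, a quick way to see this (a sketch assuming the ndim_supp / ndims_params attribute names of recent pytensor RandomVariable ops):

import pymc as pm

rv = pm.Categorical.dist(p=[0.1, 0.2, 0.7])
print(rv.owner.op.ndim_supp)     # 0 -> each draw is a scalar, i.e. univariate
print(rv.owner.op.ndims_params)  # p has one core dimension -> a vector parameter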

By the way, fixing the bug may be related to #49

@ricardoV94
Member

Moving this to Pytensor

@ricardoV94 ricardoV94 transferred this issue from pymc-devs/pymc Feb 28, 2023
@ricardoV94 ricardoV94 changed the title BUG: Class probabilities in pm.Categorical are split into observed and missing when observed has missing values Subtensor RV lift does nto work correctly with Categorical RVs Feb 28, 2023
@ricardoV94 ricardoV94 changed the title Subtensor RV lift does nto work correctly with Categorical RVs Subtensor RV lift does not work correctly with Categorical RVs Feb 28, 2023
@greenguy33

Hi @ricardoV94 and @jessegrabowski, thanks for documenting this. I encountered this issue while trying to impute missing data for categorical variables. Is there a workaround available in the meantime, while the bug is being fixed?

@ricardoV94
Member

ricardoV94 commented Apr 21, 2023

@greenguy33 you can do the imputation yourself. The only thing that happens behind the scenes is that we create two Categorical variables, one fully observed and one unobserved (for the entries that were nan). You just have to split the p vectors sensibly between the two and you should get the same result.

Simplest case

import numpy as np
import pymc as pm

p = [[0.5, 0.5], [0.3, 0.7]]
data = [np.nan, 1]
with pm.Model():
    # entry 0 of data is missing -> model it as an unobserved Categorical
    cat_unobserved = pm.Categorical("cat_unobserved", p=p[0])
    # entry 1 is observed -> condition on it directly
    cat_observed = pm.Categorical("cat_observed", p=p[1], observed=data[1])
    # stitch the two back together in the original order
    cat = pm.Deterministic("cat", pm.math.stack((cat_unobserved, cat_observed)))
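
For the masked-data case from the original report, the same idea can be spelled out by hand. This is only a sketch assuming a single p vector shared by all entries; the reassembly with pt.set_subtensor and the variable names are illustrative, not PyMC's built-in imputation:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

data = np.ma.masked_equal([1, 1, 0, 0, 2, -1, -1], -1)
p = [0.1, 0.2, 0.7]  # one p vector shared by all entries
missing = np.asarray(data.mask)

with pm.Model():
    # fully observed part: no missing entries, so the subtensor lift never triggers
    cat_observed = pm.Categorical("cat_observed", p=p, observed=data.compressed())
    # unobserved part: one latent Categorical per masked entry
    cat_unobserved = pm.Categorical("cat_unobserved", p=p, shape=int(missing.sum()))
    # reassemble the full vector in the original order
    full = pt.zeros(data.shape, dtype="int64")
    full = pt.set_subtensor(full[np.flatnonzero(~missing)], cat_observed)
    full = pt.set_subtensor(full[np.flatnonzero(missing)], cat_unobserved)
    cat = pm.Deterministic("cat", full)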

@ricardoV94
Member

@greenguy33 the fix is included in the latest release; you should be able to use it in a couple of hours.

@greenguy33

@ricardoV94 thanks very much! Will give it a try!
