
Add squeeze for labeled tensors #1434


Open
wants to merge 13 commits into base: labeled_tensors

Conversation

@AllenDowney commented May 30, 2025

Adding squeeze and expand_dims


📚 Documentation preview 📚: https://pytensor--1434.org.readthedocs.build/en/1434/

… ExpandDims op and rewrite rule to not add a new dimension when dim is None - Update tests to verify behavior matches xarray
@AllenDowney (Author)

@ricardoV94 Please take a look at squeeze. expand_dims is still WIP

XTensorVariable
A new tensor with the specified dimension removed
"""
return Squeeze(dim=dim)(x)
@ricardoV94 (Member) commented May 30, 2025

Better not to have None in the Op. Do the conversion here and pass explicit dims to the Op. The reason for this has to do with PyTensor constraints.

Our Squeeze Op should always know which explicit dims are to be dropped, because the input could change subtly during rewrites; if we only then find out that a dimension has length 1, which we didn't know before, reapplying the same Op would change the output type, and that is not allowed during rewrites.


Another note: xarray's squeeze seems to accept an axis argument for positional squeezing. We should allow that and convert it to dims: https://docs.xarray.dev/en/latest/generated/xarray.DataArray.squeeze.html#xarray-dataarray-squeeze

Better to always check the docs of the xarray method we're trying to emulate, to be aware of any special arguments.

You may need to experiment a bit with what xarray does if you specify both, or specify invalid dims/axis, so we can emulate the behavior on our side as much as is reasonable.
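
A minimal sketch of the normalization being suggested, assuming a Squeeze Op that takes explicit dims (the helper body and error message are illustrative, not the PR's actual code):

def squeeze(x, dim=None, axis=None):
    # Resolve dim/axis to explicit dim names before the Op is built
    if dim is not None and axis is not None:
        raise ValueError("Cannot pass both dim and axis")
    if axis is not None:
        # Positional squeeze: translate axis positions into dim names
        axes = (axis,) if isinstance(axis, int) else tuple(axis)
        dims = tuple(x.type.dims[i] for i in axes)
    elif dim is None:
        # Squeeze every dimension statically known to have length 1
        dims = tuple(d for d, s in zip(x.type.dims, x.type.shape) if s == 1)
    else:
        dims = (dim,) if isinstance(dim, str) else tuple(dim)
    return Squeeze(dims=dims)(x)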

@AllenDowney changed the title from "Add expand dims squeeze" to "Add squeeze" on Jun 2, 2025
@AllenDowney (Author)

@ricardoV94 I have restored the version with tests that validate against xarray behavior. I think squeeze is ready for review. Again, ignore expand_dims for now.

I have a question about the case where a dimension specifier is symbolic -- is the implementation here correct?

Comment on lines +128 to +129
if not isinstance(node.op, ExpandDims):
return False
@ricardoV94 (Member)

This check isn't needed; the node_rewriter argument is already used to preselect such nodes.
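
For context, the preselection happens through the decorator, so the body can assume the Op type (a sketch; the rewrite name is made up):

from pytensor.graph.rewriting.basic import node_rewriter

@node_rewriter([ExpandDims])
def local_lower_expand_dims(fgraph, node):
    # Only nodes whose op is an ExpandDims are dispatched here,
    # so an isinstance check inside the body is redundant
    ...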

@AllenDowney (Author)

I'll note that for the next iteration, but expand_dims is not ready for review.

Comment on lines 135 to 137
# If dim is None, don't add a new dimension (matching xarray behavior)
if dim is None:
return [x]
@ricardoV94 (Member)

We don't need to support this at the Op level; just make it return self when x.expand_dims(None) is called, if we even want to support that.
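
Something like this at the method level (a sketch; the surrounding XTensorVariable class is assumed):

def expand_dims(self, dim=None):
    # Handle None before any Op is built: per the xarray-matching
    # behavior this PR targets, expand_dims(None) is a no-op
    if dim is None:
        return self
    return ExpandDims(dims=(dim,) if isinstance(dim, str) else tuple(dim))(self)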

return [x]

# Create new dimensions list with the new dimension at the beginning
new_dims = [dim, *list(x.type.dims)]
@ricardoV94 (Member)

We should support multiple expand_dims, not only one?


x = node.inputs[0]
dim = node.op.dim
size = getattr(node.op, "size", 1)
@ricardoV94 (Member)

size should be a symbolic input to the node (or multiple, if we have multiple dims), so you'll have x, *sizes = node.inputs. This way they can be arbitrary symbolic expressions and not just constants. Check how unstack does it.
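
A rough sketch of both ends of that contract (names and the output construction are illustrative; unstack is the in-tree reference):

from pytensor.graph.basic import Apply
from pytensor.tensor import as_tensor_variable

def make_node(self, x, *sizes):
    # One symbolic size per new dim; they are node inputs rather than
    # Op properties, so they can be arbitrary scalar expressions
    sizes = [as_tensor_variable(size, ndim=0) for size in sizes]
    out = ...  # build the expanded XTensorType output here
    return Apply(self, [x, *sizes], [out])

# and in the lowering rewrite:
x, *sizes = node.inputs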

Comment on lines +158 to +159
if not isinstance(node.op, Squeeze):
return False
@ricardoV94 (Member)

Not needed

if not isinstance(node.op, Squeeze):
return False

x = node.inputs[0]
@ricardoV94 (Member)

Nitpick: I like to do [x] = node.inputs to be explicit that this is a single-input node.

Comment on lines 162 to 165
dim = node.op.dim

# Convert single dimension to iterable for consistent handling
dims_to_remove = [dim] if isinstance(dim, str) else dim
@ricardoV94 (Member)

This sort of normalization should be done at the time the Op/node is defined. The earlier we normalize stuff, the easier it is to work downstream.
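
For instance, a sketch with the canonical form as a sorted tuple of strings (one possible choice):

class Squeeze(XOp):  # XOp: assumed base class name
    __props__ = ("dims",)

    def __init__(self, dims):
        # Normalize once, at construction; every rewrite downstream
        # can then rely on dims being a canonical tuple of strings
        if isinstance(dims, str):
            dims = (dims,)
        self.dims = tuple(sorted(dims))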

Comment on lines 178 to 182
else:
# Find all dimensions of size 1
dim_indices = [i for i, s in enumerate(x.type.shape) if s == 1]
if not dim_indices:
return False
@ricardoV94 (Member)

This shouldn't happen at rewrite time. Decide at the time you create the Op/node which dimensions will be dropped and stick to those. This is a case where PyTensor deviates from numpy/xarray, due to its non-eager nature and the ability to work with unknown shapes.

You can see this happening in pytensor like this:

import pytensor.tensor as pt
import numpy as np

x = pt.tensor("x", shape=(None, 2, 1, 2, None))
y = x.squeeze()
assert y.eval({x: np.zeros((1, 2, 1, 2, 1))}).shape == (1, 2, 2, 1)

Only the dimension we knew to be length 1 when x.squeeze() was called was dropped. We never try to update which dimensions we drop, because y is bound to its type y.type, which cannot change during rewrites (well, shape can go from None -> int, but ndim cannot change).

return False

# Create new dimensions list
new_dims = [d for i, d in enumerate(x.type.dims) if i not in dim_indices]
@ricardoV94 (Member)

Just reuse node.outputs[0].type.dims, since you already did the work of figuring out the output dims in make_node.
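
That is, roughly (a sketch):

def lower_squeeze(fgraph, node):
    [x] = node.inputs
    # make_node already computed the output dims; reuse them instead
    # of re-deriving them from x.type.shape
    new_dims = node.outputs[0].type.dims
    ...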



def squeeze(x, dim=None):
"""Remove dimensions of size 1 from an XTensorVariable.
@ricardoV94 (Member)

Add a note that this deviates from numpy/xarray. Similar to what we have here:

2. Similarly, if `axis` is ``None``, only dimensions known to be broadcastable will be
removed, even if there are more dimensions that happen to be broadcastable when
the variable is evaluated.

The first point is actually not true anymore, so don't copy it.

@ricardoV94 (Member) commented Jun 2, 2025

I have a question about the case where a dimension specifier is symbolic -- is the implementation here correct?

I replied in the comment. It's not correct. Symbolic inputs have to show up in make_node, not __init__. You can try to create such a graph like this (adapt it to a test format):

import numpy as np

import pytensor
import pytensor.xtensor as px

x = px.xtensor("x", shape=(2,), dims=("a",))
b_size = px.xtensor("b_size", shape=(), dims=())
y = x.expand_dims(b=b_size)

y.eval({x: np.array([0, 1]), b_size: np.array(10)})

If you try this right now you may get an error or a silent bug, because b_size is not part of the symbolic graph of y as far as PyTensor can tell.

@AllenDowney (Author)

@ricardoV94 squeeze is ready for another look.

expand_dims is still not ready for review

@ricardoV94 (Member) left a comment

I did another pass on the Squeeze functionality.

I suggested removing the None case from the Op level and left some other minor comments about it.

I think the tests right now are a bit overkill/redundant/messy. I suggest grouping the following things into different functions:

  1. Tests with an explicit squeeze dim (single, multiple, order-independent)
  2. Tests with an implicit None dim (including the case that deviates from xarray at runtime, as documented)
  3. Tests for errors raised by the Op at creation or runtime

to be size 1 at runtime.
"""

__props__ = ("dim",)
@ricardoV94 (Member)

Nit: call the Op prop dims instead of dim (still use dim in the user-facing functions).

@AllenDowney (Author)

Done, but it means that dim and dims are all over the place now. Worth it?

@ricardoV94 (Member)

What do you mean they are all over the place now? Why is that?

@AllenDowney (Author)

@ricardoV94 squeeze is ready for another look, and expand_dims is ready, too

@twiecki changed the title from "Add squeeze" to "Add squeeze for labeled tensors" on Jun 4, 2025