SUMM: Consistent Casting and Length Checks in Ops #37086

Open
jbrockmendel opened this issue Oct 12, 2020 · 0 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

Member

jbrockmendel commented Oct 12, 2020

xref #27911 API: when to cast list-like to ndarray, check len match
xref #13637 API: Index/Series/DataFrame op 1-d list-like coercion

Related bugs:
#21517 Comparison of MultiIndex to tuple interprets tuple as collection

One way to ensure consistent behavior is to enshrine it in ops.unpack_zerodim_and_defer, which just about all of our ops now use. The following is an implementation that matches our current behavior (more specifically: passes current tests) when tacked on to unpack_zerodim_and_defer, with one exception, described below:

        if is_list_like(other) and len(other) != len(self):
            if isinstance(self, ABCDataFrame):
                # DataFrame method does this in align_method_FRAME
                pass
            elif isinstance(self, ABCSeries) and isinstance(other, ABCSeries):
                # We have _align_foo methods that will handle this
                pass
            elif self.dtype.kind == "O" and not hasattr(other, "dtype"):
                # With e.g. list or tuple and object dtype, it is ambiguous
                #  whether this is intended to be treated as a scalar.
                pass
            else:
                raise ValueError("Lengths must match")

The sticking point is what to do with a list-like that is not array-like, generally a list or tuple. In this implementation, we would treat it as array-like whenever self is not object-dtype, and keep it as-is otherwise.

The test that fails with this has a length-1 list:

from pandas import Series

s_0123 = Series(range(4), dtype="int64")
expected = Series([False] * 4)  # what the existing test expects `result` to equal

result = s_0123 & [False]

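At the moment, array_ops.logical_op casts a list (only list, not tuple) to ndarray[object]. The implementation above would make this raise ValueError instead, because the list has length 1, the Series has length 4, and the Series is int64 rather than object dtype.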
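
To make the branches easier to poke at outside of pandas internals, here is a minimal sketch of the same check as a standalone function. The names `check_lengths` and `raises` are hypothetical helpers invented for illustration, and the public `pd.Series` / `pd.DataFrame` classes stand in for `ABCSeries` / `ABCDataFrame`. It only exercises the length check; it does not call any pandas operator, so it says nothing about what a given pandas version actually returns for `s_0123 & [False]`.

```python
import pandas as pd
from pandas.api.types import is_list_like


def check_lengths(left, other):
    """Hypothetical standalone copy of the proposed length check.

    Returns None when the check passes or defers; raises ValueError otherwise.
    """
    if is_list_like(other) and len(other) != len(left):
        if isinstance(left, pd.DataFrame):
            # DataFrame ops handle alignment separately, so defer.
            return
        if isinstance(left, pd.Series) and isinstance(other, pd.Series):
            # Series-vs-Series ops align on the index, so defer.
            return
        if left.dtype.kind == "O" and not hasattr(other, "dtype"):
            # A list/tuple against object dtype may be intended as a scalar.
            return
        raise ValueError("Lengths must match")


def raises(left, other):
    """Report whether check_lengths raises for this pair."""
    try:
        check_lengths(left, other)
    except ValueError:
        return True
    return False


ints = pd.Series(range(4), dtype="int64")
objs = pd.Series(["a", "b", "c", "d"], dtype=object)

outcomes = {
    "int64 Series vs length-1 list": raises(ints, [False]),
    "int64 Series vs length-4 list": raises(ints, [False] * 4),
    "object Series vs length-1 tuple": raises(objs, (False,)),
    "Series vs shorter Series": raises(ints, pd.Series([1, 2])),
}
print(outcomes)
```

Only the first case raises in this sketch: it is the length-1 list against a non-object Series, which is the situation the failing test covers.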

Outstanding questions:

  1. Allow length-1 others to broadcast like numpy?
  2. Allow length-1 self to broadcast like numpy?
  3. DataFrame always casts listlike to arraylike; should that be changed?
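
For reference on questions 1 and 2, this is what "broadcast like numpy" means for plain 1-D ndarrays. It is a small illustration of numpy's own behavior, not a proposal for what pandas should do; the variable names are arbitrary.

```python
import numpy as np

four = np.arange(4)    # array([0, 1, 2, 3])
one = np.array([10])   # length-1 array

# Question 1: a length-1 `other` is stretched to match the left operand.
right_broadcast = four + one

# Question 2: a length-1 left operand is stretched to match `other`.
left_broadcast = one + four

# Any other length mismatch is an error in numpy.
try:
    four + np.arange(3)
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(right_broadcast, left_broadcast, mismatch_raises)
```

Both broadcast results have length 4, so allowing either case in pandas would mean the result length is not always `len(self)`.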

jbrockmendel added the Bug and Needs Triage labels Oct 12, 2020
jbrockmendel added the API - Consistency and Numeric Operations labels and removed the Bug and Needs Triage labels Jun 6, 2021