API: when to cast list-like to ndarray, check len match #27911

Open
jbrockmendel opened this issue Aug 14, 2019 · 3 comments
Labels
Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@jbrockmendel
Member

xref #27873

In many arithmetic/comparison ops we do something like

if is_list_like(other) and not hasattr(other, "dtype"):
    other = np.asarray(other)

if is_list_like(other) and len(other) != len(self):
    raise ValueError("Lengths must match")

but we are not entirely consistent about this (see below for summary of what we do where). AFAICT the only case where we wouldn't want to do both of these consistently is if we have an object-dtype, the relevant test case being something like:

import numpy as np
import pandas as pd
ser = pd.Series([["A"], ("A", "B"), frozenset("C")], dtype=np.object_)
ser == ("A", "B")

(note in this example ATM ser == ser[0] will raise ValueError).
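
To make the conflict concrete, here is a small sketch that uses only NumPy and plain Python equality, not the actual pandas code path. It compares what "treat the tuple as a scalar" would give against what happens once the tuple is cast with np.asarray. The variable names are illustrative.

```python
import numpy as np

# Object-dtype values mirroring the Series in the example above.
values = np.empty(3, dtype=object)
values[0] = ["A"]
values[1] = ("A", "B")
values[2] = frozenset("C")

scalar = ("A", "B")

# Interpretation 1: the tuple is a scalar, compared against each element.
matches = [item == scalar for item in values]
print(matches)  # [False, True, False]

# Interpretation 2: the tuple is cast to an ndarray first. It becomes a
# length-2 array, so a length check against the 3 values would fail.
as_array = np.asarray(scalar)
print(as_array.shape, len(as_array) == len(values))  # (2,) False
```

So applying both the cast and the length check unconditionally would turn a meaningful elementwise comparison into a "Lengths must match" error for object dtype.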

We should be consistent in what we are doing for these, but need to decide what that consistent behavior should be.
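
For reference, a minimal sketch of what applying both steps uniformly could look like. `prepare_other` is a hypothetical helper name, not an existing pandas function, and it deliberately ignores the object-dtype question above.

```python
import numpy as np
from pandas.api.types import is_list_like


def prepare_other(left, other):
    """Hypothetical helper: cast dtype-less list-likes, then check lengths."""
    # Step 1: list-likes without a dtype (list, tuple, ...) become ndarrays.
    if is_list_like(other) and not hasattr(other, "dtype"):
        other = np.asarray(other)

    # Step 2: any list-like operand must match the left operand's length.
    if is_list_like(other) and len(other) != len(left):
        raise ValueError("Lengths must match")

    return other


left = np.arange(3)
print(type(prepare_other(left, [1, 2, 3])).__name__)  # ndarray
print(prepare_other(left, 5))  # 5 (scalars pass through untouched)
```

With this helper, `prepare_other(left, [1, 2])` raises ValueError up-front instead of leaving the mismatch to be discovered later in the op.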

Summary of current behavior:

  • Series comparison
    • wraps list in ndarray (but not other list-likes)
    • checks length matching for list, ndarray, Index, Series (not tuple, set, ...)
  • Series arithmetic
    • doesn't wrap listlike at all
    • length checks are left implicit
  • Categorical comparisons do the "full" checks
    • What if we have list-like categories?
  • DatetimeArray/TimedeltaArray/PeriodArray arithmetic - no wrapping, no explicit checks
  • DatetimeArray comparisons
    • wrap only list
    • Checks length match for all listlikes, but not at the beginning
  • TimedeltaArray comparisons
    • wrap any listlike (though indirectly)
    • Checks length match for all listlikes, but not at the beginning
  • PeriodArray comparisons
    • don't wrap any list-likes
    • check all list-likes up-front

Not yet reviewed: IntegerArray, PandasArray, SparseArray, Index, FooIndex

@jorisvandenbossche
Member

I think the main use case where this comes up is having scalar values that are considered "list like" (according to our is_list_like method).

The obvious case is having lists or tuples in a Series:

In [16]: pd.Series([(1, 2), (3, 4), (4, 5, 6)]) == (3, 4)
Out[16]: 
0    False
1     True
2    False
dtype: bool

But note that it is more than those built-in containers. When working with custom Python objects, your scalar object needs to be iterable to be seen as list-like (so either have __iter__, or have __len__ and __getitem__, or ...).

One example of an "iterable scalar" that I work with myself is Shapely geometry objects. One can always argue about the design, but e.g. their "MultiPolygon" object (a single geometrical object that is built up from multiple polygons) is iterable.

At the moment, though, this is only relevant for object dtype; I think none of the other dtypes share the above concern. For those other dtypes, a consistent flow of checks and conversions seems like a good idea.
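
A small sketch of how such an object looks to is_list_like. `MultiPart` is a hypothetical stand-in class written for this example, not a Shapely type. It only covers the __iter__ case.

```python
from pandas.api.types import is_list_like


class MultiPart:
    """Hypothetical "iterable scalar": one object built from several parts."""

    def __init__(self, parts):
        self.parts = tuple(parts)

    def __iter__(self):
        return iter(self.parts)


obj = MultiPart(["part-1", "part-2"])

print(is_list_like(obj))     # True: defining __iter__ is enough
print(is_list_like((3, 4)))  # True
print(is_list_like("abc"))   # False: strings are excluded
```

So any op that branches on is_list_like will treat `obj` as an array-like operand rather than as a single value to compare against each element.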

@jbrockmendel
Member Author

@jorisvandenbossche thanks for weighing in here. Will is_list_like correctly handle the non-builtin containers you have in mind?

For the object-dtype case, should we continue special-casing list or not-check it like we do for other listlikes?

@jorisvandenbossche
Member

> Will is_list_like correctly handle the non-builtin containers you have in mind?

Do you mean the scalar "container-like" objects? (I'm also not sure what you expect "correct" to be.)
But: is_list_like will see those objects as list-like (it returns True). This is correct given the definition of list-like we use, but it is not desired for object arrays of such container-like objects.

> For the object-dtype case, should we continue special-casing list or not-check it like we do for other listlikes?

It probably makes sense to make the handling of list and other non-array list-likes uniform.
