WIP: EA SetArray #22382

h-vetinari · 2018-08-16T06:31:29Z

This is very much WIP, but IMO it might help in figuring out the EA interfaces, since it has different requirements from the EAs so far. It makes some progress towards #4480 and might be a basis for #21547, where the feedback was effectively "make this an EA", e.g. from @jreback:

EA is an extension array & a dtype, both of which are first class. That's how I'd like to see sets (and lists and dicts)
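To make "an extension array & a dtype" concrete for sets, here is a minimal sketch of such a pair. It assumes a recent pandas and its public `pandas.api.extensions` module. `SetDtype` and `SetArray` are hypothetical names, not the code in this PR. Only the handful of methods needed for construction, indexing, `isna` and `take` are implemented.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype, take


class SetDtype(ExtensionDtype):
    """Hypothetical dtype whose scalars are Python sets."""

    name = "set"
    type = set
    na_value = np.nan

    @classmethod
    def construct_array_type(cls):
        return SetArray


class SetArray(ExtensionArray):
    """Hypothetical 1-D array of sets backed by a NumPy object array."""

    def __init__(self, values):
        # Assign element by element so NumPy stores each value as one object.
        self._data = np.empty(len(values), dtype=object)
        for i, v in enumerate(values):
            self._data[i] = v

    @classmethod
    def _from_sequence(cls, scalars, dtype=None, copy=False):
        return cls(list(scalars))

    @property
    def dtype(self):
        return SetDtype()

    def __len__(self):
        return len(self._data)

    def __getitem__(self, item):
        # Integer -> scalar; slices, masks and index arrays -> new SetArray.
        if isinstance(item, (int, np.integer)):
            return self._data[item]
        return type(self)(self._data[item])

    @property
    def nbytes(self):
        return self._data.nbytes

    def isna(self):
        # Anything that is not a set (None, NaN, ...) counts as missing.
        return np.array(
            [not isinstance(x, (set, frozenset)) for x in self._data], dtype=bool
        )

    def copy(self):
        return type(self)([x.copy() if isinstance(x, set) else x for x in self._data])

    def take(self, indices, allow_fill=False, fill_value=None):
        # With allow_fill=True, -1 in indices means "insert the NA value".
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        result = take(self._data, indices, allow_fill=allow_fill, fill_value=fill_value)
        return type(self)(result)

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls([x for arr in to_concat for x in arr._data])
```

With this in place, `pd.Series(SetArray._from_sequence([{1}, {1, 2}, None]))` has dtype `set`. Its missing value is plain NaN, which is where the operator problems described below come from.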

The code works so far, and tests pass, but there are a large number of issues (and probably some conflicts with other current EA PRs). As a disclaimer, in each case I may well be misunderstanding something, but I've tried my best, and other EA authors would probably run into the same issues.

several methods, including fillna and astype, do not dispatch to the EA implementation
the same goes for df.fillna not dispatching to Series
set is a case where the NaN value is incompatible with the dtype in normal operations (i.e. {1} | np.nan raises). This means that tests like TestMethods.test_combine_add and TestMethods.test_combine_le will always fail on missing data
Bool ops had no dispatching mechanism. I added one, but it can probably be improved. These are also not tested, as far as I can tell.
Bool ops in particular will need a contract for which side is cast to what, e.g. whether op(EA, np.ndarray) is the same as op(np.ndarray, EA), or whether the latter downcasts the EA.
data[-1] must be non-NA for BaseGetitemTests.test_take
I found tests/extension/conftest.py too late and will need to adapt my fixtures
I'm sure my changes to pandas/tests/extension/base/ops.py will cause friction, but those are just WIP as well. In any case, tests like BaseComparisonOpsTests.test_compare_array can never succeed when comparing 0 to a non-numeric dtype (and I wrapped it in a Series since I was getting errors with the dispatch as a list object)
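The NaN incompatibility mentioned in the list above can be reproduced with plain Python. Here `float("nan")` stands in for `np.nan`, which is also a Python float. The helper `safe_union` is hypothetical and shows one possible convention: propagate the missing value instead of raising. It is not what pandas does.

```python
import math


def is_missing(x):
    # Treat None and float NaN as missing values.
    return x is None or (isinstance(x, float) and math.isnan(x))


def safe_union(a, b):
    """Hypothetical elementwise union that propagates NaN instead of raising."""
    if is_missing(a) or is_missing(b):
        return float("nan")
    return a | b


try:
    {1} | float("nan")  # set.__or__ does not accept a float
except TypeError as exc:
    print("raises:", exc)
```

Any test that applies the raw operator elementwise to data containing the NA value will therefore hit this `TypeError` unless the array handles missing values itself.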

If/when this is ever merged, I'd like to add more options to .astype('set') (example in #4480), and add a set accessor for other methods (again #4480).
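A set accessor along these lines could be built on pandas' `register_series_accessor`. The following is a minimal sketch, assuming an object-dtype Series of sets with `None`/NaN as missing values. The accessor name `set` and the methods `union` and `len` are hypothetical illustrations, not the API proposed in #4480.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("set")
class SetAccessor:
    """Hypothetical `.set` accessor for a Series whose elements are sets."""

    def __init__(self, series):
        self._obj = series

    @staticmethod
    def _is_set(x):
        return isinstance(x, (set, frozenset))

    def union(self, other):
        # Elementwise union with a fixed set; missing values pass through.
        return self._obj.map(lambda s: s | other if self._is_set(s) else s)

    def len(self):
        # Elementwise size; missing values become NaN.
        return self._obj.map(lambda s: len(s) if self._is_set(s) else float("nan"))
```

For example, `pd.Series([{1}, {2, 3}, None]).set.union({9})` gives `{1, 9}`, `{2, 3, 9}` and a missing value.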

pep8speaks · 2018-08-16T06:31:34Z

Hello @h-vetinari! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 16, 2018 at 06:32 Hours UTC

jorisvandenbossche · 2018-08-16T08:05:08Z

@h-vetinari Thanks a lot for working on this!

To repeat my comment on the other PR, I personally don't think this should necessarily be included in pandas core, but it would be perfect as an external package (now that we have the EA interface).
That said, actually doing this (here or as external package) is indeed very good to iron out the interface, as arrays that contain sequences as scalars will certainly have some additional needs/complexities.
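One concrete example of those complexities shows up when building the backing NumPy array. NumPy unpacks nested sequences, so a list of lists becomes a 2-D array rather than a 1-D array of list scalars. A common workaround is to allocate an object array first and assign elements one at a time. The helper name `to_object_array` is hypothetical.

```python
import numpy as np


def to_object_array(values):
    """Hypothetical helper: 1-D object array holding each value unchanged."""
    out = np.empty(len(values), dtype=object)
    for i, v in enumerate(values):
        out[i] = v  # scalar assignment stores the list itself
    return out


naive = np.array([[1, 2], [3, 4]], dtype=object)
print(naive.shape)  # (2, 2): the inner lists were unpacked
print(to_object_array([[1, 2], [3, 4]]).shape)  # (2,)
```

Sets happen to be safe here, because NumPy does not treat them as sequences. List- or dict-like scalars (as in the JSONArray test implementation) need this kind of care.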

We have a dummy JSONArray implementation in the test suite that might run into similar problems, but so far it has not been given much attention.

several methods, including fillna and astype do not dispatch to the EA impl

The astype problem is a known issue (#22343, as you are aware); fillna should normally dispatch. Do you have a reproducible example showing that it does not? If there are any others that don't dispatch, you can open separate issues/PRs for them.

Bool ops in particular will have to figure out a contract which side is cast to what, e.g. if op(EA, np.ndarray) is the same as op(np.ndarray, EA), or if the latter downcasts EA.

Can you give a more concrete example to show what you mean?
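One concrete way the operand-order question shows up is through NumPy. When an ndarray is the left operand, NumPy tries to handle the operation itself unless the right operand opts out. The sketch below is a minimal illustration, assuming a hypothetical list-backed `SetArrayLike` class rather than a pandas EA. It opts out with `__array_ufunc__ = None`, which makes `ndarray | obj` return NotImplemented so Python calls `obj.__ror__`. The result type is then the same whichever side the array is on.

```python
import numpy as np


class SetArrayLike:
    """Hypothetical list-backed container of sets (not the pandas EA API)."""

    # Opting out of ufuncs makes ndarray binary operators return
    # NotImplemented, so Python falls back to our reflected method.
    __array_ufunc__ = None

    def __init__(self, values):
        self.values = list(values)

    def __or__(self, other):
        # self | other: elementwise union, our element on the left
        return SetArrayLike(a | b for a, b in zip(self.values, other))

    def __ror__(self, other):
        # other | self: reached when the left operand returned NotImplemented
        return SetArrayLike(b | a for a, b in zip(self.values, other))


arr = np.array([{1}, {2}], dtype=object)
ea = SetArrayLike([{3}, {4}])
print(type(ea | arr).__name__, type(arr | ea).__name__)  # SetArrayLike SetArrayLike
```

Without the opt-out, the ndarray on the left gets the first chance to handle the operation, so the result type can depend on operand order. That is the contract question raised above.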

there's not a lot of documentation on what the EA test suite expects. E.g. data must be length 100, data_missing must be [np.nan, instance of dtype] (but data can have missing values too?), data[-1] must be non-NA, etc.

The things you mention above (except the last one) are mentioned in the docstrings in pandas/tests/extension/conftest.py? But please feel free to add better documentation for anything that is unclear!
I think we assume that data has no missing values (and maybe also that all are unique, not sure).
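The fixture expectations named in this exchange can be written down as a quick self-check. The following is a minimal sketch with a hypothetical `check_fixture_expectations` helper. It assumes plain Python lists of sets, with `None`/NaN as the missing value. It encodes only the three constraints stated here (length 100, non-NA `data[-1]`, `data_missing == [NA, valid]`), not the full conftest.py contract. Whether `data` may contain missing values is left out, since that point is unsettled in this thread.

```python
import math


def is_na(x):
    # Treat None and float NaN as missing values.
    return x is None or (isinstance(x, float) and math.isnan(x))


def check_fixture_expectations(data, data_missing):
    """Hypothetical helper: return a list of violated fixture expectations."""
    problems = []
    if len(data) != 100:
        problems.append("data must have length 100")
    if len(data) and is_na(data[-1]):
        problems.append("data[-1] must be non-NA (BaseGetitemTests.test_take)")
    if len(data_missing) != 2 or not is_na(data_missing[0]) or is_na(data_missing[1]):
        problems.append("data_missing must be [NA, valid value]")
    return problems


print(check_fixture_expectations([{i} for i in range(100)], [None, {1}]))  # []
```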

jorisvandenbossche · 2018-08-16T08:09:30Z

pandas/core/ops.py

@@ -1482,6 +1484,45 @@ def _bool_method_SERIES(cls, op, special):
    code duplication.
    """

+    def dispatch_to_extension_op(op, left, right):


Is this a duplicate of the existing dispatch function?

h-vetinari · 2018-08-16T09:04:34Z

The things you mention above (except the last one) are mentioned in the docstrings in pandas/tests/extension/conftest.py? But please feel free to add better documentation for anything that is unclear!
I think we assume that data has no missing values (and maybe also that all are unique, not sure).

I had edited the OP a few minutes before you responded, because I had finally found pandas/tests/extension/conftest.py. Not all requirements are mentioned there, though, and test_integer.py does have missing values in data, for example.

I'll respond to the rest later, no time right now.

jreback · 2018-11-23T03:21:47Z

Closing as stale. Would be more amenable as an EA.

h-vetinari added 5 commits August 13, 2018 21:15

initial commit

d1c4c24

first working commit

d0abd36

First pass at tests

74e5ffa

Tests

e2f85b4

Fixes

a957f3e

flake8

31688c4

h-vetinari changed the title ~~WIP: EA Setarray~~ WIP: EA SetArray Aug 16, 2018

jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Aug 16, 2018

jorisvandenbossche reviewed Aug 16, 2018


h-vetinari mentioned this pull request Nov 20, 2018

ExtensionDtype should be hashable #22476

Closed

jreback closed this Nov 23, 2018
