Skip to content

[BUG] Series.diff was always setting the dtype to object #30894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Jan 10, 2020

@MarcoGorelli MarcoGorelli changed the title 🐛 Series.diff was always setting the dtype to object [BUG] Series.diff was always setting the dtype to object Jan 10, 2020
@MarcoGorelli
Copy link
Member Author

@jreback @jbrockmendel @WillAyd thanks for your quick reviews, I've updated the PR accordingly

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the correct fix to to dispatch diff to the EA, rather than trying to fix the type after the fact? Or can we perhaps write a version of diff in terms of take and __sub__?

@jreback
Copy link
Contributor

jreback commented Jan 13, 2020

Isn't the correct fix to to dispatch diff to the EA, rather than trying to fix the type after the fact? Or can we perhaps write a version of diff in terms of take and __sub__?

this is actually a good idea

would be simple to write as a generic EA
method

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Jan 13, 2020

@TomAugspurger @jreback I don't understand.

Currently, we have

        result = algorithms.diff(com.values_from_object(self), periods)

In the case of

pd.Series([1, 2, 3], dtype="Int16").diff()

then

com.values_from_object(self)

is already converting the values to Object before even reaching algorithms.diff:

>>> com.values_from_object(self)
array([1, 2, 3], dtype=object)

So should com.values_from_object be changed?

Anyway, I realise that there's a somewhat tight deadline to get v1.0.0 out, and I'm happy to keep tackling this issue, but if you think it is (currently) beyond my capabilities, I'm happy for someone else to take over :)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 13, 2020

So should com.values_from_object be changed?

We would need to not call that, at least for EAs, but probably for all dtypes, before passing it off. I can probably take a look tomorrow.

We have a week or two for 1.0.0 final.

@MarcoGorelli MarcoGorelli force-pushed the nullable-diff branch 2 times, most recently from bdc1883 to 6ad19ca Compare January 13, 2020 15:55
@MarcoGorelli MarcoGorelli changed the title [BUG] Series.diff was always setting the dtype to object (wip) [BUG] Series.diff was always setting the dtype to object Jan 13, 2020
@MarcoGorelli MarcoGorelli force-pushed the nullable-diff branch 2 times, most recently from c4b5cb7 to ac7bb30 Compare January 13, 2020 16:06
@MarcoGorelli
Copy link
Member Author

@TomAugspurger OK thanks - I might have figured out what you were suggesting anyway, have pushed another commit

@MarcoGorelli MarcoGorelli changed the title (wip) [BUG] Series.diff was always setting the dtype to object [BUG] Series.diff was always setting the dtype to object Jan 13, 2020
@TomAugspurger
Copy link
Contributor

See #31025 for an alternative which allows EAs to define how the operation should be done.

@TomAugspurger
Copy link
Contributor

I think we'll proceed with #31025. Apologies for the duplicated effort @MarcoGorelli.

@TomAugspurger TomAugspurger added this to the No action milestone Jan 19, 2020
@MarcoGorelli MarcoGorelli deleted the nullable-diff branch January 19, 2020 17:11
@MarcoGorelli
Copy link
Member Author

It's OK, it was a good learning experience to attempt this and then read through your PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series lose nullable integer dtype after call to .diff()
6 participants