Skip to content

EHN: fix unsigned int type problem of the result diff() #31916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

EHN: fix unsigned int type problem of the result diff() #31916

wants to merge 1 commit into from

Conversation

zzapzzap
Copy link

@zzapzzap zzapzzap commented Feb 12, 2020

An unexpected output type(float64) occurs to diff() when using uint
type. So fix it by restoring original dtype after storing it

Fixes #28909

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

What do you think about this problem with the code? @jreback

An unexpected output type(float64) occurs to diff() when using uint
type. So fix it by restoring original dtype after storing it

Fixes #28909
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zzapzzap , and welcome

fix it by restoring original dtype after storing it

We wouldn't necessarily want to do that, the diff operation may have to change the dtype. The original issue seems to be more about documenting the change in the return type and perhaps throwing a warning, than about making any code changes.

As an aside, code changes typically require tests, see contributing to the code base:

pandas is serious about testing and strongly encourages contributors to embrace test-driven development (TDD). This development process “relies on the repetition of a very short development cycle: first the developer writes an (initially failing) automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test.” So, before actually writing any code, you should write your tests. Often the test can be taken from the original GitHub issue. However, it is always worth considering additional use cases and writing corresponding tests.

Adding tests is one of the most common requests after code is pushed to pandas. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue.

@TomAugspurger
Copy link
Contributor

For numeric types, Series.diff() introduces NaNs, which are a float. These can't be cast back to int / uint.

I'd recommend using one of the new nullable integer dtypes.

@zzapzzap
Copy link
Author

Thanks for your help.

@zzapzzap zzapzzap closed this Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: diff() when using unsigned ints
3 participants