Skip to content

ENH: Index should have diff() method #19708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Feb 15, 2018 · 10 comments · Fixed by #52551
Closed

ENH: Index should have diff() method #19708

jbrockmendel opened this issue Feb 15, 2018 · 10 comments · Fixed by #52551
Assignees
Labels
Enhancement good first issue Index Related to the Index class or subclasses

Comments

@jbrockmendel
Copy link
Member

No description provided.

@jbrockmendel
Copy link
Member Author

Also round

@gfyoung gfyoung added Enhancement Indexing Related to indexing on series/frames, not to indexes themselves labels Feb 15, 2018
@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Dec 20, 2021
@jorisvandenbossche jorisvandenbossche changed the title Numeric Indexes should have diff method ENH: Index should have diff method Dec 20, 2021
@jorisvandenbossche jorisvandenbossche changed the title ENH: Index should have diff method ENH: Index should have dif()` method Dec 20, 2021
@jorisvandenbossche jorisvandenbossche changed the title ENH: Index should have dif()` method ENH: Index should have diff() method Dec 20, 2021
@jorisvandenbossche
Copy link
Member

In addition to numeric indices, this would also be useful for datetimelike indices (and avoid a .to_series() call to just calculate the diff)

@vamsi231297
Copy link

Hi @jorisvandenbossche ; can you explain briefly what this issue is about?

@jppcr
Copy link

jppcr commented Mar 3, 2022

I'm thinking of contributing to this. It would be my first contribution to Pandas :)

Is the correct place to implement this, in pandas/core/base.py inside the IndexOpsMixin class? Should this use the diff function of pandas/core/algorithms.py?

Thanks in advance.

@jbrockmendel
Copy link
Member Author

Is the correct place to implement this, in pandas/core/base.py inside the IndexOpsMixin class?

Probably on Index.

Should this use the diff function of pandas/core/algorithms.py?

Either that or to something like return Index(self.to_series().diff(n))

@jppcr
Copy link

jppcr commented Mar 8, 2022

take

@jppcr
Copy link

jppcr commented Apr 9, 2022

I have a WIP version of this here: main...jppcr:issue-19708-indexdiff

I ended up using the simpler code that @jbrockmendel proposed, Index(self.to_series().diff(n)).

May I ask for guidance in two things?

  1. I added a single simple test, that calculates the difference for two integers. Does it make sense to test for more cases (such as for different Index types), since that would just duplicate the original Series.diff tests in a different place?

  2. For some of the Index types I think it does not make sense at all to implement the diff() function (MultiIndex, IntervalIndex), and for others (TimedeltaIndex, PeriodIndex), more work would be needed than just using Series.diff in the background. Would it be OK if I just added a check for the type of self, and raised an TypeError if a certain Index type did not support the diff function?

Below are the Index types I tested the function with:

  • RangeIndex, Int64Index, UInt64Index, Float64Index - works OK
  • CategoricalIndex - works OK if the categories are numerical, but gives out an error if the categories are of mixed types or strings
  • IntervalIndex - does not work
  • MultiIndex - does not work
  • DateTimeIndex - returns Timedeltas, which makes sense
  • TimedeltaIndex - does not work as expected. I think for this to work as expected you would need to be able to pass to the periods argument a value such as "1 day", or something like that
  • PeriodIndex - also does not work as expected

Thanks

@jppcr
Copy link

jppcr commented Apr 9, 2022

Me: Would it be OK if I just added a check for the type of self, and raised an TypeError if a certain Index type did not support the diff function?

Specifically, I meant checking if self is of type Range/Int64/Uint64/Float64 or DateTimeIndex. If it isn't, I would raise an TypeError exception saying that using diff with that Index type isn't supported.

@jbrockmendel
Copy link
Member Author

Me: Would it be OK if I just added a check for the type of self, and raised an TypeError if a certain Index type did not support the diff function?

Specifically, I meant checking if self is of type Range/Int64/Uint64/Float64 or DateTimeIndex. If it isn't, I would raise an TypeError exception saying that using diff with that Index type isn't supported.

Shouldn't be necessary, you should get an appropriate exception from the Series.diff method. Could special-case RangeIndex/MultiIndex to fast-path.

Does it make sense to test for more cases

Yes. You can use the index fixture.

TimedeltaIndex - does not work as expected. I think for this to work as expected you would need to be able to pass to the periods argument a value such as "1 day", or something like that

pd.timedelta_range("1 Day", periods=3).to_series().diff() looks fine to me.

@brunocoutinhoo
Copy link

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement good first issue Index Related to the Index class or subclasses
Projects
None yet
8 participants