
DOC: asfreq depends on whether the index is ascending or descending #54555


Closed
dilwong opened this issue Aug 15, 2023 · 2 comments · Fixed by #59589
Labels
Datetime (Datetime data dtype) · Docs · Index (Related to the Index class or subclasses)

Comments

@dilwong

dilwong commented Aug 15, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.Series.asfreq.html

Documentation problem

It seems that method='ffill' and method='bfill' for pandas.Series.asfreq depend on the order of the original Series (and the same is true for DataFrame):

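import pandas as pd

# 'min' and '30s' are the current spellings of the deprecated 'T' and '30S' aliases
index = pd.date_range('1/1/2000', periods=4, freq='min')
series = pd.Series([0.0, 1.0, 2.0, 3.0], index=index)

# Ascending index: ffill carries earlier values forward in time
print(series.sort_index(ascending=True).asfreq(freq='30s', method='ffill'))
print('\n')
# Descending index: the same call produces different fill values
print(series.sort_index(ascending=False).asfreq(freq='30s', method='ffill'))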

Is this the intended behavior? If so, it should probably be documented because one might assume method=ffill respects time ordering in the sense that past values fill in for future values.
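A minimal workaround sketch, assuming the goal is time-ordered filling (earlier observations fill later timestamps): sort the index ascending before calling asfreq. The variable names are illustrative only.

```python
import pandas as pd

# Same four one-minute observations as in the report, stored newest-first
index = pd.date_range("2000-01-01", periods=4, freq="min")
descending = pd.Series([0.0, 1.0, 2.0, 3.0], index=index).sort_index(ascending=False)

# Sorting first makes 'ffill' propagate earlier values to later timestamps
time_ordered = descending.sort_index().asfreq("30s", method="ffill")
print(time_ordered.tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0]

# Without sorting, the fill follows the (descending) index order instead
print(descending.asfreq("30s", method="ffill").tolist())
```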

Moreover, the documentation here says:

"Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and last entries in the original index."

That seems to be wrong, because this piece of code and the example above indicate that asfreq doesn't use the first and last entries of the original index (in the sense of series.index[0] and series.index[-1]) but rather the min and max dates of the index.
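A short check of the min/max reading, sketched with the same data: for a descending index, the new index matches pd.date_range(index.min(), index.max(), freq), whereas building it from index[0] and index[-1] would run backwards in time and generate no timestamps at a positive frequency.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=4, freq="min")
descending = pd.Series([0.0, 1.0, 2.0, 3.0], index=index).sort_index(ascending=False)

new_index = descending.asfreq("30s").index

# Reading 1: start/end are the min and max of the original index
from_min_max = pd.date_range(descending.index.min(), descending.index.max(), freq="30s")
# Reading 2: start/end are the first and last entries (here 00:03 and 00:00)
from_first_last = pd.date_range(descending.index[0], descending.index[-1], freq="30s")

print(new_index.equals(from_min_max))  # True
# start is after end, so this range contains no timestamps
print(len(from_first_last))
```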

At the very least, however, asfreq with method='ffill' or method='bfill' fails if the index is neither monotonically increasing nor decreasing.
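A sketch of that failure mode, using made-up shuffled timestamps (not from the report): with a fill method, asfreq reindexes onto the new range and raises a ValueError when the index is neither increasing nor decreasing; without a method, the same call succeeds and leaves the new timestamps as NaN.

```python
import pandas as pd

# Illustrative non-monotonic index
index = pd.DatetimeIndex(
    ["2000-01-01 00:02", "2000-01-01 00:00", "2000-01-01 00:03", "2000-01-01 00:01"]
)
shuffled = pd.Series([2.0, 0.0, 3.0, 1.0], index=index)

raised = False
try:
    shuffled.asfreq("30s", method="ffill")
except ValueError as exc:
    raised = True
    print("asfreq raised:", exc)

# No fill method: exact matches are kept, new timestamps become NaN
plain = shuffled.asfreq("30s")
print(plain)
```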

Suggested fix for documentation

Specify what happens if the index goes in reverse time order.

@dilwong dilwong added the Docs and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Aug 15, 2023
@mroeschke mroeschke added the Datetime (Datetime data dtype) and Index (Related to the Index class or subclasses) labels and removed the Needs Triage label Jul 17, 2024
@Pranav-Wadhwa
Contributor

Take

@Pranav-Wadhwa
Contributor

If so, it should probably be documented because one might assume method=ffill respects time ordering in the sense that past values fill in for future values.

I propose that we update the documentation of the method parameter to say: 'pad' / 'ffill': propagate last valid observation forward to next valid *based on the order of the index*. Because the doc already says "last valid observation", it is implied that filling follows the ordering of the index rather than time ordering, but stating this explicitly would add clarity.
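One way to see the index-order semantics described in this proposal, sketched with the report's data: on the descending Series, 'ffill' takes each missing value from the preceding row in index order, which is the later timestamp, so it should produce the same values as 'bfill' on the ascending Series.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=4, freq="min")
ascending = pd.Series([0.0, 1.0, 2.0, 3.0], index=index)
descending = ascending.sort_index(ascending=False)

# 'ffill' propagates along the index order, not along time
ffill_desc = descending.asfreq("30s", method="ffill")
bfill_asc = ascending.asfreq("30s", method="bfill")

print(ffill_desc.tolist())
print(ffill_desc.equals(bfill_asc))
```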


Moreover, the documentation here Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and last entries in the original index is wrong then because this piece of code and the example above indicates that asfreq doesn't respect the start and end of the original index (in the sense of series.index[0] and series.index[-1]), but rather uses the min and max dates of the index.

For both of the examples you provided (a series with an ascending index and one with a descending index), the resulting indices are the same:

>>> series.sort_index(ascending=False).asfreq(freq='30S', method='ffill').index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:00:30',
               '2000-01-01 00:01:00', '2000-01-01 00:01:30',
               '2000-01-01 00:02:00', '2000-01-01 00:02:30',
               '2000-01-01 00:03:00'],
              dtype='datetime64[ns]', freq='30S')
>>> series.sort_index(ascending=True).asfreq(freq='30S', method='ffill').index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:00:30',
               '2000-01-01 00:01:00', '2000-01-01 00:01:30',
               '2000-01-01 00:02:00', '2000-01-01 00:02:30',
               '2000-01-01 00:03:00'],
              dtype='datetime64[ns]', freq='30S')

The documentation would be better written as: "where start and end are, respectively, the min and max entries in the original index."
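To make the proposed wording concrete, here is a hypothetical helper (asfreq_via_reindex is not a pandas function, and this is not pandas' actual source) that follows the sentence literally: build pd.date_range from the min and max of the original index, then reindex with the requested fill method. Under that assumption it agrees with Series.asfreq for both orderings of the report's data.

```python
import pandas as pd

def asfreq_via_reindex(obj, freq, method=None):
    """Hypothetical helper: reindex onto date_range(index.min(), index.max(), freq)."""
    new_index = pd.date_range(obj.index.min(), obj.index.max(), freq=freq)
    return obj.reindex(new_index, method=method)

index = pd.date_range("2000-01-01", periods=4, freq="min")
series = pd.Series([0.0, 1.0, 2.0, 3.0], index=index)

matches = []
for ascending in (True, False):
    ordered = series.sort_index(ascending=ascending)
    expected = ordered.asfreq("30s", method="ffill")
    sketch = asfreq_via_reindex(ordered, "30s", method="ffill")
    matches.append(sketch.equals(expected))
    print(ascending, matches[-1])
```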


3 participants