DOC warn user about potential information loss in Resampler.interpolate #52198


Merged
merged 10 commits into from
Apr 7, 2023
kopytjuk
Contributor

In scientific and technical domains, people often work with high-frequency or non-equidistant time series. Using resample("1s").interpolate() on such data can have unwanted side effects, which we should warn about in the documentation:

import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt

timesteps = [
    dt.datetime(2023, 3, 1, 7, 0, 0),
    dt.datetime(2023, 3, 1, 7, 0, 1),
    dt.datetime(2023, 3, 1, 7, 0, 2),
    dt.datetime(2023, 3, 1, 7, 0, 3),
    dt.datetime(2023, 3, 1, 7, 0, 4)
]
series = pd.Series(data=[1, -1, 2, 1, 3], index=timesteps)

resample_freq = "400ms"
series_resampler = series.resample(resample_freq)
series_resampled_linear = series_resampler.interpolate("linear")

which leads to the following:

[Image: result]

In this example, information loss is to be expected: the samples at 07:00:01 and 07:00:03 do not fall on the 400 ms grid.
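To make the loss visible without a plot, here is a minimal sketch that looks only at the upsampling step. It assumes that `Resampler.asfreq()` places the series on the 400 ms grid without filling anything; the names `on_grid`, `dropped`, and `kept` are illustrative and not part of the PR.

```python
import datetime as dt
import pandas as pd

# Same data as the example above: one sample per second.
timesteps = [dt.datetime(2023, 3, 1, 7, 0, s) for s in range(5)]
series = pd.Series([1.0, -1.0, 2.0, 1.0, 3.0], index=timesteps)

# Put the series on the 400 ms grid without filling the gaps.
on_grid = series.resample("400ms").asfreq()

# Original samples whose timestamps are not on the grid vanish at this point.
dropped = series[~series.index.isin(on_grid.index)]
# Original samples that survive and can serve as interpolation anchors.
kept = on_grid.dropna()

print("dropped:", dropped.tolist())
print("kept:   ", kept.tolist())
```

Only the samples at 07:00:00, 07:00:02, and 07:00:04 lie on the grid, so the values -1 and 1 (at 07:00:01 and 07:00:03) are no longer present once the series has been moved onto it.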

IMHO we should warn users about this behavior. It seems to be known; to quote @jreback from here:

so the reason this happens is because the index is first reindexed to the new time buckets (upsampled) via reindexing, then interpolation happens.
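The order of operations described in the quote can be reproduced by hand. The sketch below spells out "reindex, then interpolate" explicitly rather than calling `Resampler.interpolate`, since the behavior of that method may differ between pandas versions. The second variant (interpolating on the union of old and new timestamps before selecting the grid) is an assumed alternative for comparison, not something proposed in this PR.

```python
import datetime as dt
import pandas as pd

timesteps = [dt.datetime(2023, 3, 1, 7, 0, s) for s in range(5)]
series = pd.Series([1.0, -1.0, 2.0, 1.0, 3.0], index=timesteps)

# The 400 ms target grid, taken from the resampler itself.
grid = series.resample("400ms").asfreq().index

# Reindex first, interpolate second: only samples on the grid act as anchors.
reindex_then_interp = series.reindex(grid).interpolate(method="linear")

# Assumed alternative: interpolate on the union of original and grid
# timestamps so every original sample is an anchor, then keep the grid only.
union = series.index.union(grid)
interp_then_select = (
    series.reindex(union).interpolate(method="time").reindex(grid)
)

print(pd.DataFrame({
    "reindex_then_interp": reindex_then_interp,
    "interp_then_select": interp_then_select,
}))
```

With this data the first variant never drops below 1.0, because the dip to -1 at 07:00:01 is discarded before interpolation, while the second variant reaches -0.6 at 07:00:00.800. `method="time"` is used in the second variant because the union index is not evenly spaced.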

@mroeschke mroeschke added Docs Resample resample method labels Mar 27, 2023
@mroeschke
Member

Thanks, looks like some of our docstring validations are failing

@@ -825,7 +825,6 @@ def fillna(self, method, limit: int | None = None):
"""
return self._upsample(method, limit=limit)

@doc(NDFrame.interpolate, **_shared_docs_kwargs)
Contributor Author

@kopytjuk kopytjuk Apr 2, 2023

@mroeschke here I removed the docstring concatenation (it is misleading, since the user probably has no NaNs in their data originally) and added links to the documentation, so users can read up on the various methods

@kopytjuk
Contributor Author

kopytjuk commented Apr 4, 2023

@mroeschke let me know if anything is missing or should be extended

@kopytjuk kopytjuk requested a review from mroeschke April 6, 2023 16:17
@mroeschke mroeschke added this to the 2.1 milestone Apr 7, 2023
@mroeschke mroeschke merged commit 8d3357f into pandas-dev:main Apr 7, 2023
@mroeschke
Member

Thanks @kopytjuk

@kopytjuk kopytjuk deleted the improve-resample-interpolate-docs branch November 1, 2023 09:05