'base' argument when resampling has no effect #22855

Open
arisliang opened this issue Sep 27, 2018 · 4 comments
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas Resample resample method

Comments

@arisliang
arisliang commented Sep 27, 2018

I seem to be hitting an issue similar to #10530, which is marked closed.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(3,4)))
df.index = [pd.Timestamp('2018-02-07'), pd.Timestamp('2018-06-22'), pd.Timestamp('2018-09-17')]
df.resample('6M',base=6).min()

The base parameter can be set to any value without affecting the result or raising an error (pandas version 0.23.4).

@TomAugspurger
Contributor

TomAugspurger commented Sep 27, 2018

base only applies to frequencies smaller than a day.

base : int, default 0
    For frequencies that evenly subdivide 1 day, the "origin" of the
    aggregated intervals. For example, for '5min' frequency, base could
    range from 0 through 4. Defaults to 0

What's your expected output?
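
To illustrate what "origin of the aggregated intervals" means for a sub-daily frequency, here is a minimal sketch. It assumes pandas 1.1 or later, where `base` was deprecated in favour of the `origin` and `offset` arguments (`base` was later removed in pandas 2.0), so the example uses `offset` rather than `base`. The data is made up for illustration.

```python
import pandas as pd

# Ten one-minute observations starting at midnight (illustrative data).
idx = pd.date_range("2020-01-01 00:00", periods=10, freq="min")
s = pd.Series(range(10), index=idx)

# Default anchoring: 5-minute bin edges fall on multiples of 5 minutes.
default = s.resample("5min").sum()

# Shift the bin edges by 2 minutes, roughly what base=2 did for '5min'.
shifted = s.resample("5min", offset="2min").sum()

print(default.index.minute.tolist())
print(shifted.index.minute.tolist())
```

The bin labels in `shifted` sit 2 minutes past each multiple of 5, while the total of the data is unchanged.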

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Sep 27, 2018
@arisliang
Author

arisliang commented Sep 27, 2018

Oh, I missed that. I expected to get '2018-06-30' and '2018-12-31' for base 6. It would be nicer if pandas raised an error saying that base only applies to frequencies smaller than a day.
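
A rough sketch of the kind of check being asked for, done on the user side. `check_base_applies` is a hypothetical helper, not a pandas function; it assumes that "evenly subdivides 1 day" can be tested by converting the rule with `to_offset` and checking that it is a fixed-length (`Tick`) offset whose length divides 24 hours.

```python
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Tick

NANOS_PER_DAY = 86_400 * 10**9


def check_base_applies(rule):
    """Hypothetical helper: raise if `rule` does not evenly subdivide one day."""
    offset = to_offset(rule)
    # Calendar offsets (month, quarter, ...) are not Tick instances and have
    # no fixed length, so an intra-day origin shift cannot apply to them.
    if not isinstance(offset, Tick) or NANOS_PER_DAY % offset.nanos != 0:
        raise ValueError(
            f"'base' only applies to frequencies that evenly subdivide "
            f"1 day; got {rule!r}"
        )
    return offset


check_base_applies("5min")      # passes silently
try:
    check_base_applies("6MS")   # a 6-month frequency is rejected
except ValueError as err:
    print(err)
```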

@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas Resample resample method and removed Needs Info Clarification about behavior needed to assess issue labels Oct 20, 2019
@yohplala
yohplala commented May 8, 2020

Hi Tom, hi Matthew.
I am sorry, but it seems to me that base is indeed not working, at least in this example.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(2,4)))
df.index = pd.period_range(start='2020-01-01 01:00', periods=2, freq='1h')
resample = df.resample('4h',base=0).sum()
print(resample)
                         0         1        2         3
2020-01-01 01:00 -1.376917  1.294601  2.01079 -1.220457

As I understand it, base=0 should anchor the '4h' bins to midnight, but as the result shows, the bin starts at 01:00.

This is related to a ticket I opened some time ago (at the time I did not know base was supposed to do this, and I wanted to specify the anchoring with the same kind of convention as the one used to anchor a week on a given day):
#33129

Am I missing something?

@yohplala
yohplala commented May 8, 2020

Here is the code I am currently using to work around the base parameter not working.

    # `to_period` is the target resampling frequency, e.g. '4h'.
    first_index = df.index[0]
    with_midnight = False
    if first_index.hour != 0:
        if isinstance(df.index, pd.PeriodIndex):
            first_index = first_index.start_time
            midnight = pd.DataFrame(index=[pd.Period(first_index.normalize(),
                                                     freq=df.index.freq)],
                                    columns=df.columns)
            # `first_index` is converted back to a Period so it can be re-used
            # below to remove rows with NaN values from the new DataFrame.
            first_index = pd.Period(first_index, to_period)
        else:
            # DatetimeIndex
            midnight = pd.DataFrame(index=[first_index.normalize()],
                                    columns=df.columns)

        # Index name is reset to df.index.name, as it is lost when creating
        # `midnight` and concatenating with it.
        df = pd.concat([midnight, df]).rename_axis(index=df.index.name)
        with_midnight = True

    resampled = df.resample(to_period).sum()    # <- replace sum() with the function to be used when resampling

    if with_midnight:
        # Remove the added first rows containing NaN values
        mask = (resampled.isnull().any(axis=1)
                & (resampled.index < first_index))
        resampled = resampled.loc[~mask]

A bit lengthy maybe :)
The trouble is that I have to repeat it for each function I use when resampling: sum, cumsum, min, and so on.
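
A shorter route that may be enough in some cases, sketched under the assumption that a DatetimeIndex result is acceptable: convert the PeriodIndex to timestamps with `to_timestamp()` before resampling. For a DatetimeIndex, sub-daily bins are anchored to midnight by default, so the '4h' bin containing 01:00 is labelled 00:00. The data below is made up for illustration, and this is not a fix for the PeriodIndex behaviour itself.

```python
import numpy as np
import pandas as pd

# Two hourly periods starting at 01:00, as in the example above
# (constant values instead of random ones, for reproducibility).
df = pd.DataFrame(np.ones((2, 4)))
df.index = pd.period_range(start="2020-01-01 01:00", periods=2, freq="h")

# Convert periods to their start timestamps, then resample.
resampled = df.to_timestamp().resample("4h").sum()
print(resampled)
```

Because the conversion happens once up front, any aggregation (sum, min, and so on) can follow the `resample` call without repeating the workaround.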


4 participants