
How to set series frequency? #23862


Closed
Peque opened this issue Nov 22, 2018 · 3 comments
Labels
Datetime Datetime data dtype Usage Question

Comments

@Peque
Contributor

Peque commented Nov 22, 2018

Consider two time series like these:

import pandas
from numpy import nan


# 1-minute period
index = pandas.date_range('1/1/2010', periods=8, freq='T')
series1 = pandas.Series([1., nan, nan, nan, nan, 2., nan, nan], index=index)

# 5-minutes period
index = pandas.date_range('1/1/2010', periods=2, freq='5T')
series5 = pandas.Series([1., 2.], index=index)

One of them has a 1-minute frequency, the other a 5-minute frequency. Excluding the NaNs, both have the same values at the same timestamps.

I do not want to keep the NaN values, so I drop them. However, dropping them also discards the frequency information of the DatetimeIndex:

series1 = series1.dropna()
print(series1.index.freq)  # None
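A minimal sketch of why the frequency disappears, assuming a recent pandas where "min" is the minute alias (the "T" alias used above is deprecated in newer releases). dropna keeps rows through a boolean mask; when the kept rows are not one contiguous run, the resulting index gets freq None. A positional slice, by contrast, keeps the freq and scales it by the step. The .iloc[::5] comparison is only an illustration and is not part of the original report.

```python
import numpy as np
import pandas as pd

# Same data as in the report; "min" is the newer spelling of the "T" alias.
index = pd.date_range("2010-01-01", periods=8, freq="min")
series1 = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan], index=index)

# dropna selects rows with a boolean mask. This mask (rows 0 and 5) is not a
# contiguous run, so pandas does not carry a freq over to the result.
dropped = series1.dropna()
print(dropped.index.freq)  # None

# Positional slicing keeps the freq, multiplied by the step: every 5th minute.
sliced = series1.iloc[::5]
print(sliced.index.freq)  # a 5-minute offset

# Both selections hold the same timestamps; only the freq metadata differs.
print(dropped.index.equals(sliced.index))  # True
```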

Now, I would really like to be able to differentiate two series like these: same timestamps, same values, but originating from different base frequencies.
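A short sketch of the problem, assuming a recent pandas ("min" and "5min" stand in for the "T" and "5T" aliases used above): once the NaNs are dropped, Series.equals reports the two series as equal, because it compares values and index labels but not index.freq. The variable name dropped1 is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild both series from the report.
index1 = pd.date_range("2010-01-01", periods=8, freq="min")
series1 = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan], index=index1)
index5 = pd.date_range("2010-01-01", periods=2, freq="5min")
series5 = pd.Series([1.0, 2.0], index=index5)

dropped1 = series1.dropna()

# Series.equals looks at values and index labels, not at index.freq,
# so the two series are indistinguishable once the NaNs are gone.
print(dropped1.equals(series5))  # True

# The only remaining trace of the spacing is series5's freq; dropped1 has none.
print(series5.index.freq, dropped1.index.freq)
```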

To do so, I used to set the frequency back "by hand":

from pandas.tseries.frequencies import to_offset


series1.index.freq = to_offset('T')

But since pandas 0.23 it seems I can no longer do that; it raises:

ValueError: Inferred frequency None from passed values does not conform to passed frequency T
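A minimal sketch of the kind of check behind this error, assuming current pandas behavior: when a freq is supplied for existing timestamps (here through the pd.DatetimeIndex constructor rather than the freq setter), pandas builds the regular range that freq implies and raises ValueError if the timestamps differ. The helper conforms_to is hypothetical, and the exact wording of the error message may vary between versions.

```python
import pandas as pd


def conforms_to(index: pd.DatetimeIndex, freq: str) -> bool:
    """Hypothetical helper: True if pandas accepts `freq` for these timestamps."""
    try:
        # The constructor validates the requested freq against the actual spacing.
        pd.DatetimeIndex(index, freq=freq)
    except ValueError:
        return False
    return True


# The timestamps left after dropna in the report: 00:00 and 00:05.
remaining = pd.DatetimeIndex(["2010-01-01 00:00", "2010-01-01 00:05"])

print(conforms_to(remaining, "min"))   # False: a 1-minute grid would need 00:01 ... 00:04
print(conforms_to(remaining, "5min"))  # True: the timestamps are exactly 5 minutes apart
```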

What can I do now to keep that frequency information?

Also, is it incorrect to set a frequency "by hand" for an index that has that frequency but some missing values?

@gfyoung
Member

gfyoung commented Nov 22, 2018

cc @jreback @mroeschke

@mroeschke
Member

Well, freq is not a label; it describes the spacing between the timestamps of the series. So once series1 drops the NaNs, the series no longer conforms to the 'T' frequency:

In [4]: series1
Out[4]:
2010-01-01 00:00:00    1.0
2010-01-01 00:05:00    2.0
dtype: float64

So I'd argue the behavior before was a bug. If you want to preserve the original frequency as a label, just use the name attribute of the Series.

In [9]: index = pandas.date_range('1/1/2010', periods=8, freq='T')
   ...: series1 = pandas.Series([1., nan, nan, nan, nan, 2., nan, nan], index=index, name='T')
In [11]: series1.dropna()
Out[11]:
2010-01-01 00:00:00    1.0
2010-01-01 00:05:00    2.0
Name: T, dtype: float64
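Building on the name suggestion, a sketch of how such a label could be used later, assuming it holds a valid frequency alias ("min" here, the newer spelling of "T"): Series.asfreq can rebuild the regular grid from it, re-inserting NaN rows between the first and last remaining timestamps. Trailing NaNs after the last valid value are not recovered this way.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2010-01-01", periods=8, freq="min")
# Store the base frequency in the Series name, as suggested above.
series1 = pd.Series(
    [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan], index=index, name="min"
)

dropped = series1.dropna()
print(dropped.name)  # min  (dropna keeps the name even though index.freq is now None)

# Rebuild a regular grid from the stored label: 00:00 through 00:05, with NaN
# wherever no value was kept. The two trailing NaN rows are not restored.
regular = dropped.asfreq(dropped.name)
print(len(regular))        # 6
print(regular.index.freq)  # a 1-minute offset again
```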

@jschendel
Member

> So I'd argue the behavior before was a bug.

xref #20678 (specifically item 2)

@jschendel jschendel added this to the No action milestone Nov 23, 2018

4 participants