Converting series of dates to Periods #23438

Closed
thequackdaddy opened this issue Nov 1, 2018 · 10 comments · Fixed by #23460
Labels
Datetime (Datetime data dtype), Period (Period data type)

Comments

@thequackdaddy
Contributor

Code Sample, a copy-pastable example if possible

import pandas as pd

dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

# This fails...
pd.PeriodIndex(dates, freq='M')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-687a0b3d3b9a> in <module>
----> 1 pd.PeriodIndex(dates, freq='M')

~/git/pandas/pandas/core/indexes/period.py in __new__(cls, data, ordinal, freq, start, end, periods, tz, dtype, copy, name, **fields)
    221             else:
    222                 # don't pass copy here, since we copy later.
--> 223                 data = period_array(data=data, freq=freq)
    224 
    225         if copy:

~/git/pandas/pandas/core/arrays/period.py in period_array(data, freq, copy)
    920     """
    921     if is_datetime64_dtype(data):
--> 922         return PeriodArray._from_datetime64(data, freq)
    923     if isinstance(data, (ABCPeriodIndex, ABCSeries, PeriodArray)):
    924         return PeriodArray(data, freq)

~/git/pandas/pandas/core/arrays/period.py in _from_datetime64(cls, data, freq, tz)
    234         PeriodArray[freq]
    235         """
--> 236         data, freq = dt64arr_to_periodarr(data, freq, tz)
    237         return cls(data, freq=freq)
    238 

~/git/pandas/pandas/core/arrays/period.py in dt64arr_to_periodarr(data, freq, tz)
    983         elif freq != data.dt.freq:
    984             msg = DIFFERENT_FREQ_INDEX.format(freq.freqstr,
--> 985                                               data.dt.freq.freqstr)
    986             raise IncompatibleFrequency(msg)
    987         data = data._values

AttributeError: 'str' object has no attribute 'freqstr'

# As does this
dates.to_period('M')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-944b687ce2da> in <module>
----> 1 dates.to_period('M')

~/git/pandas/pandas/core/series.py in to_period(self, freq, copy)
   4108             new_values = new_values.copy()
   4109 
-> 4110         new_index = self.index.to_period(freq=freq)
   4111         return self._constructor(new_values,
   4112                                  index=new_index).__finalize__(self)

AttributeError: 'RangeIndex' object has no attribute 'to_period'

Problem description

A pandas Series of datetimes can no longer easily be converted to a PeriodIndex. Both calls worked in older versions of pandas but fail on the current GitHub master.

Expected Output

Just don't fail: both calls should return monthly periods instead of raising.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+850.g62a15fa40
pytest: 3.9.3
pip: 18.1
setuptools: 40.5.0
Cython: 3.0a0
numpy: 1.16.0.dev0+45718fd
scipy: 1.2.0.dev0+016a6ef
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.6
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@gfyoung added the Datetime (Datetime data dtype) and Period (Period data type) labels on Nov 1, 2018
@gfyoung
Member

gfyoung commented Nov 1, 2018

@thequackdaddy : Thanks for reporting this! Can you tell us which version this last worked in?

cc @jreback @mroeschke

@thequackdaddy
Contributor Author

It works in 0.23.4

@jreback
Contributor

jreback commented Nov 1, 2018

cc @jbrockmendel

@gfyoung
Member

gfyoung commented Nov 1, 2018

cc @TomAugspurger

#22862 introduced the error messages that break in the code sample, but it's probably not the commit that caused the regression, since those messages are only reached on an error path...
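The last frame of the PeriodIndex traceback evaluates data.dt.freq.freqstr. A minimal sketch, assuming a reasonably recent pandas, of why that line can only fail: Series.dt.freq reports the inferred frequency as a plain string such as 'MS', and a string has no freqstr attribute. The to_offset call at the end only illustrates how such a string could be turned into an offset object; it is not claimed to be how #23460 fixes the bug.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

# Series.dt.freq is the *inferred* frequency, returned as a plain string,
# not as a DateOffset object.
inferred = dates.dt.freq

# A str has no .freqstr, which is the attribute the error path in
# dt64arr_to_periodarr tried to read while formatting its message.
print(type(inferred).__name__, hasattr(inferred, 'freqstr'))

# Converting the string to a DateOffset yields an object that does have .freqstr.
offset = to_offset(inferred)
print(offset.freqstr)
```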

@gfyoung added this to the 0.24.0 milestone on Nov 1, 2018
@gfyoung
Member

gfyoung commented Nov 1, 2018

Marking this for 0.24.0 because it seems this "regression" was introduced during development. I can confirm that the code sample worked in 0.23.4.

@mroeschke
Member

Your to_period example doesn't work for me in 0.23.4 (and I don't think it should ever have worked?)

In [6]: pd.__version__
Out[6]: '0.23.4'

In [7]: dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

In [8]: dates.to_period('M')
AttributeError: 'RangeIndex' object has no attribute 'to_period'

# Did you mean this?
In [11]: dates.dt.to_period('M').head()
Out[11]:
0   2014-01
1   2014-02
2   2014-03
3   2014-04
4   2014-05
dtype: object

@gfyoung
Member

gfyoung commented Nov 1, 2018

@mroeschke : As I mentioned above, I can run through the example code from @thequackdaddy perfectly when using 0.23.4.

Also, that error looks really weird because Series definitely has a to_period method (unless you're hiding some other part of the stacktrace).

See below

@jschendel
Member

@gfyoung : I get the same error as @mroeschke when trying to use the to_period method without the .dt accessor on both 0.23.4 and master. Using the .dt accessor works fine for me on both 0.23.4 and master.

I think raising an error is the correct behavior based on the docs for Series.to_period: Series.to_period operates on the index, whereas Series.dt.to_period operates on the values. Perhaps this could use a better error message, though (have Series.to_period check that it has the right type of index before new_index = self.index.to_period(freq=freq)).
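A minimal sketch of the index-versus-values distinction, assuming a recent pandas. The datetimes in dates are the values, while the index is a default RangeIndex, which is why Series.to_period (which delegates to self.index.to_period) fails even though Series has the method. checked_to_period is a hypothetical helper illustrating the suggested up-front index check; it is not part of pandas, and its message wording is an assumption.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

# The datetimes are the *values*; the index is a default RangeIndex,
# which has no to_period method (hence the AttributeError in the report).
print(type(dates.index).__name__)

# Series.dt.to_period converts the values and leaves the RangeIndex alone.
by_values = dates.dt.to_period('M')

# Series.to_period converts the index, so it only applies once the
# datetimes have been moved into a DatetimeIndex.
indexed = pd.Series(range(len(dates)), index=pd.DatetimeIndex(dates))
by_index = indexed.to_period('M')


def checked_to_period(series, freq):
    """Hypothetical wrapper (not a pandas API): check the index type before
    delegating, so the error names the actual problem."""
    if not isinstance(series.index, pd.DatetimeIndex):
        raise TypeError(
            "Series.to_period converts the index, but the index is a "
            f"{type(series.index).__name__}; use series.dt.to_period({freq!r}) "
            "to convert datetime values instead"
        )
    return series.to_period(freq)
```

With this check, checked_to_period(dates, 'M') raises a TypeError pointing at the .dt accessor, while checked_to_period(indexed, 'M') behaves like indexed.to_period('M').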

I can replicate the PeriodIndex code example and agree that it's a regression.

On 0.23.4:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.23.4'

In [2]: dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

In [3]: dates.to_period('M').head()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-2563b9c17104> in <module>()
----> 1 dates.to_period('M').head()

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in to_period(self, freq, copy)
   3958             new_values = new_values.copy()
   3959 
-> 3960         new_index = self.index.to_period(freq=freq)
   3961         return self._constructor(new_values,
   3962                                  index=new_index).__finalize__(self)

AttributeError: 'RangeIndex' object has no attribute 'to_period'

In [4]: 

In [4]: dates.dt.to_period('M').head()
Out[4]: 
0   2014-01
1   2014-02
2   2014-03
3   2014-04
4   2014-05
dtype: object

On master:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.0.dev0+879.g9019582'

In [2]: dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

In [3]: dates.to_period('M')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-944b687ce2da> in <module>()
----> 1 dates.to_period('M')

~/code/pandas/pandas/core/series.py in to_period(self, freq, copy)
   4108             new_values = new_values.copy()
   4109 
-> 4110         new_index = self.index.to_period(freq=freq)
   4111         return self._constructor(new_values,
   4112                                  index=new_index).__finalize__(self)

AttributeError: 'RangeIndex' object has no attribute 'to_period'

In [4]: dates.dt.to_period('M').head()
Out[4]: 
0    2014-01
1    2014-02
2    2014-03
3    2014-04
4    2014-05
dtype: period[M]

@gfyoung
Member

gfyoung commented Nov 1, 2018

@mroeschke @jschendel : Ah, so I see where we're getting confused.

There are two lines of code that are breaking:

  • pd.PeriodIndex(dates, freq='M')
  • dates.to_period('M')

I agree that dates.to_period('M') does not look like correct usage (I apologize, I overlooked this one myself), but what about the first one (neither of you addressed that one)?

@jschendel
Member

I think pd.PeriodIndex(dates, freq='M') should work; note that pd.PeriodIndex(list(dates), freq='M') works for me on both 0.23.4 and master. I don't see a reason why it shouldn't work for a Series if it works for a list.
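A quick check of that reasoning, assuming a pandas release that includes the fix from #23460 (on the affected master the from_series line raises the AttributeError shown in the report). All three routes should yield the same 37 monthly periods, 2014-01 through 2017-01.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2014-01-01', '2017-01-01', freq='MS'))

# From a plain list of Timestamps (reported above to work on 0.23.4 and master).
from_list = pd.PeriodIndex(list(dates), freq='M')

# From the Series itself: the call that raised in the original report.
from_series = pd.PeriodIndex(dates, freq='M')

# Via the .dt accessor, for comparison.
from_accessor = pd.PeriodIndex(dates.dt.to_period('M'))

print(from_series.equals(from_list), from_series.equals(from_accessor))
```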


5 participants