
Can't use dt.ceil, dt.floor, dt.round with coarser target frequencies #15303

Open
rosnfeld opened this issue Feb 3, 2017 · 11 comments
Labels
Datetime (Datetime data dtype), Enhancement, Frequency (DateOffsets)

Comments

@rosnfeld
Contributor

rosnfeld commented Feb 3, 2017

Code Sample

This works fine:

In [16]: hours = pd.Series(pd.date_range('2017-01-01', '2017-01-31', freq='H'))

In [17]: hours.dt.ceil('D')
Out[17]: 
0     2017-01-01
1     2017-01-02
2     2017-01-02
3     2017-01-02
4     2017-01-02
...

But this doesn't:

In [18]: days = pd.Series(pd.date_range('2017-01-01', '2017-01-31', freq='D'))

In [19]: days.dt.ceil('M')
...
/home/andrew/git/pandas-rosnfeld-py3/pandas/tseries/offsets.py in nanos(self)
    510     @property
    511     def nanos(self):
--> 512         raise ValueError("{0} is a non-fixed frequency".format(self))
    513 
    514 

ValueError: <MonthEnd> is a non-fixed frequency

Problem description

I haven't done an exhaustive test, but it seems that one can't use coarser frequencies like 'M', 'Q', 'A' in the dt.ceil/floor/round set of methods.

Expected Output

It would seem reasonable to be able to take a series of scattered dates and "coarsen" them into month-ends, quarter-ends, year-ends, etc.
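
The kind of output I have in mind would be something like this (hypothetical, since ceil('M') currently raises):

In [20]: days.dt.ceil('M')   # hypothetical output
Out[20]:
0   2017-01-31
1   2017-01-31
2   2017-01-31
3   2017-01-31
4   2017-01-31
...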

Output of pd.show_versions()

(admittedly some pretty old stuff in this environment, but I also got the same behavior on 0.19.2 with the latest of everything else)

INSTALLED VERSIONS
------------------
commit: 74e20a0
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-62-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.0+300.g74e20a0.dirty
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.2
lxml: 3.5.0
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: 0.6.7.None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_datareader: None

@jreback
Contributor

jreback commented Feb 4, 2017

Here is an alternative, and probably more correct, approach.

In [3]: s.dt.to_period('M').dt.to_timestamp()
Out[3]: 
DatetimeIndex(['2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01',
               ...
               '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01'],
              dtype='datetime64[ns]', length=721, freq=None)

The issue is that rounding is pretty tricky, and I would argue unexpected, for months: you can easily have the 15th round up in some months and down in others. You could argue that .ceil/.floor are unambiguous, though.
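
For example, with a hypothetical round('M') to the nearest month end, the 15th falls on different sides of the midpoint depending on the month length:

In [4]: pd.Timestamp('2017-01-31') - pd.Timestamp('2017-01-15')
Out[4]: Timedelta('16 days 00:00:00')   # farther than the 15 days back to 2016-12-31, so January 15 would round down

In [5]: pd.Timestamp('2017-02-28') - pd.Timestamp('2017-02-15')
Out[5]: Timedelta('13 days 00:00:00')   # closer than the 15 days back to 2017-01-31, so February 15 would round up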

If you want to implement this, something reasonable would be fine (you could probably just do the .to_period transform under the hood).
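
A rough sketch of one possible to_period-based mapping (not a finished design): floor-like via the period start, ceil-like via the normalized period end.

In [6]: days = pd.Series(pd.date_range('2017-01-01', '2017-01-31', freq='D'))

In [7]: days.dt.to_period('M').dt.to_timestamp()            # floor-like: month start

In [8]: days.dt.to_period('M').dt.end_time.dt.normalize()   # ceil-like: month end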

jreback added the Difficulty Intermediate, Frequency (DateOffsets), and Datetime (Datetime data dtype) labels on Feb 4, 2017
jreback added this to the Next Major Release milestone on Feb 4, 2017
@rosnfeld
Contributor Author

Taking a look at this: if the date is '2017-01-23 12:34:56', what should floor and ceil return for a frequency like 'M' (MonthEnd)? If you're doing .dt.to_period('M').dt.to_timestamp(), then you get '2017-01-01' - but that's not a MonthEnd value. Similarly if you took the end_time of that period, you'd get '2017-01-31 23:59:59.999999999', also not a MonthEnd. I think you actually want '2016-12-31 00:00:00' (for floor) and '2017-01-31 00:00:00' (for ceil) - right? So I am not sure that to_period is the key here.
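
For reference, offset rollback/rollforward on a normalized Timestamp produce exactly those values (a quick check, not part of the patch):

import pandas as pd

ts = pd.Timestamp('2017-01-23 12:34:56').normalize()
pd.offsets.MonthEnd().rollback(ts)      # Timestamp('2016-12-31 00:00:00')
pd.offsets.MonthEnd().rollforward(ts)   # Timestamp('2017-01-31 00:00:00')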

While I wrote this issue for Series, upon playing with this further I've noticed that it also affects DatetimeIndex and individual Timestamp values. I have a fix for individual values (e.g. pd.to_datetime('2017-01-23 12:34:56').ceil('M') now behaves as I think is right in the paragraph above), but I'm actually not sure how to fix the Series/DatetimeIndex case.

My patch currently affects the Timestamp._round() code in tslib.pyx, and goes something like the following, using the offset rollforward/rollback to stay on-offset for at least the floor and ceil cases:

        from pandas.tseries.frequencies import to_offset
        from pandas.tseries.offsets import Tick

        offset = to_offset(freq)

        if not isinstance(offset, Tick):
            # Non-fixed frequency: normalize so the result lands on midnight,
            # then roll onto the offset rather than doing nanosecond arithmetic.
            offset.normalize = True

            if rounder == np.ceil:
                return offset.rollforward(self)

            if rounder == np.floor:
                return offset.rollback(self)

            # .round() stays unsupported for non-fixed frequencies.
            raise ValueError('Can only round to Tick offsets')

There is a very similar TimelikeOps._round() method in datetimelike.py, but I can't do the same change there, as the offset rollforward/rollback methods only work on Timestamp. Suggestions are welcome - not sure how one normally "scales up" more complicated individual operations like this to Series in a performant way. Also not sure if there is a cleaner/more obvious way to do what I've done in the single Timestamp case.
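
The only Series-level version I can see right now is element-wise, which is correct but not vectorized (just a sketch):

import pandas as pd

days = pd.Series(pd.date_range('2017-01-01', '2017-01-31', freq='D'))
month_end = pd.offsets.MonthEnd()
days.dt.normalize().map(month_end.rollforward)   # every value becomes 2017-01-31; Jan 31 itself is already on the offset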

@jreback
Contributor

jreback commented Sep 6, 2017

@rosnfeld

This is how you work with dti/Series; it's very similar.

In [10]: i = pd.to_datetime(['2017-01-23 12:34:56'])

In [11]: i
Out[11]: DatetimeIndex(['2017-01-23 12:34:56'], dtype='datetime64[ns]', freq=None)

In [12]: i.normalize() + pd.offsets.MonthBegin()
Out[12]: DatetimeIndex(['2017-02-01'], dtype='datetime64[ns]', freq=None)

In [13]: i.normalize() + pd.offsets.MonthEnd()
Out[13]: DatetimeIndex(['2017-01-31'], dtype='datetime64[ns]', freq=None)

@rosnfeld
Contributor Author

(got motivated to come back to this issue as I hit it again...)

Thanks @jreback - I am very close now. The only issue is that adding the offset isn't idempotent/doesn't handle the exact boundary properly. That is, in the single Timestamp case, timestamp.ceil('M').ceil('M'), the second application correctly does nothing, but in the Series case, series.dt.ceil('M').dt.ceil('M') rolls the month forward again, since it just adds another offset.

offset.rollforward(timestamp) is smart enough to call offset.onOffset(timestamp) to catch this case. I guess I need a Series equivalent that somehow incorporates this kind of logic... but it may be tricky to do in an efficient way.

@batterseapower
Contributor

If you just want to roll forward, then i.normalize() + 0 * pd.offsets.MonthBegin() is an idempotent operation.
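
A quick check of the idempotence (illustrative only):

import pandas as pd

i = pd.to_datetime(['2017-01-23 12:34:56', '2017-02-01'])
i.normalize() + 0 * pd.offsets.MonthBegin()
# DatetimeIndex(['2017-02-01', '2017-02-01'], dtype='datetime64[ns]', freq=None)
# '2017-02-01' is already on the offset, so a second application leaves it unchanged.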

@Safrone

Safrone commented Jan 7, 2021

I'm still interested in this; I can't find a reasonable way to snap a datetime to a week beginning.

The best I can do is:

import pandas as pd

end_time = pd.Timestamp('12/15/2020 9:24:00', tz='US/Eastern')

for interval in ['W-SUN', 'H', 'T']:
    if interval == 'W-SUN':
        # .floor('W-SUN') raises, so snap a one-element DatetimeIndex instead
        query_end_time = pd.DatetimeIndex([end_time.normalize()]).snap(pd.offsets.Week(weekday=6))[0]
    else:
        query_end_time = end_time.floor(interval)

.floor used to work for 'W-SUN'

@jpacostar

This worked for me :)

hours.dt.to_period('W-SUN').dt.start_time
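
For instance, on a tz-naive version of the timestamp from the earlier comment:

import pandas as pd

s = pd.Series([pd.Timestamp('2020-12-15 09:24:00')])
s.dt.to_period('W-SUN').dt.start_time
# 0   2020-12-14
# dtype: datetime64[ns]
# 'W-SUN' weeks run Monday through Sunday, so start_time is the preceding Monday.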

@xuancong84

It is so surprising that this ticket has been opened in 2017 and has not been fixed in 4 years. It is very obvious that this is due to incompetent programming nature when it comes to grouping dates into different buckets. As I mentioned in #43093, the internal logic inconsistency of Pandas will only make these kinds of defects harder to rectify.

If grouping by years or months cannot be done easily, I have no objection: a year has ~365.2425 days, so the number of days in a year, and thus in a month, cannot be determined in a straightforward manner; you need to account for leap years (every 4 years, except that years divisible by 100 but not by 400 are not leap years). However, the number of days in a week is always 7 and is not affected by leap years, so this can be calculated with a simple MOD operation on the raw timestamp. I am not sure why pd.Timestamp.floor('W'), or 'W-SUN'/'W-MON'/etc., still does not work today.
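
As a standalone illustration of that MOD idea (not how pandas implements anything; it relies only on the Unix epoch, 1970-01-01, being a Thursday, and assumes tz-naive timestamps):

import pandas as pd

DAY = 86_400_000_000_000        # nanoseconds per day
WEEK = 7 * DAY
MONDAY_SHIFT = 3 * DAY          # the epoch was a Thursday, 3 days after a Monday

def floor_to_monday(ts: pd.Timestamp) -> pd.Timestamp:
    ns = ts.value               # nanoseconds since the epoch
    return pd.Timestamp(ns - (ns + MONDAY_SHIFT) % WEEK)

floor_to_monday(pd.Timestamp('2020-12-15 09:24:00'))
# Timestamp('2020-12-14 00:00:00')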

@jreback
Contributor

jreback commented Aug 19, 2021

@xuancong84

Statements like this

It is very obvious that this is due to incompetent programming nature when it comes to grouping dates into different buckets

will get you banned; please follow the code of conduct.

Constructive criticism and patches are welcome.

@MarcoGorelli
Member

MarcoGorelli commented Aug 19, 2021

@xuancong84 this is not the first time you have made such comments:

So do we have another general inconsistency due to incompetent programming?

#32263 (comment)

Thus, incompetent programming and inherently deficient/defective architectural design does not allow this bug to be solved that easily ^_^

#33202

-1 on @jreback , you seem to view everything as negative and simply refuse to implement anything useful to the general public, your attitude is very negative and arrogant. Okay, so be it.

#33738

So, please take the above as a final warning - thanks

@xuancong84

xuancong84 commented Aug 20, 2021

It is so surprising that this ticket has been opened in 2017 and has not been fixed in 4 years. It is very obvious that this is due to incompetent programming nature when it comes to grouping dates into different buckets. As I mentioned in #43093, the internal logic inconsistency of Pandas will only make these kinds of defects harder to rectify.

I do apologize that my comments may be a bit harsh for you guys. But why do I make such harsh comments only in your repo, and not in anyone else's? I hope you can focus more of your energy on fixing these bugs rather than on the people who complain about them.

I will make no further harsh comments on the pandas repo. On this issue, may I know which part of the source code needs to be changed in order to fix the bug?

mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022