pd.groupby(pd.TimeGrouper('W')).groups returns Timestamps instead of Periods #15141
You are starting with a DatetimeIndex (so an index of Timestamps), so after grouping this will also give you Timestamps. If you want periods, you can start with Periods instead (e.g. using to_period()).
Note that in this case …
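(The code that originally accompanied this comment is lost; a minimal sketch of the suggested approach, using a hypothetical daily frame with made-up values:)

```python
import numpy as np
import pandas as pd

# hypothetical daily data; the original frame from the issue is not preserved
df = pd.DataFrame({'value': np.arange(14)},
                  index=pd.date_range('2010-01-11', periods=14, freq='D'))

# convert the DatetimeIndex to weekly Periods before grouping,
# so the group keys come out as Period objects instead of Timestamps
grouped = df.groupby(df.index.to_period('W'))
print(list(grouped.groups))
# e.g. [Period('2010-01-11/2010-01-17', 'W-SUN'),
#       Period('2010-01-18/2010-01-24', 'W-SUN')]
```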
OK thanks! If I do … Also, just as a "cosmetic" thing: I am only after the list of periods. I can do it with the solution you suggested:
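(The snippet itself is missing here as well; presumably something along these lines, shown only as an illustration with the same made-up frame as above:)

```python
import numpy as np
import pandas as pd

# hypothetical daily data, standing in for the original frame
df = pd.DataFrame({'value': np.arange(14)},
                  index=pd.date_range('2010-01-11', periods=14, freq='D'))

# only the list of weekly periods, taken from the group keys
periods = list(df.groupby(df.index.to_period('W')).groups.keys())
print(periods)
```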
See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#anchored-offsets; this means 'weekly, but anchored on Sunday' (because your original timestamps that represented the full week were Sundays). I have to admit that the documentation can certainly be improved here.
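(As a small illustration of what the anchoring means, not part of the original comment:)

```python
import pandas as pd

# 'W-SUN' is a weekly frequency anchored on Sunday: every generated
# timestamp falls on a Sunday, marking the end of its calendar week
idx = pd.date_range('2010-01-01', periods=4, freq='W-SUN')
print(idx)
# DatetimeIndex(['2010-01-03', '2010-01-10', '2010-01-17', '2010-01-24'],
#               dtype='datetime64[ns]', freq='W-SUN')
```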
Apparently, when grouping on a period index, the …
Although you probably need to think about why you need those, and whether there is not a better way to achieve the same.
This is not my question. My question is: "Why is it stated as a dtype for the timestamps?" This was my question to begin with. If you look at my first post you can see the Timestamps are shown as follows:
@jimbasquiat this is as expected.
See the whatsnew here: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#period-changes.
Also note #12871; period resampling is in flux and being worked on. Contributions are welcome.
I am not sure why I have to repeat my question for the third time... I am not talking about a PeriodIndex but about a Timestamp object. Do I have to copy-paste my initial post here?
@jimbasquiat that's correct as well. It's simply a frequency attached to a Timestamp.
@jreback Is there any documentation somewhere about such a Timestamp object? To me it makes little sense. The dtype for a Timestamp should be datetime64 or one of those "<M8..." types, no? A Timestamp with a frequency, is that not the definition of a Period?
Not sure if there is much information on the freq attribute for the individual Timestamp objects, but the frequency of the Timestamps is mainly of importance if you have a regular index of them:
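(The example that followed here is missing; a sketch of the kind of thing presumably meant. Note that in the pandas versions current at the time of this thread a Timestamp could carry a freq; later pandas releases removed that attribute from Timestamp.)

```python
import pandas as pd

# a regular weekly DatetimeIndex, anchored on Sunday
idx = pd.date_range('2010-01-03', periods=3, freq='W-SUN')

# in the pandas versions of this thread, a scalar taken out of a regular
# index carried the index's frequency along with it
stamp = idx[0]
print(repr(stamp))  # at the time: Timestamp('2010-01-03 00:00:00', freq='W-SUN')
```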
So a single element accessed here just kind of 'inherits' the frequency of the DatetimeIndex. The Timestamp itself is a scalar and has no dtype attribute, but the dtype of an index or series of those values is certainly 'datetime64[ns]'.
So a …
However, I suspect that this is an implementation detail. IOW, resampling with Periods is still pretty buggy and these are probably just easier to deal with. Further, in prior implementations of pandas (well, actually currently as well, though a PR is in the works) … @jimbasquiat I will create an issue specifically to address this. If you want to do some legwork to see whether it is possible to instead return Periods …
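(For context, and not from the original comment: a quick look at how the two scalar types differ. A Period represents the whole span of the week, while a Timestamp is a single instant that at the time merely carried a freq attribute.)

```python
import pandas as pd

# a weekly Period spans the entire week ending on Sunday 2010-01-24
week = pd.Period('2010-01-24', freq='W-SUN')
print(week)             # 2010-01-18/2010-01-24
print(week.start_time)  # 2010-01-18 00:00:00
print(week.end_time)    # 2010-01-24 23:59:59.999999999
```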
Is that something we actually want? -> OK, I see your other issue; will comment there.
@jorisvandenbossche I created #15146 for that very discussion. My answer is: if there are actually no differences, then sure.
@jreback Yes, I updated my comment above; I saw that and commented over there.
Thank you guys for picking it up. In my earliest example with the groupby I think returning a Period would have made more sense; in the example you are giving with the date range, probably not. I am not sure I can be of help on that topic as I am very new to pandas (started learning it about 3 weeks ago). But thanks!
Let's consider the following DataFrame:
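(The original snippet did not survive; a hypothetical stand-in of the same shape, with a made-up column name and values:)

```python
import numpy as np
import pandas as pd

# hypothetical daily observations indexed by a DatetimeIndex
df = pd.DataFrame({'value': np.random.randn(30)},
                  index=pd.date_range('2010-01-01', periods=30, freq='D'))
```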
If I group the data points by periods of one week and inspect the group definitions, I get:
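(The output shown in the original post is likewise missing; roughly, using the frame above. pd.TimeGrouper('W') from the title was the spelling at the time, pd.Grouper(freq='W') is the current equivalent, and current pandas no longer attaches a freq to the Timestamp keys.)

```python
# group by calendar week; the keys of .groups come back as Timestamps
groups = df.groupby(pd.Grouper(freq='W')).groups
print(list(groups)[:2])
# at the time, e.g. [Timestamp('2010-01-03 00:00:00', freq='W-SUN'),
#                    Timestamp('2010-01-10 00:00:00', freq='W-SUN')]
```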
However, I am left with these funny values, such as Timestamp('2010-01-24 00:00:00', freq='W-SUN'), that have the prefix Timestamp but are structured like Periods. I think this is not correct, and those values should be Periods?