groupby.apply datetime bug affecting 0.17 #11324
canonical way of selecting a max column (and way, way more efficient)
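The example referred to later in the thread as [59] is not preserved in this copy, so the following is only a hedged sketch of what such a canonical, apply-free selection might look like, using the user_identifier/timestamp column names mentioned in the later comments:

```python
import pandas as pd

# Illustrative data only; column names are taken from the later comments.
df = pd.DataFrame({
    'user_identifier': ['a', 'a', 'b', 'b', 'b'],
    'timestamp': pd.to_datetime([
        '2015-10-01 10:00', '2015-10-01 12:00',
        '2015-10-02 09:00', '2015-10-02 11:00', '2015-10-02 15:00',
    ]),
})

# Select each group's row at the maximum timestamp, without using apply.
latest = df.loc[df.groupby('user_identifier')['timestamp'].idxmax()]
print(latest)
```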
I guess this is a bug. You are doing a really, really odd thing here though.
Imagine it in the following context: the objective is to create a new dataframe to see what the users have done, so I need to group by user_identifier and somehow aggregate each user's events. One of the things I need to find is the first and last timestamp at which the user interacted with the server. Hope this clarifies things a bit. Pandas is awesome by the way, you guys rule.
@hadjmic would df.groupby(['user_identifier']).timestamp.agg(['min', 'max']) work for you? You can also control the naming of the resulting columns.
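For illustration, a sketch of that suggestion, reusing the frame from the sketch above; the rename step is only one hypothetical way of controlling the resulting column names, not necessarily what was meant here:

```python
# Reuses `df` from the previous sketch.
first_last = (
    df.groupby('user_identifier')['timestamp']
      .agg(['min', 'max'])
      .rename(columns={'min': 'first_seen', 'max': 'last_seen'})
)
print(first_last)
```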
Did my example in [59] not clarify? My point is that technically using apply like this is OK, but canonically it is quite confusing.
Perhaps it would have been clearer if I had said I have a processUserEvents function. The function takes a dataframe of user events as input (i.e. each group of the groupby operation) and returns a Series with specific user characteristics. Among those are the min and max of the timestamp, but there is a lot of other stuff involved, such as values extracted from URL paths, query strings, flow paths, etc.
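A minimal, hypothetical sketch of that pattern (the real processUserEvents does more, per the comment above; the fields shown here are illustrative only). This is also the shape of call that hits the reported 0.17 regression, since the returned Series carries new datetime values:

```python
# Hypothetical stand-in for the real processUserEvents; reuses `df` from above.
def processUserEvents(events):
    # `events` is one user's sub-frame, as passed in by groupby.apply
    return pd.Series({
        'first_seen': events['timestamp'].min(),   # datetime value
        'last_seen': events['timestamp'].max(),    # datetime value
        'n_events': len(events),
    })

user_summary = df.groupby('user_identifier').apply(processUserEvents)
print(user_summary)
```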
Referenced commit: (pandas-dev#11324) Addressed PR comments; added comments and updated whatsnew
Closed by #11548.
An exception is raised when:
a) the original dataframe has a datetime column
b) the function passed to groupby.apply returns a Series object with a new datetime column
Code to reproduce:
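The original snippet is not preserved in this copy of the issue; below is a hedged reconstruction of the kind of code that satisfies conditions a) and b) above, a frame with a datetime column whose applied function returns a Series containing new datetime values:

```python
import pandas as pd

# The frame has a datetime column (condition a).
df = pd.DataFrame({
    'user_identifier': ['a', 'a', 'b'],
    'timestamp': pd.to_datetime(['2015-10-01', '2015-10-02', '2015-10-03']),
})

# The applied function returns a Series with new datetime values (condition b).
result = df.groupby('user_identifier').apply(
    lambda g: pd.Series({'start': g['timestamp'].min(),
                         'end': g['timestamp'].max()})
)
# On 0.17 this reportedly raised an exception; see #11548 for the fix.
```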
This is a new issue affecting pandas 0.17.