BUG: apply idxmax on one-column DataFrameGroupby generates ValueError #5788

Closed
jorisvandenbossche opened this issue Dec 28, 2013 · 8 comments · Fixed by #5790

@jorisvandenbossche
Member

With this dataframe:

import pandas as pd
from StringIO import StringIO

s="""2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649
2011.05.17,02:00,1.40893
2011.05.17,03:00,1.40760
2011.05.17,04:00,1.40750
2011.05.17,05:00,1.40649
2011.05.18,02:00,1.40893
2011.05.18,03:00,1.40760
2011.05.18,04:00,1.40750
2011.05.18,05:00,1.40649"""

df = pd.read_csv(StringIO(s), header=None, names=['date', 'time', 'value'], parse_dates=[['date', 'time']])
df = df.set_index('date_time')

applying an idxmax:

df.groupby(df.index.date).apply(lambda x: x.idxmax())

produces on master:

ValueError: Shape of passed values is (1, 3), indices imply (1, 3)

while this does work on 0.12.

I think it has to do with the difference between SeriesGroupBy and DataFrameGroupBy with one column, as df.groupby(df.index.date)['value'].apply(lambda x: x.idxmax()) does work on master, while in 0.12 both ways work.
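For anyone reading this on a current Python 3 / pandas install, the two code paths compared above can be sketched roughly as follows. This is a modern restatement under assumptions, not the original 2013 reproduction: it uses `io.StringIO` instead of the Python 2 `StringIO` module, builds the index with `pd.to_datetime` rather than nested `parse_dates`, and shortens the sample data to two days.

```python
from io import StringIO

import pandas as pd

# Abbreviated version of the sample data from the report (two days only).
s = """2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.17,02:00,1.40893
2011.05.17,03:00,1.40760"""

# Read date and time as strings, then combine them into one datetime index.
df = pd.read_csv(
    StringIO(s),
    header=None,
    names=["date", "time", "value"],
    dtype={"date": str, "time": str},
)
df["date_time"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y.%m.%d %H:%M"
)
df = df.set_index("date_time")[["value"]]

keys = df.index.date  # one datetime.date per row, used as the group key

# DataFrameGroupBy with a single column: the result is a one-column DataFrame.
frame_result = df.groupby(keys).idxmax()

# SeriesGroupBy on the same column: the result is a Series.
series_result = df.groupby(keys)["value"].idxmax()

# The apply form from the report, on the one-column DataFrameGroupBy.
apply_result = df.groupby(keys).apply(lambda x: x.idxmax())

print(frame_result)
print(series_result)
print(apply_result)
```

On a recent pandas all three forms should agree on the timestamp of the daily maximum; only the container (DataFrame versus Series) differs.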

@jreback
Contributor

jreback commented Dec 29, 2013

These seem to work on current master....

In [6]: df.groupby(df.index.date).idxmax()
Out[6]: 
                         value
2011-05-16 2011-05-16 00:00:00
2011-05-17 2011-05-17 02:00:00
2011-05-18 2011-05-18 02:00:00

[3 rows x 1 columns]

In [7]: df.groupby(df.index.date).apply(lambda x: x.idxmax())
Out[7]: 
                         value
2011-05-16 2011-05-16 00:00:00
2011-05-17 2011-05-17 02:00:00
2011-05-18 2011-05-18 02:00:00

[3 rows x 1 columns]

@jreback
Contributor

jreback commented Dec 29, 2013

@jorisvandenbossche I put your test above in for #5790, but it seems to work ok for me (after I had merged the idxmax on the whitelist... maybe that was the problem).

@jreback
Contributor

jreback commented Dec 29, 2013

actually this DOES show up, but only on py 2.6 (which also uses an older numpy)

@jreback
Contributor

jreback commented Dec 29, 2013

alright....resolved for numpy < 1.7 (in #5790). Very odd error, you apparently can't vstack M8[ns] dtypes and have them stay the same dtype, another oddity of numpy.
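A quick way to check the dtype behaviour described above on whatever numpy is installed is sketched below. The array contents are made up for illustration. On recent numpy versions the stacked result is expected to keep `datetime64[ns]`; the comment above reports that this was not the case before numpy 1.7.

```python
import numpy as np

# Two small datetime64[ns] arrays (illustrative values only).
a = np.array(["2011-05-16T00:00"], dtype="M8[ns]")
b = np.array(["2011-05-17T02:00"], dtype="M8[ns]")

# Stack them vertically and inspect the resulting dtype.
stacked = np.vstack([a, b])

print(np.__version__)
print(stacked.dtype)
print(stacked.shape)
```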

@jorisvandenbossche
Member Author

Yes, indeed, the problem was numpy 1.6.2! With 1.7.1 it works.
On the computer I was working on yesterday, I had the strange combination of the development version of pandas with my system numpy (1.6.2), but also an environment with pandas 0.12 and a more recent numpy, which is why it seemed that it did work in 0.12 but not anymore in 0.13.
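Since the mix-up came down to which numpy each environment picked up, a small check like the one below can help confirm the combination in use before comparing pandas versions. The helper name `major_minor` is hypothetical and only parses the leading "major.minor" part of the version string.

```python
import re

import numpy as np
import pandas as pd


def major_minor(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


print("pandas", pd.__version__)
print("numpy", np.__version__)

# The thread ties the error to numpy older than 1.7.
if major_minor(np.__version__) < (1, 7):
    print("numpy is older than 1.7, the configuration discussed in this thread")
```

`pd.show_versions()` prints a fuller report of the installed dependencies if more detail is needed.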

@xin-jin

xin-jin commented Jan 19, 2020

Hi @jreback, I encountered some strange behavior (in #31063) which is seemingly caused by the unstack here (https://github.com/pandas-dev/pandas/pull/5790/files#diff-720d374f1a709d0075a1f0a02445cd65R2256)

I am not sure whether this behavior is by design?

@jreback
Contributor

jreback commented Jan 19, 2020

you are commenting on 6-year-old code
if you are having an issue, show a reproducible example on master in a new issue

@xin-jin

xin-jin commented Jan 19, 2020

@jreback I did have a reproducible example in #31063
