
Groupby max erroneously returns NaN #6346


Closed
cancan101 opened this issue Feb 13, 2014 · 5 comments · Fixed by #7393
Labels: Bug, Dtype Conversions, Groupby

Comments

@cancan101
Contributor

Using:

from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO(""",Date,app,File
2013-04-23,2013-04-23 00:00:00,,log080001.log
2013-05-06,2013-05-06 00:00:00,,log.log
2013-05-07,2013-05-07 00:00:00,OE,xlsx"""), parse_dates=[0])

This does not work:

In [8]: df.groupby("Date")[["File"]].max()
Out[8]:
                     File
Date
2013-04-23 00:00:00   NaN
2013-05-06 00:00:00   NaN
2013-05-07 00:00:00  xlsx

[3 rows x 1 columns]

but this does:

In [9]: df.groupby("Date")["File"].max()
Out[9]:
Date
2013-04-23 00:00:00    log080001.log
2013-05-06 00:00:00          log.log
2013-05-07 00:00:00             xlsx
Name: File, dtype: object
@hayd
Contributor

hayd commented Mar 29, 2014

Good news, the subselection bit is a red herring:

In [65]: df.groupby("Date").max()
Out[65]:
                              Unnamed: 0  app  File
Date
2013-04-23 00:00:00                  NaN  NaN   NaN
2013-05-06 00:00:00                  NaN  NaN   NaN
2013-05-07 00:00:00  2013-05-07 00:00:00   OE  xlsx

It's the app column that is causing the problem: including it turns File into NaN, while including Unnamed: 0 instead does not.

In [81]: df[['Date', 'File', 'app']].groupby("Date").max()
Out[81]:
                     File  app
Date
2013-04-23 00:00:00   NaN  NaN
2013-05-06 00:00:00   NaN  NaN
2013-05-07 00:00:00  xlsx   OE

In [83]: df[['Date', 'File', 'Unnamed: 0']].groupby("Date").max()
Out[83]:
                              File Unnamed: 0
Date
2013-04-23 00:00:00  log080001.log 2013-04-23
2013-05-06 00:00:00        log.log 2013-05-06
2013-05-07 00:00:00           xlsx 2013-05-07
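
The column-by-column comparison above can be sketched as a small script. This is only an illustration of the bisecting approach, not code from the thread: the helper name `file_has_nan` and the `report` dictionary are hypothetical. On the affected release the session above corresponds to `app` reporting True and `Unnamed: 0` reporting False; on a release that contains the fix, both are expected to be False.

```python
from io import StringIO

import pandas as pd

# Same data as in the original report.
CSV = """,Date,app,File
2013-04-23,2013-04-23 00:00:00,,log080001.log
2013-05-06,2013-05-06 00:00:00,,log.log
2013-05-07,2013-05-07 00:00:00,OE,xlsx"""

df = pd.read_csv(StringIO(CSV), parse_dates=[0])


def file_has_nan(companion):
    """Hypothetical helper: group File plus one companion column by Date
    and report whether the File maxima contain any NaN."""
    result = df[["Date", "File", companion]].groupby("Date").max()
    return bool(result["File"].isna().any())


# Try each companion column in turn to see which one disturbs File.
report = {col: file_has_nan(col) for col in ["app", "Unnamed: 0"]}
print(report)
```

Checking one companion column at a time keeps the test frame minimal, which is what isolated `app` (empty in the first two groups) as the trigger.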

@hayd changed the title from "Groupby Issue" to "Groupby max erroneously returns NaN" Mar 29, 2014
@jreback added the Dtypes label Apr 9, 2014
@jreback modified the milestones: 0.15.0, 0.14.0 Apr 9, 2014
@dsm054
Contributor

dsm054 commented Jun 8, 2014

This seems to be fixed now as a result of the groupby mods; close?

@hayd modified the milestones: 0.14.1, 0.15.0 Jun 8, 2014
@hayd
Contributor

hayd commented Jun 8, 2014

Yes, this is fixed in 0.14! Let's add a regression test and then close.
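
A regression test along these lines might look like the sketch below. It is not necessarily the test that was merged in #7393; the function name is hypothetical, and it uses the public `pd.testing.assert_frame_equal` helper, which assumes a reasonably recent pandas. It checks that selecting `[["File"]]` and selecting `"File"` on the same groupby agree, and that no NaN appears in the result.

```python
from io import StringIO

import pandas as pd


def test_groupby_max_does_not_return_nan():
    # Data from the original report: "app" is empty in the first two groups.
    raw = """,Date,app,File
2013-04-23,2013-04-23 00:00:00,,log080001.log
2013-05-06,2013-05-06 00:00:00,,log.log
2013-05-07,2013-05-07 00:00:00,OE,xlsx"""
    df = pd.read_csv(StringIO(raw), parse_dates=[0])

    grouped = df.groupby("Date")
    result = grouped[["File"]].max()            # DataFrame selection (was buggy)
    expected = grouped["File"].max().to_frame() # Series selection (always worked)

    pd.testing.assert_frame_equal(result, expected)
    assert not result["File"].isna().any()


test_groupby_max_does_not_return_nan()
```

Comparing the two selection styles against each other avoids hard-coding the expected strings, so the test fails only if the DataFrame path diverges from the Series path again.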

@cpcloud
Member

cpcloud commented Jun 8, 2014

I'll add the test.

@hayd
Contributor

hayd commented Jun 8, 2014

Thanks!
