BUG: Aggregating over an integer on an empty DataFrame removes the index name #7574

Closed
chrisaycock opened this issue Jun 26, 2014 · 2 comments

@chrisaycock
Contributor

Suppose I have this simple DataFrame:

In [91]: df = pd.DataFrame({'ref':[1, 2, 3], 'symbol':['GOOGL', 'IBM', 'MSFT'], 'price':[585.93, 180.72, 42.03]}, columns=['ref', 'symbol', 'price'])

In [92]: df
Out[92]:
   ref symbol   price
0    1  GOOGL  585.93
1    2    IBM  180.72
2    3   MSFT   42.03

In [93]: df.dtypes
Out[93]:
ref         int64
symbol     object
price     float64
dtype: object

If I group by an object column, aggregate, and then reset the index, everything comes back as expected:

In [95]: df.groupby('symbol').first().reset_index()
Out[95]:
  symbol  ref   price
0  GOOGL    1  585.93
1    IBM    2  180.72
2   MSFT    3   42.03

The same holds when I group by an integer column:

In [96]: df.groupby('ref').first().reset_index()
Out[96]:
   ref symbol   price
0    1  GOOGL  585.93
1    2    IBM  180.72
2    3   MSFT   42.03

Now suppose the DataFrame is empty. Grouping by the object column still produces what I expect:

In [97]: df.query('price > 1000').groupby('symbol').first().reset_index()
Out[97]:
Empty DataFrame
Columns: [symbol, ref, price]
Index: []

But grouping by the integer column gives me 'index' as the column name instead of the expected 'ref'!

In [98]: df.query('price > 1000').groupby('ref').first().reset_index()
Out[98]:
Empty DataFrame
Columns: [index, symbol, price]
Index: []
          ^^^^^
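
The 'index' label in Out[98] is what reset_index() falls back to when the index has no name, which is why a lost name surfaces this way. A minimal sketch with two toy frames (both are illustrative assumptions, not data from this report):

```python
import pandas as pd

# Two one-row frames that differ only in whether the index carries a name.
named = pd.DataFrame({"price": [1.0]}, index=pd.Index([1], name="ref"))
unnamed = pd.DataFrame({"price": [1.0]}, index=pd.Index([1]))

# reset_index() moves the index into a column; with no name it uses "index".
named_cols = list(named.reset_index().columns)
unnamed_cols = list(unnamed.reset_index().columns)

print(named_cols)    # ['ref', 'price']
print(unnamed_cols)  # ['index', 'price']
```

Checking result.index.name directly, before calling reset_index(), is therefore a quicker way to see whether the name survived the aggregation.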

Interestingly, setting the index without aggregating (via set_index) does the right thing:

In [100]: df.query('price > 1000').set_index('ref').reset_index()
Out[100]:
Empty DataFrame
Columns: [ref, symbol, price]
Index: []

So grouping an empty DataFrame by an integer column and aggregating removes the index name. I found this in pandas 0.14.0.
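
For anyone stuck on an affected version, one possible guard is to restore the name before resetting the index. This is only a sketch: first_per_group is a hypothetical helper, not a pandas function, and the boolean mask is used here as a stand-in for the query call in the report.

```python
import pandas as pd


def first_per_group(frame, key):
    """Take the first row per group and keep `key` as the index name."""
    result = frame.groupby(key).first()
    if result.index.name is None:
        # Assumption: the name was dropped (as reported for empty input),
        # so put it back before reset_index() turns it into a column.
        result.index.name = key
    return result.reset_index()


df = pd.DataFrame(
    {
        "ref": [1, 2, 3],
        "symbol": ["GOOGL", "IBM", "MSFT"],
        "price": [585.93, 180.72, 42.03],
    },
    columns=["ref", "symbol", "price"],
)

# Selects the same (zero) rows as df.query('price > 1000').
empty = df[df["price"] > 1000]

out = first_per_group(empty, "ref")
print(list(out.columns))
```

On versions where the name is already preserved, the guard is a no-op and the helper behaves like the plain groupby call.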

@jreback
Contributor

jreback commented Jun 26, 2014

hmm...could be that the name is lost, ok, marking as a bug

if you would like to dig in and see if you can figure it out, that would be great

@jreback
Contributor

jreback commented Jul 2, 2014

ahh, yes I fixed that in #7580 (it has an implicit test for the name).

would you like to submit a PR including the tests that you have for both issues?
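
A regression test for this issue could look roughly like the sketch below. The test name and layout are assumptions for illustration; this is not the test added in #7580, and it covers only the index-name behavior described in this issue.

```python
import pandas as pd


def test_groupby_empty_keeps_index_name():
    # Hypothetical test name; mirrors the frame from the report.
    df = pd.DataFrame(
        {
            "ref": [1, 2, 3],
            "symbol": ["GOOGL", "IBM", "MSFT"],
            "price": [585.93, 180.72, 42.03],
        },
        columns=["ref", "symbol", "price"],
    )
    empty = df[df["price"] > 1000]  # no rows match

    # Check both an object-typed key and an integer-typed key.
    for key in ("symbol", "ref"):
        result = empty.groupby(key).first()
        assert result.index.name == key
        assert key in result.reset_index().columns


test_groupby_empty_keeps_index_name()
```

Asserting on index.name as well as on the reset_index() columns keeps the failure message close to the actual cause if the name is ever dropped again.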

@jreback closed this as completed Jul 2, 2014