
Unexpected output of table pivoting when there is little data; a TimeGrouper object is in the index. #9485

Closed
alep opened this issue Feb 13, 2015 · 4 comments
Labels: Bug, Duplicate Report, Reshaping

Comments

alep commented Feb 13, 2015

# coding: utf-8
import pandas as pd
import numpy as np

frame = pd.read_csv("table.csv", engine="python", parse_dates=['since'])
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d
print "^that is not what I expected"
frame = pd.read_csv("table2.csv", engine="python", parse_dates=['since'])  # add some values to a day
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d

This is the contents of table.csv:

"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"

This is table2.csv:

"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"

The output of print d after pivoting table.csv:

                                                                id  value
<pandas.tseries.resample.TimeGrouper object at 0x7fc595f96c10>  81   2200
id                                                              81   2200

I would expect something like this:

            value
id             81
since            
2015-01-31   2200
2015-02-01   2200

I can trace the problem to this line:

https://github.com/pydata/pandas/blob/62529cca28e9c8652ddf7cca3aa6d41d4e30bc0e/pandas/tools/pivot.py#L114

By that point the index created by the groupby already has the TimeGrouper object in it.

I can't figure out anything beyond that. What is the problem, and is there a fix?
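
As a possible workaround (a sketch of my own, not something from the pandas docs), bucketing the timestamps into days up front and pivoting on the resulting plain column avoids passing a TimeGrouper/Grouper to pivot_table at all:

# Workaround sketch (hypothetical, not from this thread): pre-bucket 'since' into a
# plain 'day' column, then pivot on that column instead of a TimeGrouper/Grouper.
import numpy as np
import pandas as pd

frame = pd.read_csv("table.csv", parse_dates=['since'])
# On recent pandas, Series.dt.normalize() truncates the timestamps to midnight;
# on very old versions, frame['since'].values.astype('datetime64[D]') does the same job.
frame['day'] = frame['since'].dt.normalize()
d = pd.pivot_table(frame, index='day', values=["value"], columns=['id'],
                   aggfunc=np.sum, fill_value=0)
print(d)

The column name 'day' is arbitrary; since the pivot key is just an ordinary datetime column, the daily index comes out the same whether a day bucket holds one row or several.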

Thanks.

jreback (Contributor) commented Feb 13, 2015

Here's your example in easily copy-pastable form. FYI, you don't need to specify engine=... to read_csv.

Probably an edge case; a pull request would be welcome.

In [24]: data2 = """                                               
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

In [25]: df2 = pd.read_csv(StringIO(data2),  parse_dates=['since'])

In [26]: df2
Out[26]: 
   id               since  value
0  81 2015-01-31 07:00:00   2200
1  81 2015-01-31 08:00:00   2200
2  81 2015-01-31 09:00:00   2200
3  81 2015-02-01 07:00:00   2200

In [27]: pd.pivot_table(df2, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[27]: 
           value
id            81
since           
2015-01-31  6600
2015-02-01  2200

In [28]: data = """                                                                 
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

In [29]: df = pd.read_csv(StringIO(data),  parse_dates=['since'])

In [30]: df
Out[30]: 
   id               since  value
0  81 2015-01-31 07:00:00   2200
1  81 2015-02-01 07:00:00   2200

In [31]: pd.pivot_table(df, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[31]: 
                                                             id  value
<pandas.tseries.resample.TimeGrouper object at 0x107714450>  81   2200
id                                                           81   2200

jreback added the Bug and Reshaping labels on Feb 13, 2015
jreback added this to the 0.17.0 milestone on Feb 13, 2015
alep (Author) commented Feb 18, 2015

The problem seems to be here:

https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2079

If I set match_axis_length = False, it works fine, so I guess some condition is missing, either here:

https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2092

or where match_axis_length is set.
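
For what it's worth, here is a small check of my own (hypothetical, not verified against 0.15.2) that exercises the same grouping pivot_table builds internally, a Grouper on since plus the id column, without going through pivot_table. Inspecting the resulting index should show whether the stray TimeGrouper comes from groupby itself or from the pivot code:

# Hypothetical check (my own sketch): reproduce the grouping pivot_table performs,
# a Grouper on 'since' plus the 'id' column, and look at the index it produces.
import pandas as pd
from io import StringIO  # on Python 2 this would be: from StringIO import StringIO

data = """\
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

df = pd.read_csv(StringIO(data), parse_dates=['since'])
grouped = df.groupby([pd.Grouper(key='since', freq='1D'), 'id'])['value'].sum()
print(grouped.index)  # daily timestamps in the first level, or a Grouper repr?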

alep (Author) commented Feb 18, 2015

This is a duplicate of #8542.

TomAugspurger (Contributor) commented

Closing as a duplicate.

TomAugspurger added the Duplicate Report label on Jul 8, 2017
TomAugspurger modified the milestones: No action, Next Major Release on Jul 8, 2017