
Unexpected output of table pivoting when there is little data; a TimeGrouper object is in the index. #9485

Closed
alep opened this issue Feb 13, 2015 · 4 comments
Labels: Bug, Duplicate Report, Reshaping

Comments

alep commented Feb 13, 2015

# coding: utf-8
import pandas as pd
import numpy as np

frame = pd.read_csv("table.csv", engine="python", parse_dates=['since'])
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d
print "^that is not what I expected"
frame = pd.read_csv("table2.csv", engine="python", parse_dates=['since'])  # add some values to a day
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d

This is the contents of table.csv:

"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"

This is table2.csv:

"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"

The output of print d after pivoting table.csv:

                                                                id  value
<pandas.tseries.resample.TimeGrouper object at 0x7fc595f96c10>  81   2200
id                                                              81   2200

I would expect something like this:

            value
id             81
since            
2015-01-31   2200
2015-02-01   2200

I can trace the problem to this line:

https://github.com/pydata/pandas/blob/62529cca28e9c8652ddf7cca3aa6d41d4e30bc0e/pandas/tools/pivot.py#L114

By that point the index created by the groupby already has the TimeGrouper object in it.

I can't figure out anything beyond that. What is the problem, and is there a fix?
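
As a possible workaround (a sketch of my own, not something from the pandas docs), bucketing the timestamps into days up front and pivoting on the resulting plain column avoids passing a TimeGrouper/Grouper to pivot_table at all:

# Workaround sketch (hypothetical, not from this thread): pre-bucket 'since' into a
# plain 'day' column, then pivot on that column instead of a TimeGrouper/Grouper.
import numpy as np
import pandas as pd

frame = pd.read_csv("table.csv", parse_dates=['since'])
# On recent pandas, Series.dt.normalize() truncates the timestamps to midnight;
# on very old versions, frame['since'].values.astype('datetime64[D]') does the same job.
frame['day'] = frame['since'].dt.normalize()
d = pd.pivot_table(frame, index='day', values=["value"], columns=['id'],
                   aggfunc=np.sum, fill_value=0)
print(d)

The column name 'day' is arbitrary; since the pivot key is just an ordinary datetime column, the daily index comes out the same whether a day bucket holds one row or several.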

Thanks.

jreback (Contributor) commented Feb 13, 2015

Here's your example in easily copy-pastable form. FYI, you don't need to specify engine=... to read_csv.

Probably an edge case; a pull request would be welcome.

In [24]: data2 = """                                               
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

In [25]: df2 = pd.read_csv(StringIO(data2),  parse_dates=['since'])

In [26]: df2
Out[26]: 
   id               since  value
0  81 2015-01-31 07:00:00   2200
1  81 2015-01-31 08:00:00   2200
2  81 2015-01-31 09:00:00   2200
3  81 2015-02-01 07:00:00   2200

In [27]: pd.pivot_table(df2, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[27]: 
           value
id            81
since           
2015-01-31  6600
2015-02-01  2200

In [28]: data = """                                                                 
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

In [29]: df = pd.read_csv(StringIO(data),  parse_dates=['since'])

In [30]: df
Out[30]: 
   id               since  value
0  81 2015-01-31 07:00:00   2200
1  81 2015-02-01 07:00:00   2200

In [31]: pd.pivot_table(df, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[31]: 
                                                             id  value
<pandas.tseries.resample.TimeGrouper object at 0x107714450>  81   2200
id                                                           81   2200

jreback added the Bug and Reshaping labels on Feb 13, 2015
jreback added this to the 0.17.0 milestone on Feb 13, 2015
alep (Author) commented Feb 18, 2015

The problem seems to be here:

https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2079

If I set match_axis_length = False, it works fine, so I guess some condition is missing, either here:

https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2092

or where match_axis_length is set.
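
For what it's worth, here is a small check of my own (hypothetical, not verified against 0.15.2) that exercises the same grouping pivot_table builds internally, a Grouper on since plus the id column, without going through pivot_table. Inspecting the resulting index should show whether the stray TimeGrouper comes from groupby itself or from the pivot code:

# Hypothetical check (my own sketch): reproduce the grouping pivot_table performs,
# a Grouper on 'since' plus the 'id' column, and look at the index it produces.
import pandas as pd
from io import StringIO  # on Python 2 this would be: from StringIO import StringIO

data = """\
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

df = pd.read_csv(StringIO(data), parse_dates=['since'])
grouped = df.groupby([pd.Grouper(key='since', freq='1D'), 'id'])['value'].sum()
print(grouped.index)  # daily timestamps in the first level, or a Grouper repr?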

alep (Author) commented Feb 18, 2015

This is a duplicate of #8542.

TomAugspurger (Contributor) commented

Closing as a duplicate.

TomAugspurger added the Duplicate Report label on Jul 8, 2017
TomAugspurger modified the milestones: No action, Next Major Release on Jul 8, 2017