QST:Why DataFrame.pivot() will change the order of the rows of DataFrame? #39143

jiahe224 · 2021-01-13T05:02:55Z

I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.

Question about pandas

Note: If you'd still like to submit a question, please read this guide detailing how to provide the necessary information for us to reproduce your question.

# Your code here, if applicable

phofl · 2021-01-13T23:40:08Z

Hi, thanks for your question. Could you provide something reproducible?

jiahe224 · 2021-01-14T07:17:29Z

test = pd.DataFrame({'indus':['12.文教、工美、体育','其中：氯碱','3.黑色金属矿采选业'],'地区':['a','b','c'],'num':[23,21,34],
'year':['2018','2018','2019']})
test_pivot = test.pivot_table(index=['indus','地区'],columns='year',values='num',aggfunc='sum').reset_index()

pivot is like to pivot_table

You will find that the order of the rows of test and test_pivot is not the same. @phofl thanks

attack68 · 2021-01-17T11:18:50Z

with a westernised version of your test I detected no change in rows for the following:

test = pd.DataFrame({'indus': ['d1', 'd2', 'd3'], 'col': ['a', 'b', 'c'], 'num': [23, 21, 34], 'year': ['2018', '2018', '2019']})
test_pivot = test.pivot_table(index=['indus', 'col'], columns='year', values='num', aggfunc='sum').reset_index()
print(test)
print(test_pivot)

  indus col  num  year
0    d1   a   23  2018
1    d2   b   21  2018
2    d3   c   34  2019

year indus col  2018  2019
0       d1   a  23.0   NaN
1       d2   b  21.0   NaN
2       d3   c   NaN  34.0

and a re-order of rows for:

test = pd.DataFrame({'indus': ['d1', 'd4', 'd3'], 'col': ['a', 'b', 'c'], 'num': [23, 21, 34], 'year': ['2018', '2018', '2019']})
test_pivot = test.pivot_table(index=['indus', 'col'], columns='year', values='num', aggfunc='sum').reset_index()
print(test)
print(test_pivot)

  indus col  num  year
0    d1   a   23  2018
1    d4   b   21  2018
2    d3   c   34  2019

year indus col  2018  2019
0       d1   a  23.0   NaN
1       d3   c   NaN  34.0
2       d4   b  21.0   NaN

I will guess that pivot_table does some lexicographical sort on the index?

phofl · 2021-01-17T11:58:59Z

This is quite easy to fix, we are currently not passing anything to a groupby which is called unter the hood, which defaults to sort=True then. But we have a lot of tests depending on this behavior, so since this is covered not sure if we can simply change this

jiahe224 · 2021-01-21T08:06:21Z

I think like groupby(), it would be better to add the sort parameter, the user can decide the boolean value of sort.

jiahe224 · 2021-01-21T08:11:25Z

It was definitely reordered, but I didn't see the sort parameter in pivot(), so I guessed it might rely on groupby(), but after I changed the default value of sort in groupby() to False, it still didn't work. @attack68

attack68 · 2021-01-22T11:49:10Z

The example I gave above can be corrected by going to pandas/core/reshape/pivot.py and changing:

115 grouped = data.groupby(keys, observed=observed)

to

115 grouped = data.groupby(keys, observed=observed, sort=False)

Note that changing the pandas/core/frame.py line 6780 to default False in groupby declaration also works.

phofl · 2021-01-22T12:09:44Z

Yes but this breaks a lot of tests, hence the question what to do exactly

attack68 · 2021-01-22T13:49:21Z

If you add a sort keyword to pivot table in pivot.py#pivot_table and frame.py#pivot_table defaulting to True it doesn't impact the tests since its backwards compatible no?

The the user can specify pivot_table(sort=False) directly? Or did I miss something - perhaps extension of the api is not ideal?

rhshadrach · 2021-02-27T02:13:32Z

cc @jreback - any thoughts on adding an argument sort=True to pivot to control the groupby used?

phofl · 2021-04-06T19:42:30Z

Related to #15105

jiahe224 added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jan 13, 2021

phofl added Needs Discussion Requires discussion from core team before further action Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 17, 2021

attack68 mentioned this issue Feb 25, 2021

How to stop sorting in pivot_table and pivot? #40055

Closed

phofl mentioned this issue Apr 14, 2021

Add keyword sort to pivot_table #40954

Merged

4 tasks

jreback added this to the 1.3 milestone Apr 20, 2021

jreback closed this as completed in #40954 Apr 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QST:Why DataFrame.pivot() will change the order of the rows of DataFrame? #39143

QST:Why DataFrame.pivot() will change the order of the rows of DataFrame? #39143

jiahe224 commented Jan 13, 2021 •

edited

Loading

phofl commented Jan 13, 2021

jiahe224 commented Jan 14, 2021 •

edited

Loading

attack68 commented Jan 17, 2021

phofl commented Jan 17, 2021

jiahe224 commented Jan 21, 2021

jiahe224 commented Jan 21, 2021

attack68 commented Jan 22, 2021

phofl commented Jan 22, 2021

attack68 commented Jan 22, 2021

rhshadrach commented Feb 27, 2021

phofl commented Apr 6, 2021

QST:Why DataFrame.pivot() will change the order of the rows of DataFrame? #39143

QST:Why DataFrame.pivot() will change the order of the rows of DataFrame? #39143

Comments

jiahe224 commented Jan 13, 2021 • edited Loading

Question about pandas

phofl commented Jan 13, 2021

jiahe224 commented Jan 14, 2021 • edited Loading

attack68 commented Jan 17, 2021

phofl commented Jan 17, 2021

jiahe224 commented Jan 21, 2021

jiahe224 commented Jan 21, 2021

attack68 commented Jan 22, 2021

phofl commented Jan 22, 2021

attack68 commented Jan 22, 2021

rhshadrach commented Feb 27, 2021

phofl commented Apr 6, 2021

jiahe224 commented Jan 13, 2021 •

edited

Loading

jiahe224 commented Jan 14, 2021 •

edited

Loading