groupby().apply(lambda x:x.copy()) raise error. #9946

Closed
ruoyu0088 opened this issue Apr 20, 2015 · 5 comments

Comments

@ruoyu0088

df = pd.DataFrame({"g":[1, 2, 2, 2], "a":[1, 2, 3, 4], "b":[5, 6, 7, 8]})
df.groupby("g").apply(lambda x:x.copy())

raises:

ValueError: Shape of passed values is (3, 4), indices imply (3, 2)

However, lambda x: x and lambda x: x[:] both work.

@ruoyu0088
Author

I debugged this and found the difference between df.copy() and df[:]:

import pandas as pd
import numpy as np

def f1(x):
    return x.copy()

def f2(x):
    return x[:]

df = pd.DataFrame({"g":[1, 2, 2, 2], "a":[1, 2, 3, 4], "b":[5, 6, 7, 8]})

print(f1(df).index._id is df.index._id)  # True
print(f2(df).index._id is df.index._id)  # False

Although df.copy().index is not df.index, their _id attributes are the same object.
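
The same kind of identity relationship can be inspected without reading the private _id attribute. The following is a minimal sketch, assuming a recent pandas in which the public Index.is_() method is available; it is not part of the original report. What df.copy().index reports may differ between pandas versions, so that value is printed without an expected result:

```python
import pandas as pd

idx = pd.Index([0, 1, 2, 3])

view = idx.view()  # new Python object that keeps the identity of idx
dup = idx.copy()   # new Python object with a fresh identity

print(view is idx)    # False: a different Python object
print(view.is_(idx))  # True: same underlying identity
print(dup.is_(idx))   # False: a copy gets its own identity

df = pd.DataFrame({"g": [1, 2, 2, 2], "a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
# Version-dependent, so no expected value is given here.
print(df.copy().index.is_(df.index))
```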

(df1, df2), _ = pd.lib.apply_frame_axis0(df, f1, [1, 2], np.array([0, 1], np.int64), np.array([1, 4], np.int64))
print(df1.index._id is df2.index._id)  # True

apply() calls apply_frame_axis0() to do the job, and the indexes of the resulting DataFrames all share the same _id object. This causes concat() to raise the error ValueError: Shape of passed values is (3, 4), indices imply (3, 2).

@jreback
Contributor

jreback commented Apr 20, 2015

You mentioned this in #9867, but we'll take this as a bug report.

Yes, these should do the same thing.

The entire point of the inference is to guess whether the user is mutating the input or not. IMHO this should be banned, but that ship has sailed.
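
To illustrate why the two lambdas ought to be treated alike, here is a rough sketch of one possible heuristic: check whether the function returns a frame indexed like its input. The helper returns_input_index is hypothetical and is not how pandas is confirmed to implement its inference; it only shows that x and x.copy() are indistinguishable under such a check.

```python
import pandas as pd


def returns_input_index(func, group):
    """Hypothetical check: does func return a DataFrame indexed like its input?"""
    result = func(group)
    return isinstance(result, pd.DataFrame) and result.index.equals(group.index)


df = pd.DataFrame({"g": [1, 2, 2, 2], "a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
group = df[df["g"] == 2]  # the rows belonging to one group

same_obj = returns_input_index(lambda x: x, group)
copied = returns_input_index(lambda x: x.copy(), group)
reduced = returns_input_index(lambda x: x.head(1), group)

print(same_obj, copied, reduced)  # True True False
```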

@jorisvandenbossche
Member

So this doesn't raise an error anymore on master, but you still get an inconsistent output:

In [111]: df = pd.DataFrame({"g":[1, 2, 1, 2], "a":[1, 2, 3, 4], "b":[5, 6, 7, 8]})

In [112]: df.groupby("g").apply(lambda x:x)
Out[112]: 
   a  b  g
0  1  5  1
1  2  6  2
2  3  7  1
3  4  8  2

In [113]: df.groupby("g").apply(lambda x:x.copy())
Out[113]: 
     a  b  g
g           
1 0  1  5  1
  2  3  7  1
2 1  2  6  2
  3  4  8  2
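
Either of the two shapes above can be built explicitly, which avoids depending on what apply() infers. This is a sketch using only groupby iteration and pd.concat, not the code path apply() itself takes; the names pieces, flat, and stacked are illustrative, and the column order may differ from the output shown above:

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 2, 1, 2], "a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# Split: one copied sub-frame per group key.
pieces = {key: group.copy() for key, group in df.groupby("g")}

# Shape 1: flat index in the original row order, as in Out[112].
flat = pd.concat(list(pieces.values())).sort_index()

# Shape 2: group key as an outer index level, as in Out[113].
stacked = pd.concat(pieces, names=["g"])

print(flat.index.tolist())     # [0, 1, 2, 3]
print(stacked.index.tolist())  # [(1, 0), (1, 2), (2, 1), (2, 3)]
```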

@mroeschke
Member

close in favor of #14927?

@jorisvandenbossche
Member

Yep

@jorisvandenbossche jorisvandenbossche removed this from the Contributions Welcome milestone Jul 7, 2018