
pandas.core.groupby.DataFrameGroupBy to_csv method doesn't output csv file as expected #4882

Closed
c0indev3l opened this issue Sep 19, 2013 · 13 comments

@c0indev3l

>>> df1 = pd.DataFrame( { 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )

>>> g1 = df1.groupby( [ "Name" ] )

>>> print g1.head()
               City     Name
Name                        
Alice   0   Seattle    Alice
Bob     1   Seattle      Bob
        4   Seattle      Bob
Mallory 2  Portland  Mallory
        3   Seattle  Mallory
        5  Portland  Mallory

>>> g1.to_csv('out.csv')
g1.to_csv('out.csv')
Out[10]: 
Name
Alice      None
Bob        None
Mallory    None
dtype: object

(Why is some data printed to the IPython console?)

>>> !cat out.csv
,City,Name
2,Portland,Mallory
3,Seattle,Mallory
5,Portland,Mallory
@cpcloud
Member

cpcloud commented Sep 19, 2013

What is your end goal here? You shouldn't really be using to_csv on a groupby. If you really want to write each group separately or you need to do some processing on each group before writing, consider looping over the groups:

for group_name, group in df.groupby('Name'):
    # process() stands in for whatever per-group processing you need
    newdf = process(group)
    with open('the_csv.csv', 'a') as f:
        newdf.to_csv(f)
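
A slightly fuller sketch of the loop above, assuming the goal is a single combined CSV with one header row. The `process` function, the `write_groups` helper, and the temporary output path are all hypothetical names made up for illustration; `process` simply returns the group unchanged here.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})


def process(group):
    # Hypothetical per-group processing; the group is returned unchanged.
    return group


def write_groups(df, key, path):
    """Write every group to one CSV file, emitting the header only once."""
    first = True
    for _, group in df.groupby(key):
        processed = process(group)
        # The first group truncates any stale file; later groups append.
        processed.to_csv(path, mode="w" if first else "a", header=first)
        first = False


out_path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_groups(df1, "Name", out_path)
result = pd.read_csv(out_path, index_col=0)
```

Writing the header only for the first group avoids the repeated header rows that a plain append of each group would produce.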

@jreback
Contributor

jreback commented Sep 19, 2013

If you REALLY want the output you have, you can do this, but like @cpcloud, I don't see the utility in this

In [46]: df1.reset_index().set_index(['Name','City']).sortlevel(0)
Out[46]: 
                  index
Name    City           
Alice   Seattle       0
Bob     Seattle       1
        Seattle       4
Mallory Portland      2
        Portland      5
        Seattle       3

In [47]: df1.reset_index().set_index(['Name','City']).sortlevel(0).to_csv('test.csv')

In [48]: !cat test.csv
Name,City,index
Alice,Seattle,0
Bob,Seattle,1
Bob,Seattle,4
Mallory,Portland,2
Mallory,Portland,5
Mallory,Seattle,3
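
`sortlevel` was later dropped from pandas, and `sort_index` covers the same case in more recent releases. A sketch of the same reshaping under that assumption, writing to an in-memory buffer instead of `test.csv` (the buffer is only for illustration):

```python
import io

import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Keep the original row number as a column, index by (Name, City),
# then sort lexicographically on the MultiIndex.
flat = df1.reset_index().set_index(["Name", "City"]).sort_index()

buf = io.StringIO()
flat.to_csv(buf)
csv_text = buf.getvalue()
```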

@c0indev3l
Author

This issue is linked to #4883

@jtratner
Contributor

Why does a group by object even have a to_csv method?

@jreback
Contributor

jreback commented Sep 19, 2013

It doesn't; it's dispatching to the object (which is how apply works)

@cpcloud
Member

cpcloud commented Sep 19, 2013

Because it creates a wrapper for the type of groupby lazily
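
The behaviour in the original report follows from that dispatch. Below is a minimal pure-Python sketch of the idea, not the actual pandas implementation: `Table` and `GroupedTable` are hypothetical stand-ins, and the lazy wrapper simply calls the named method once per group. Because each call opens the same path in write mode, only the last group survives in the file, and the caller gets back one `None` per group.

```python
import csv
import os
import tempfile


class Table:
    """Tiny stand-in for a DataFrame: a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows

    def to_csv(self, path):
        # Opens with "w", so every call truncates the file; returns None.
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["City", "Name"])
            writer.writeheader()
            writer.writerows(self.rows)


class GroupedTable:
    """Hypothetical grouped wrapper that forwards unknown methods to each group."""

    def __init__(self, table, key):
        self.groups = {}
        for row in table.rows:
            self.groups.setdefault(row[key], []).append(row)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)

        def wrapper(*args, **kwargs):
            # Call the named method on every group and collect the results.
            return {
                group_key: getattr(Table(rows), name)(*args, **kwargs)
                for group_key, rows in sorted(self.groups.items())
            }

        return wrapper


names = ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"]
cities = ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]
table = Table([{"City": c, "Name": n} for n, c in zip(names, cities)])

out_path = os.path.join(tempfile.mkdtemp(), "out.csv")
returned = GroupedTable(table, "Name").to_csv(out_path)

with open(out_path, newline="") as f:
    written = f.read().splitlines()
```

Here `returned` holds one `None` per group, mirroring the Series of `None` values echoed in the console, and `written` contains only the Mallory rows.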

@jreback
Contributor

jreback commented Sep 19, 2013

I suppose we should explicitly allow certain methods on groupby

@cpcloud
Member

cpcloud commented Sep 19, 2013

maybe, although special-casing everything could turn into a mess

@cpcloud
Member

cpcloud commented Sep 19, 2013

well not everything but none of the IO stuff really makes sense on a groupby

@jtratner
Contributor

Yeah not worth the time.

@jtratner
Contributor

One thing we could do is implement __dir__ on the object so it's clearer what's available for tab completion and introspection.
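
A rough sketch of how an explicit allowlist and `__dir__` could fit together, in plain Python. The `Grouped` class and its `_allowed` set are hypothetical, the groups are ordinary lists rather than DataFrames, and none of this is taken from the pandas source.

```python
class Grouped:
    """Hypothetical grouped wrapper that only dispatches allowlisted methods."""

    # Illustrative allowlist of list methods that make sense per group.
    _allowed = frozenset({"count", "index"})

    def __init__(self, groups):
        self._groups = groups

    def __getattr__(self, name):
        # Only reached when normal lookup fails; refuse anything not allowlisted.
        if name not in self._allowed:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )

        def wrapper(*args, **kwargs):
            return {
                key: getattr(values, name)(*args, **kwargs)
                for key, values in self._groups.items()
            }

        return wrapper

    def __dir__(self):
        # Advertise the dispatched methods to dir() and tab completion.
        return sorted(set(super().__dir__()) | self._allowed)


grouped = Grouped({
    "Alice": ["Seattle"],
    "Bob": ["Seattle", "Seattle"],
    "Mallory": ["Portland", "Seattle", "Portland"],
})
seattle_counts = grouped.count("Seattle")
```

With this shape, `grouped.to_csv` raises `AttributeError` instead of silently dispatching, and `dir(grouped)` lists `count` and `index` alongside the regular attributes.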

@cpcloud
Member

cpcloud commented Sep 19, 2013

let me take a look at how that wrapper is constructed....it might be deep in some lambda somewhere

@jreback
Contributor

jreback commented Sep 30, 2013

closed by #4887

@jreback closed this as completed Sep 30, 2013