Skip to content

Suggested improvement: allow ExcelWriter to automatically convert lists and dict to strings #8188

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
teese opened this issue Sep 5, 2014 · 9 comments · Fixed by #10376
Closed
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO Excel read_excel, to_excel
Milestone

Comments

@teese
Copy link

teese commented Sep 5, 2014

Problem description:
I have pandas dataframes with diverse data types and structures, including lists. The csv-writer saves the lists without trouble, but I prefer to also save in excel format for collaboration with my colleagues.
Currently the ExcelWriter returns a TypeError if the dataframe contains a list.
TypeError: float() argument must be a string or a number, not 'list'

It would be really great if ExcelWriter accepted lists and dictionaries and tuples.
Would it cause any problem to simply convert these data formats to strings before saving?
Sorry I’m not an experienced programmer, so I haven‘t made a pull request and attempted to improve the module myself.
Here is some code (Python 3.4, pandas 0.14.0) to describe the current situation.

import pandas as pd
#create a new dataframe that includes a list and a dict
title = pd.Series(['Toy Story (1995)',
 'Jumanji (1995)',
 'Grumpier Old Men (1995)',
 'Waiting to Exhale (1995)',
 'Father of the Bride Part II (1995)'], index = [0 ,1, 2, 3, 4], name='titles')
genre_as_list = pd.Series([['Animation', "Children's", 'Comedy'],
 ['Adventure', "Children's", 'Fantasy'],
 ['Comedy', 'Romance'],
 ['Comedy', 'Drama'],
 ['Comedy']], index = [0 ,1, 2, 3, 4], name='genre_as_list')
rating_as_dict = pd.Series([{'F':0.5,'M':0.6},
 {'F':0.1,'M':0.2},
 {'F':0.3,'M':0.3},
 {'F':0.8,'M':0.8},
 {'F':0.8,'M':0.6}], index = [0 ,1, 2, 3, 4], name='rating_as_dict')
df = pd.concat([title, genre_as_list, rating_as_dict], axis=1)

attempt to save the dataframe in excel

writer = pd.ExcelWriter('saved_df.xlsx')
df.to_excel(writer, sheet_name='Sheet1')

which will result in
TypeError: float() argument must be a string or a number, not 'list'
to save the data, it is necessary to convert the python lists or dicts to strings, separated by ", " or pipes "|"

df['genre_as_string'] = [str(i) for i in df['genre_as_list']]
df['rating_as_string'] = [str(i) for i in df['rating_as_dict']]
df['genre_pipes_string'] = ['|'.join(i) for i in df['genre_as_list']]
#in my ipython output, the difference is not actually visible between the lists and the stringlists
print('first item from list: %s\nfirst item from stringlist: %s' % (df['genre_as_list'][0][0],
                                                          df['genre_as_string'][0][0]))

After deleting any columns containing a python list or dict, I can now save without any problem.

df = df.drop(['genre_as_list','rating_as_dict'], axis=1)
writer = pd.ExcelWriter('saved_df.xlsx')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()

If I load from the saved excel sheet, I can confirm that the data is intact

df_from_excel = pd.read_excel('saved_df.xlsx', sheetname='Sheet1')
#convert back to original python lists and dict, if desired
df_from_excel['genre_as_list_again'] = [eval(i) for i in df_from_excel['genre_as_string']]
df_from_excel['rating_as_dict_again'] = [eval(i) for i in df_from_excel['rating_as_string']]
df_from_excel.loc[0,'genre_as_list_again'][0]
@jtratner
Copy link
Contributor

jtratner commented Sep 7, 2014

This is really something that happens because of a quirk of how the CSV module is set up (converts everything to str) vs. how excel writers are set up (tries to do the right thing by value type). You can actually just do something like: df['myrow'] = df['myrow'].astype(str):

In [24]: df['rating_as_dict'] = df['rating_as_dict'].astype(str)

In [25]: df
Out[25]:
                               titles                     genre_as_list  \
0                    Toy Story (1995)   [Animation, Children's, Comedy]
1                      Jumanji (1995)  [Adventure, Children's, Fantasy]
2             Grumpier Old Men (1995)                 [Comedy, Romance]
3            Waiting to Exhale (1995)                   [Comedy, Drama]
4  Father of the Bride Part II (1995)                          [Comedy]

         rating_as_dict
0  {'M': 0.6, 'F': 0.5}
1  {'M': 0.2, 'F': 0.1}
2  {'M': 0.3, 'F': 0.3}
3  {'M': 0.8, 'F': 0.8}
4  {'M': 0.6, 'F': 0.8}

In [26]: df['rating_as_dict'][0]
Out[26]: "{'M': 0.6, 'F': 0.5}"

There isn't really a 'correct' answer here (I think I'd personally want an error to be raised).

@jtratner
Copy link
Contributor

jtratner commented Sep 8, 2014

but it wouldn't be that difficult to incorporate that - do you want to take a stab at it? (the code isn't that complicated and I believe it's all contained within pandas/io/excel.py, really just adding another case to the checks - you can use core's pandas.core.common.is_list_like() for your check

@onesandzeroes
Copy link
Contributor

I can have a go at this if @MarkInLabcoat doesn't want to. My thinking is that it should be a non-default behaviour, so we'd be adding a convert_to_str=False default arg to to_excel(). Would also need to add a note about it in the TypeError message so people know where to look if they want to enable.

Does that sound about right @jtratner ?

@jorisvandenbossche
Copy link
Member

Is it necessary to have a specific keyword for this? The other possibility is to leave this up to the user to do themselves astype(str) on the needed columns instead of something like convert_to_str=True if they really want to store lists or dicts with to_excel?

But the error message could certainly be improved.

@jtratner
Copy link
Contributor

jtratner commented Sep 8, 2014

@jorisvandenbossche @onesandzeroes - is there a reason why it's useful to not convert to string (given that to_csv does it already)? it's not a particularly large performance impact (a third if clause branch) and it just means that you can do to_excel() without worrying about the types in your DataFrame.

@jorisvandenbossche
Copy link
Member

Yes, indeed, that is also good. I just wouldn't introduce a keyword for it. But converting it to string by default seems also good. Certainly if that is also the behaviour of to_csv

@onesandzeroes
Copy link
Contributor

I guess I was thinking in terms of how easy it is to actually doing anything with the string-converted lists once you've got them in Excel, but now that I think about it, that's probably not a big concern. I'll start working on converting to string by default.

@jtratner
Copy link
Contributor

jtratner commented Sep 9, 2014

You can do this pretty easily, all you need to do is slightly modify this function (in pandas/io/excel.py):

def _conv_value(val):
    # Convert numpy types to Python types for the Excel writers.
    if com.is_integer(val):
        val = int(val)
    elif com.is_float(val):
        val = float(val)
    elif com.is_bool(val):
        val = bool(val)
    elif isinstance(val, Period):
        val = "%s" % val

    return val

just make up a few test cases and we're good :)

@teese
Copy link
Author

teese commented Sep 9, 2014

Thanks for the comments. Please go ahead with your own solution. It's somewhat beyond my current ability, and I'm busy wearing my labcoat this week :)

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions IO Excel read_excel, to_excel Good as first PR labels Sep 21, 2014
hunterowens added a commit to hunterowens/pandas that referenced this issue Jan 2, 2015
hunterowens added a commit to hunterowens/pandas that referenced this issue Jan 2, 2015
@jreback jreback added this to the 0.17.0 milestone Jun 18, 2015
bashtage pushed a commit to bashtage/pandas that referenced this issue Jun 20, 2015
Add support for StringIO/BytesIO to ExcelWriter
Add vbench support for writing excel files
Add support for serializing lists/dicts to strings
Fix bug when reading blank excel sheets
Added xlwt to Python 3.4 builds

closes pandas-dev#8188
closes pandas-dev#7074
closes pandas-dev#6403
closes pandas-dev#7171
closes pandas-dev#6947
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO Excel read_excel, to_excel
Projects
None yet
5 participants