-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Suggested improvement: allow ExcelWriter to automatically convert lists and dict to strings #8188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This is really something that happens because of a quirk of how the CSV module is set up (converts everything to str) vs. how excel writers are set up (tries to do the right thing by value type). You can actually just do something like: df['myrow'] = df['myrow'].astype(str): In [24]: df['rating_as_dict'] = df['rating_as_dict'].astype(str)
In [25]: df
Out[25]:
titles genre_as_list \
0 Toy Story (1995) [Animation, Children's, Comedy]
1 Jumanji (1995) [Adventure, Children's, Fantasy]
2 Grumpier Old Men (1995) [Comedy, Romance]
3 Waiting to Exhale (1995) [Comedy, Drama]
4 Father of the Bride Part II (1995) [Comedy]
rating_as_dict
0 {'M': 0.6, 'F': 0.5}
1 {'M': 0.2, 'F': 0.1}
2 {'M': 0.3, 'F': 0.3}
3 {'M': 0.8, 'F': 0.8}
4 {'M': 0.6, 'F': 0.8}
In [26]: df['rating_as_dict'][0]
Out[26]: "{'M': 0.6, 'F': 0.5}" There isn't really a 'correct' answer here (I think I'd personally want an error to be raised). |
but it wouldn't be that difficult to incorporate that - do you want to take a stab at it? (the code isn't that complicated and I believe it's all contained within |
I can have a go at this if @MarkInLabcoat doesn't want to. My thinking is that it should be a non-default behaviour, so we'd be adding a Does that sound about right @jtratner ? |
Is it necessary to have a specific keyword for this? The other possibility is to leave this up to the user to do themselves But the error message could certainly be improved. |
@jorisvandenbossche @onesandzeroes - is there a reason why it's useful to not convert to string (given that |
Yes, indeed, that is also good. I just wouldn't introduce a keyword for it. But converting it to string by default seems also good. Certainly if that is also the behaviour of |
I guess I was thinking in terms of how easy it is to actually doing anything with the string-converted lists once you've got them in Excel, but now that I think about it, that's probably not a big concern. I'll start working on converting to string by default. |
You can do this pretty easily, all you need to do is slightly modify this function (in pandas/io/excel.py): def _conv_value(val):
# Convert numpy types to Python types for the Excel writers.
if com.is_integer(val):
val = int(val)
elif com.is_float(val):
val = float(val)
elif com.is_bool(val):
val = bool(val)
elif isinstance(val, Period):
val = "%s" % val
return val just make up a few test cases and we're good :) |
Thanks for the comments. Please go ahead with your own solution. It's somewhat beyond my current ability, and I'm busy wearing my labcoat this week :) |
Add support for StringIO/BytesIO to ExcelWriter Add vbench support for writing excel files Add support for serializing lists/dicts to strings Fix bug when reading blank excel sheets Added xlwt to Python 3.4 builds closes pandas-dev#8188 closes pandas-dev#7074 closes pandas-dev#6403 closes pandas-dev#7171 closes pandas-dev#6947
Problem description:
I have pandas dataframes with diverse data types and structures, including lists. The csv-writer saves the lists without trouble, but I prefer to also save in excel format for collaboration with my colleagues.
Currently the ExcelWriter returns a TypeError if the dataframe contains a list.
TypeError: float() argument must be a string or a number, not 'list'
It would be really great if ExcelWriter accepted lists and dictionaries and tuples.
Would it cause any problem to simply convert these data formats to strings before saving?
Sorry I’m not an experienced programmer, so I haven‘t made a pull request and attempted to improve the module myself.
Here is some code (Python 3.4, pandas 0.14.0) to describe the current situation.
attempt to save the dataframe in excel
which will result in
TypeError: float() argument must be a string or a number, not 'list'
to save the data, it is necessary to convert the python lists or dicts to strings, separated by ", " or pipes "|"
After deleting any columns containing a python list or dict, I can now save without any problem.
If I load from the saved excel sheet, I can confirm that the data is intact
The text was updated successfully, but these errors were encountered: