-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Excel writing #735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excel writing #735
Conversation
Refactor to_csv to use a helper method _helper_csvexcel that is also used by to_excel
sequence should be given if the DataFrame uses MultiIndex. | ||
""" | ||
from pandas.io.parsers import ExcelWriter | ||
needSave = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a pep8 kind of guy but no biggie (lots of camel case around the codebase I've been working on eliminating)
Have you thought much about a way to use this as a tool for writing multiple DataFrames to multiple sheets? Also, have you used pyopenxl at all? |
Lastly, would you mind writing a unit test or two? I seem to recall xlwt files being unreadable by xlrd which is a bit unfortunate |
The purpose of the ExcelWriter object is to make it easy to write different DataFrames to different sheets, you can pass the same ExcelWriter to each to_excel call, each time with a different sheet name, and call save at the end (still need to document that in the docstring. I'm hoping to add a to_excel to the Panel class that will do just this for each DataFrame in the Panel. I haven't checked out pyopenxl, I will do that, i'm not a huge fan of xlrd either. I will definitely add unit tests, like i said, i just wanted to get some feedback before going deeper. |
I took a look at pyopenxl, seems like a nice project. It can only read/write Excel 2007 files (.xlsx extensions), while xlrd/xlwt can only handle .xls files (Excel 2003 and earlier). If we were to use pyopenxl for the writing, then i think it would be good to make sure we can also read .xlsx files with ExcelFile. I propose we either ditch xlrd from ExcelFile and switch to pyopenxl, or I can try to support both (based on the file extension or something like that). Thoughts? |
Personally I would keep both (xls and xlsx), as a way to keep interoperability. |
How about using platform-specific interfaces to use current instances of Excel? For example, on windows it would go via com, while on mac it would use appscript. linux users would be left out, unfortunately ;) |
i added support for .xlsx and a corresponding test. I had to special case the handling of np.int64 since neither xlrd nor openpyxl could handle them. My guess is there are other types that could cause trouble, but i think this covers the most common stuff. One thing i would like to do is to change the default index_col argument in ExcelFile.parse to 0, instead of None. The docstring actually says the default is 0, but it is actually None. Obviously this would break backward compatibility, so wanted to check to see if there are major objections. |
I'm inclined to merge this for 0.7.0 (like, within the next 24 hours) but haven't done rigorous testing myself. Any objections? As long as the original excel-parsing capability is there (unit tests should confirm) then I'd rather get this merged in. |
Any thoughts on changing the default value of index_col in ExcelFile.parse? At a minimum, i should fix the docstring to reflect the actual behavior. As far as i could tell, there were no unittests for ExcelFile parsing prior to this. The changes i made effectively left the original parsing code intact, assuming the filename had the .xls extension. |
I'm ready for it to merge, modulo the index_col thing, which i have currently left unchanged. FWIW, i think merging to master will run into some minor conflicts, just because of unrelated changes nearby. If it would be helpful, i can do the merge in a branch on my fork, and resolve the conflicts. |
Personally I'd keep the existing behavior wrt index_col, and change the docstring to reflect its actual default value. |
@dieterv77 there is a single I'm not sure if |
My mistake, i thought all the tests lived in the same directory. sorry about that. The unittests i added do a lot of round-trip type testing, so they exercise the reading as well as the writing. It's fine to leave index_col=None, i only preferred =0 so that it would match DataFrame.from_csv, but i can appreciate that at this point it is better to preserve backward compatibility. |
I'm adding some "tricks" to get it to automatically infer that there's an index column in many cases (like |
The docs for this stuff need work (examples, etc., especially w.r.t. indexes). I'm not going to be able to get to it by release time but hopefully we can make some improvements soon |
I would be happy to start on that and add a pull request once I have made some progress. |
Go for it re: docs. Closing this pull request-- I make some tweaks to make the excel reading a bit more clever about inferring that the first column is the index (as with csv files). Working on test coverage for the rest of the library and will try to cut 0.7.0 final today |
* commit 'v0.7.0rc1-183-gcc2a8a7': (86 commits) DOC: release notes ENH: can pass file handle or StringIO to Series.to_csv and DataFrame.to_csv, GH pandas-dev#765 BUG: fix subtle bug in maybe_convert_objects causing indexes to be mutated, test coverage, fix pandas-dev#766 TST: merge test coverage and trim floating point zeros even if there are NAs TST: skip excel tests if libraries not installed TST: more merge test coverage, refactoring ENH: more intelligent inference about index_col for Excel files, test coverage for PR pandas-dev#735 BUG: fix docstring TST: test coverage Fix up docstrings Combine xlsx tests with xls to avoid code duplication Add some additional excel reading/writing tests Document sheet_name arg in docstring, give it a default value Add to_excel in Panel and corresponding test Test to_excel with MultiIndex Document writing multiple DataFrames to different sheets Add support for reading/writing .xlsx using openpyxl Special case np.int64 in ExcelWriter and add unittest Missed one variable rename Add unittest for to_excel ...
I added a ExcelWriter class to io/parsers.py
Added a to_excel method to DataFrame and refactored to_csv to use a helper method that also serves to_excel.
I still need to add some tests, but wanted to get feedback before spending any more time.
There are still some issues, like xlwt throws exceptions when writing np.int64 types, but we should be able to special case some of that. I also didn't add any encoding handling, although that should be reasonably easy.