Skip to content

Excel writing #735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed

Excel writing #735

wants to merge 14 commits into from

Conversation

dieterv77
Copy link
Contributor

I added a ExcelWriter class to io/parsers.py
Added a to_excel method to DataFrame and refactored to_csv to use a helper method that also serves to_excel.

I still need to add some tests, but wanted to get feedback before spending any more time.

There are still some issues, like xlwt throws exceptions when writing np.int64 types, but we should be able to special case some of that. I also didn't add any encoding handling, although that should be reasonably easy.

Dieter Vandenbussche added 2 commits February 1, 2012 22:18
Refactor to_csv to use a helper method _helper_csvexcel that is also
used by to_excel
sequence should be given if the DataFrame uses MultiIndex.
"""
from pandas.io.parsers import ExcelWriter
needSave = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a pep8 kind of guy but no biggie (lots of camel case around the codebase I've been working on eliminating)

@wesm
Copy link
Member

wesm commented Feb 2, 2012

Have you thought much about a way to use this as a tool for writing multiple DataFrames to multiple sheets? Also, have you used pyopenxl at all?

@wesm
Copy link
Member

wesm commented Feb 2, 2012

Lastly, would you mind writing a unit test or two? I seem to recall xlwt files being unreadable by xlrd which is a bit unfortunate

@dieterv77
Copy link
Contributor Author

The purpose of the ExcelWriter object is to make it easy to write different DataFrames to different sheets, you can pass the same ExcelWriter to each to_excel call, each time with a different sheet name, and call save at the end (still need to document that in the docstring. I'm hoping to add a to_excel to the Panel class that will do just this for each DataFrame in the Panel.

I haven't checked out pyopenxl, I will do that, i'm not a huge fan of xlrd either.

I will definitely add unit tests, like i said, i just wanted to get some feedback before going deeper.

@dieterv77
Copy link
Contributor Author

I took a look at pyopenxl, seems like a nice project. It can only read/write Excel 2007 files (.xlsx extensions), while xlrd/xlwt can only handle .xls files (Excel 2003 and earlier). If we were to use pyopenxl for the writing, then i think it would be good to make sure we can also read .xlsx files with ExcelFile. I propose we either ditch xlrd from ExcelFile and switch to pyopenxl, or I can try to support both (based on the file extension or something like that). Thoughts?

@lbeltrame
Copy link
Contributor

Personally I would keep both (xls and xlsx), as a way to keep interoperability.

@CRP
Copy link
Contributor

CRP commented Feb 3, 2012

How about using platform-specific interfaces to use current instances of Excel? For example, on windows it would go via com, while on mac it would use appscript. linux users would be left out, unfortunately ;)

@dieterv77
Copy link
Contributor Author

i added support for .xlsx and a corresponding test. I had to special case the handling of np.int64 since neither xlrd nor openpyxl could handle them. My guess is there are other types that could cause trouble, but i think this covers the most common stuff.

One thing i would like to do is to change the default index_col argument in ExcelFile.parse to 0, instead of None. The docstring actually says the default is 0, but it is actually None. Obviously this would break backward compatibility, so wanted to check to see if there are major objections.

@wesm
Copy link
Member

wesm commented Feb 7, 2012

I'm inclined to merge this for 0.7.0 (like, within the next 24 hours) but haven't done rigorous testing myself. Any objections? As long as the original excel-parsing capability is there (unit tests should confirm) then I'd rather get this merged in.

@dieterv77
Copy link
Contributor Author

Any thoughts on changing the default value of index_col in ExcelFile.parse? At a minimum, i should fix the docstring to reflect the actual behavior.

As far as i could tell, there were no unittests for ExcelFile parsing prior to this. The changes i made effectively left the original parsing code intact, assuming the filename had the .xls extension.

@dieterv77
Copy link
Contributor Author

I'm ready for it to merge, modulo the index_col thing, which i have currently left unchanged.

FWIW, i think merging to master will run into some minor conflicts, just because of unrelated changes nearby. If it would be helpful, i can do the merge in a branch on my fork, and resolve the conflicts.

@lbeltrame
Copy link
Contributor

Personally I'd keep the existing behavior wrt index_col, and change the docstring to reflect its actual default value.

@wesm
Copy link
Member

wesm commented Feb 8, 2012

@dieterv77 there is a single test_excel_table test but it's not very comprehensive I don't think

I'm not sure if index_col=0 needs to be the default to get the desired behavior. I'll take a closer look.

@dieterv77
Copy link
Contributor Author

My mistake, i thought all the tests lived in the same directory. sorry about that. The unittests i added do a lot of round-trip type testing, so they exercise the reading as well as the writing.

It's fine to leave index_col=None, i only preferred =0 so that it would match DataFrame.from_csv, but i can appreciate that at this point it is better to preserve backward compatibility.

@wesm
Copy link
Member

wesm commented Feb 8, 2012

I'm adding some "tricks" to get it to automatically infer that there's an index column in many cases (like DataFrame.from_csv). Having a little debate with myself about how the index names should be dealt with in the excel sheet

@wesm
Copy link
Member

wesm commented Feb 8, 2012

The docs for this stuff need work (examples, etc., especially w.r.t. indexes). I'm not going to be able to get to it by release time but hopefully we can make some improvements soon

@dieterv77
Copy link
Contributor Author

I would be happy to start on that and add a pull request once I have made some progress.

@wesm
Copy link
Member

wesm commented Feb 8, 2012

Go for it re: docs. Closing this pull request-- I make some tweaks to make the excel reading a bit more clever about inferring that the first column is the index (as with csv files). Working on test coverage for the rest of the library and will try to cut 0.7.0 final today

@wesm wesm closed this Feb 8, 2012
wesm added a commit that referenced this pull request Feb 8, 2012
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Feb 10, 2012
* commit 'v0.7.0rc1-183-gcc2a8a7': (86 commits)
  DOC: release notes
  ENH: can pass file handle or StringIO to Series.to_csv and DataFrame.to_csv, GH pandas-dev#765
  BUG: fix subtle bug in maybe_convert_objects causing indexes to be mutated, test coverage, fix pandas-dev#766
  TST: merge test coverage and trim floating point zeros even if there are NAs
  TST: skip excel tests if libraries not installed
  TST: more merge test coverage, refactoring
  ENH: more intelligent inference about index_col for Excel files, test coverage for PR pandas-dev#735
  BUG: fix docstring
  TST: test coverage
  Fix up docstrings
  Combine xlsx tests with xls to avoid code duplication
  Add some additional excel reading/writing tests
  Document sheet_name arg in docstring, give it a default value
  Add to_excel in Panel and corresponding test
  Test to_excel with MultiIndex
  Document writing multiple DataFrames to different sheets
  Add support for reading/writing .xlsx using openpyxl
  Special case np.int64 in ExcelWriter and add unittest
  Missed one variable rename
  Add unittest for to_excel
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants