Excel writing #735

dieterv77 · 2012-02-02T15:57:03Z

I added a ExcelWriter class to io/parsers.py
Added a to_excel method to DataFrame and refactored to_csv to use a helper method that also serves to_excel.

I still need to add some tests, but wanted to get feedback before spending any more time.

There are still some issues, like xlwt throws exceptions when writing np.int64 types, but we should be able to special case some of that. I also didn't add any encoding handling, although that should be reasonably easy.

Refactor to_csv to use a helper method _helper_csvexcel that is also used by to_excel

wesm · 2012-02-02T19:13:31Z

pandas/core/frame.py

+            sequence should be given if the DataFrame uses MultiIndex.
+        """
+        from pandas.io.parsers import ExcelWriter
+        needSave = False


I'm a pep8 kind of guy but no biggie (lots of camel case around the codebase I've been working on eliminating)

wesm · 2012-02-02T19:19:48Z

Have you thought much about a way to use this as a tool for writing multiple DataFrames to multiple sheets? Also, have you used pyopenxl at all?

wesm · 2012-02-02T19:21:24Z

Lastly, would you mind writing a unit test or two? I seem to recall xlwt files being unreadable by xlrd which is a bit unfortunate

dieterv77 · 2012-02-02T19:43:29Z

The purpose of the ExcelWriter object is to make it easy to write different DataFrames to different sheets, you can pass the same ExcelWriter to each to_excel call, each time with a different sheet name, and call save at the end (still need to document that in the docstring. I'm hoping to add a to_excel to the Panel class that will do just this for each DataFrame in the Panel.

I haven't checked out pyopenxl, I will do that, i'm not a huge fan of xlrd either.

I will definitely add unit tests, like i said, i just wanted to get some feedback before going deeper.

dieterv77 · 2012-02-03T03:07:06Z

I took a look at pyopenxl, seems like a nice project. It can only read/write Excel 2007 files (.xlsx extensions), while xlrd/xlwt can only handle .xls files (Excel 2003 and earlier). If we were to use pyopenxl for the writing, then i think it would be good to make sure we can also read .xlsx files with ExcelFile. I propose we either ditch xlrd from ExcelFile and switch to pyopenxl, or I can try to support both (based on the file extension or something like that). Thoughts?

lbeltrame · 2012-02-03T08:41:53Z

Personally I would keep both (xls and xlsx), as a way to keep interoperability.

CRP · 2012-02-03T15:51:44Z

How about using platform-specific interfaces to use current instances of Excel? For example, on windows it would go via com, while on mac it would use appscript. linux users would be left out, unfortunately ;)

dieterv77 · 2012-02-03T21:53:22Z

i added support for .xlsx and a corresponding test. I had to special case the handling of np.int64 since neither xlrd nor openpyxl could handle them. My guess is there are other types that could cause trouble, but i think this covers the most common stuff.

One thing i would like to do is to change the default index_col argument in ExcelFile.parse to 0, instead of None. The docstring actually says the default is 0, but it is actually None. Obviously this would break backward compatibility, so wanted to check to see if there are major objections.

wesm · 2012-02-07T23:43:18Z

I'm inclined to merge this for 0.7.0 (like, within the next 24 hours) but haven't done rigorous testing myself. Any objections? As long as the original excel-parsing capability is there (unit tests should confirm) then I'd rather get this merged in.

dieterv77 · 2012-02-08T01:33:06Z

Any thoughts on changing the default value of index_col in ExcelFile.parse? At a minimum, i should fix the docstring to reflect the actual behavior.

As far as i could tell, there were no unittests for ExcelFile parsing prior to this. The changes i made effectively left the original parsing code intact, assuming the filename had the .xls extension.

dieterv77 · 2012-02-08T02:15:30Z

I'm ready for it to merge, modulo the index_col thing, which i have currently left unchanged.

FWIW, i think merging to master will run into some minor conflicts, just because of unrelated changes nearby. If it would be helpful, i can do the merge in a branch on my fork, and resolve the conflicts.

lbeltrame · 2012-02-08T08:42:11Z

Personally I'd keep the existing behavior wrt index_col, and change the docstring to reflect its actual default value.

wesm · 2012-02-08T17:26:15Z

@dieterv77 there is a single test_excel_table test but it's not very comprehensive I don't think

I'm not sure if index_col=0 needs to be the default to get the desired behavior. I'll take a closer look.

dieterv77 · 2012-02-08T19:17:16Z

My mistake, i thought all the tests lived in the same directory. sorry about that. The unittests i added do a lot of round-trip type testing, so they exercise the reading as well as the writing.

It's fine to leave index_col=None, i only preferred =0 so that it would match DataFrame.from_csv, but i can appreciate that at this point it is better to preserve backward compatibility.

wesm · 2012-02-08T19:19:06Z

I'm adding some "tricks" to get it to automatically infer that there's an index column in many cases (like DataFrame.from_csv). Having a little debate with myself about how the index names should be dealt with in the excel sheet

wesm · 2012-02-08T19:20:55Z

The docs for this stuff need work (examples, etc., especially w.r.t. indexes). I'm not going to be able to get to it by release time but hopefully we can make some improvements soon

dieterv77 · 2012-02-08T20:21:53Z

I would be happy to start on that and add a pull request once I have made some progress.

wesm · 2012-02-08T20:28:13Z

Go for it re: docs. Closing this pull request-- I make some tweaks to make the excel reading a bit more clever about inferring that the first column is the index (as with csv files). Working on test coverage for the rest of the library and will try to cut 0.7.0 final today

… coverage for PR #735

* commit 'v0.7.0rc1-183-gcc2a8a7': (86 commits) DOC: release notes ENH: can pass file handle or StringIO to Series.to_csv and DataFrame.to_csv, GH pandas-dev#765 BUG: fix subtle bug in maybe_convert_objects causing indexes to be mutated, test coverage, fix pandas-dev#766 TST: merge test coverage and trim floating point zeros even if there are NAs TST: skip excel tests if libraries not installed TST: more merge test coverage, refactoring ENH: more intelligent inference about index_col for Excel files, test coverage for PR pandas-dev#735 BUG: fix docstring TST: test coverage Fix up docstrings Combine xlsx tests with xls to avoid code duplication Add some additional excel reading/writing tests Document sheet_name arg in docstring, give it a default value Add to_excel in Panel and corresponding test Test to_excel with MultiIndex Document writing multiple DataFrames to different sheets Add support for reading/writing .xlsx using openpyxl Special case np.int64 in ExcelWriter and add unittest Missed one variable rename Add unittest for to_excel ...

Dieter Vandenbussche added 2 commits February 1, 2012 22:18

Use izip instead of zip

cd393c1

Add to_excel method to DataFrame, add ExcelWriter helper class

3da3837

Refactor to_csv to use a helper method _helper_csvexcel that is also used by to_excel

wesm reviewed Feb 2, 2012
View reviewed changes

Dieter Vandenbussche added 2 commits February 2, 2012 23:34

Minor changes based on comments

b8d3948

Add unittest for to_excel

07cb55a

Dieter Vandenbussche added 4 commits February 3, 2012 15:39

Missed one variable rename

d1ea88b

Special case np.int64 in ExcelWriter and add unittest

4e3649c

Add support for reading/writing .xlsx using openpyxl

9b51dd9

Document writing multiple DataFrames to different sheets

f627638

Dieter Vandenbussche added 3 commits February 3, 2012 17:03

Test to_excel with MultiIndex

b95af85

Add to_excel in Panel and corresponding test

f50e4b9

Document sheet_name arg in docstring, give it a default value

1f46216

Dieter Vandenbussche added 3 commits February 7, 2012 20:57

Add some additional excel reading/writing tests

445bd8e

Combine xlsx tests with xls to avoid code duplication

494f9bd

Fix up docstrings

d120610

wesm closed this Feb 8, 2012

wesm added a commit that referenced this pull request Feb 8, 2012

ENH: more intelligent inference about index_col for Excel files, test…

46fcdff

… coverage for PR #735

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Excel writing #735

Excel writing #735

dieterv77 commented Feb 2, 2012

wesm Feb 2, 2012

wesm commented Feb 2, 2012

wesm commented Feb 2, 2012

dieterv77 commented Feb 2, 2012

dieterv77 commented Feb 3, 2012

lbeltrame commented Feb 3, 2012

CRP commented Feb 3, 2012

dieterv77 commented Feb 3, 2012

wesm commented Feb 7, 2012

dieterv77 commented Feb 8, 2012

dieterv77 commented Feb 8, 2012

lbeltrame commented Feb 8, 2012

wesm commented Feb 8, 2012

dieterv77 commented Feb 8, 2012

wesm commented Feb 8, 2012

wesm commented Feb 8, 2012

dieterv77 commented Feb 8, 2012

wesm commented Feb 8, 2012

Excel writing #735

Excel writing #735

Conversation

dieterv77 commented Feb 2, 2012

wesm Feb 2, 2012

Choose a reason for hiding this comment

wesm commented Feb 2, 2012

wesm commented Feb 2, 2012

dieterv77 commented Feb 2, 2012

dieterv77 commented Feb 3, 2012

lbeltrame commented Feb 3, 2012

CRP commented Feb 3, 2012

dieterv77 commented Feb 3, 2012

wesm commented Feb 7, 2012

dieterv77 commented Feb 8, 2012

dieterv77 commented Feb 8, 2012

lbeltrame commented Feb 8, 2012

wesm commented Feb 8, 2012

dieterv77 commented Feb 8, 2012

wesm commented Feb 8, 2012

wesm commented Feb 8, 2012

dieterv77 commented Feb 8, 2012

wesm commented Feb 8, 2012