Skip to content

ENH Change ExcelFile to accept a workbook for the path_or_buf argument. #4962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 27, 2013

Conversation

cancan101
Copy link
Contributor

That will also mean that read_excel accepts workbooks too. (closes #4961 and #4959)

@jtratner
Copy link
Contributor

It might make sense to add an engine argument to both the ExcelFile and read_excel constructor that has to be passed along with a book...not sure.

with ensure_clean(pth) as pth:
df.to_excel(pth, "SheetA")
book = xlrd.open_workbook(pth)
xl = ExcelFile(book)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a couple of lines using the sweet new context managing version of ExcelFile too :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud How exactly do I use the new context managing version? Are you talking about this PR: #4750 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with ExcelFile(book) as reader:
    result = reader.parse("SheetA")
    tm.assert_frame_equal(df, result)
# add a test that shows whether or not ExcelFile should be close the book or not

Also, I suggest doing this test twice, once with ExcelFile and once with read_excel (which won't work as a contextmanager)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtratner Sure, will do. How do I use the contextmanager?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all a contextmanager means is something you use in a with statement.

@cancan101
Copy link
Contributor Author

@jtratner It seems to make sense to add the engine argument, et. al. in another PR. Thoughts?

@jtratner
Copy link
Contributor

@cancan101 I was thinking that we would require passing an engine argument with a book, so that, if we were to add other engines going forward, we wouldn't have to try isinstance checks for determining which backend to use.

So in other words you have engine=None, an if statement that checks for whatever conditions were previously checked for path_or_buf and then a final else clause that tells you that you have to specify an engine if not passing a buffer or path. Not totally sure on that, but that's what I was thinking.

@cancan101
Copy link
Contributor Author

@jtratner @cpcloud I pushed a change to test read_excel and renamed path_or_buf. I am still a little unclear on how to deal with an engine param for now. For the time being I can map both None and "xlrd" to go through the current code path and anything else to raise and error. Not sure how useful that is though.

@cpcloud
Copy link
Member

cpcloud commented Sep 25, 2013

@cancan101 As different excel reading backends might have slightly different APIs for excel workbooks, then passing an engine should really be a requirement if you want to pass a workbook in as io. More clear? For example if I have an excel workbook that I've created using openpyxl (an instance of Workbook), that's different from the Book class used by xlrd. One would need a way to differentiate between those, and if I understand @jtratner then that's exactly what engine would do.

@jtratner
Copy link
Contributor

yep, that's exactly what I mean (even though for right now it's all xlrd). Thanks for laying that out!

@cancan101
Copy link
Contributor Author

@jtratner @cpcloud Okay. I pushed something for engine support. Let me know if this isn't right.

@jtratner
Copy link
Contributor

I'll take a look soon.

On Wed, Sep 25, 2013 at 7:33 PM, Alex Rothberg [email protected]:

@jtratner https://github.com/jtratner @cpcloudhttps://github.com/cpcloudOkay. I pushed something for
engine support. Let me know if this isn't right.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4962#issuecomment-25133440
.

@cancan101
Copy link
Contributor Author

ping!

@jtratner
Copy link
Contributor

@cancan101 thanks for the ping. Apparently you need to rebase this too. (so sayeth github)

@@ -254,6 +254,37 @@ def test_excel_read_buffer(self):
f = open(pth, 'rb')
xl = ExcelFile(f)
xl.parse('Sheet1', index_col=0, parse_dates=True)

def test_excel_read_worksheet_excelfile(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the name of this test to test_read_xlrd_Book(self) - that's what your testing (and it makes it clearer that it only applies to xlrd).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the next test, test_excel_read_worksheet_read_excel be renamed to test_read_excel_xlrd_Book ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and as for test_read_xlrd_Book, what about test_excel_file_xlrd_Book?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, can you just combine the two tests together? They aren't actually that different. (so you can just add two extra lines after 270 that have a call to read_excel and another assert_frame_equal call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

(formerly path_or_buf) argument; this requires engine to be set.
(GH4961)
@cancan101
Copy link
Contributor Author

@jtratner Pushed.

@jtratner
Copy link
Contributor

after this passes Travis I'll merge.

jtratner added a commit that referenced this pull request Sep 27, 2013
ENH Change ExcelFile to accept a workbook for the path_or_buf argument.
@jtratner jtratner merged commit 0fbd347 into pandas-dev:master Sep 27, 2013
@cancan101 cancan101 deleted the excel_take_workbook branch October 20, 2013 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change ExcelFile to accept a workbook for the path_or_buf argument
3 participants