-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH Pass kwds from ExcelFile ctr to xlrd.open_workbook. For example, thi... #4439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…this allows setting formatting_info=True (GH4438)
@@ -78,10 +78,10 @@ def __init__(self, path_or_buf, kind=None, **kwds): | |||
self.tmpfile = None | |||
|
|||
if isinstance(path_or_buf, compat.string_types): | |||
self.book = xlrd.open_workbook(path_or_buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be named kwds? or can open_workbook deal with any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xlrd.open_workbook accepts a number of parameters: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.open_workbook-function
That being said I saw the kwd argument in the ExcelFile constructor. Currently that kwd is not used which means if the user tries to pass any additional parameters, or misspells them, they are silently dropped.
This change not only allows passing more arguments down to xlrd but also adds an additional runtime check.
@cancan101 Please write a test! 😄 I find it very helpful to write a test before I make a change (a hard habit to get into especially if you think the issue/change is small, but it will help you in the long run) Also please add some documentation showing users that you can now pass keyword arguments when reading excel files. |
@cpcloud that make sense. I will do. That being said, should I rename the parameter on ExcelFile to be something more informative and less specific to xlrd? perhaps "find_merged_cells" as opposed to formatting info ? That way parameter can be kept the same if underlying library changes. |
@cpcloud Unit test pushed |
so that merged cells may be read (GH4438)
Add a note to release.rst and squash this into one commit and I'll try to review in the interim. |
This goes back to the question about That way if |
Or should |
To be honest, we could probably hide this entire parameter from the user. If we see a need to load merged cells, such as when Thoughts? |
Couldn't the test case for this just check that merged cells are produced? Does the content matter (seems like it's just what xlrd does)? |
@jtratner I am not sure what you mean. |
It's fine to pass keywords through (and that's reasonable), but 'getting the value of merged cells' is probably not something pandas should be set to do. At that point, you could just use xlrd directly. |
To be clear I mean: converting merged cells into something reasonable in the outputted dataframe makes sense. It doesn't make sense to add something to the public api targeted specifically at merged cells. |
@jtratner I am still not sure what you mean. Maybe I was unclear with my comment: I was just saying there might be a smart way of allowing us to parse merged cells in the ExcelParser without the user having to specify |
@jtratner I think we are in agreement. When I said "hide the parameter" what I should have said was not "not add the parameter to the public API" |
merged_cells = sheet.merged_cells | ||
|
||
self.assertEquals(len(merged_cells), 1) | ||
rlo, rhi, clo, chi = merged_cells[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to test the actual parameters? merged_cells
won't even be on the object if you don't pass formatting_info=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an object, the length is just =0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's close this PR. Just create a new one that tests that you can pass an xlrd workbook to the xlrd reader and that it all works. Then none of these tests will be necessary.
@cancan101 a few final things to get this accepted:
|
do u think have a flavor (or maybe engine?) argument is useful here (to specify the backend) , eg would be 'xlrd' in this case (so we can offer back compat at some point if needed), or overkill ? |
@jreback I think there are already PRs for using other readers/writers for Excel, so not necessarily a bad thing to have. I just don't want to be tied to supporting any keyword arguments that aren't part of the public API. |
@jtratner is this obsolete? (as in your new framework already does this)? |
no, still up for grabs as to what to do (didn't refactor ExcelFile) |
gr8, then is PR look ok? |
@jtratner What is the best way to make this future proof to changes in the Excel parser (ie moving away from xlrd)? |
@cancan101 I think it would be better to just allow |
@jtratner As a slight modification to your suggestion: Change |
yep, that's what I was thinking. |
Closing in favor of adding an option to let ExcelFile take an existing workbook. |
@jtratner Can you leave this PR open? I was about to push fix / change to allow passing in sheet. |
you should create a new PR for that, the discussion on this was about passing kwargs...confusing to do both. Plus, there was the additional discussion about the test cases. |
btw - this is why it's good to do topic branches instead of directly on your own master. Lets you have multiple PRs open. |
Do you mind if I leave a test case that tests both:
|
@jtratner Yea, using master was a mistake. I usually use topic branches. Once pushed, github does not allow changing branch for PR. |
I don't think that's a good test case - that doesn't really tell you anything different than the following: pth = os.path.join(self.dirpath, 'merged.xls')
book = xlrd.open_workbook(pth)
xls = ExcelFile(book)
self.assert_(book is xls.book) The merged cells are available because the user created the workbook instance. You should add a parse check to make sure everything works (as well as sheet name lookup, etc.) |
Okay. Given that the user is now using xlrd directly to create the work book, I guess it does not make sense to bother testing the merged cell functionality; we can assume that xlrd handles that correctly. |
@jtratner Is there one of the current Excel tests that you think would be best to duplicate and have run using an ExcelFile opened from a book? |
can't just do something like:
Where, for now, df is something with floats... |
Created #4961 |
...s allows setting formatting_info=True (GH4438)
closes #4438