-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Support writing/reading notes to/from excel files #58831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think this looks like a good PR overall but this is not actually what I originally suggested. The However, what you have built is the generic ability for the So this is a comprehensive PR. It may have been best to segregate these different components into different successive PRs, to make it easier to properly review and consider all the edge cases. One case that springs to mind is that the way tooltips works for Styler is to Before doing a full review it will be interesting to hear from other core devs. Also in regards to potential spilt of the PR. |
36bb4da
to
a57c45e
Compare
I am -.5 on adding this as a keyword argument to the excel I/O functions; the fact that so few of the underlying libraries supports this is a red flag. Generally we have a really high bar to expand our API, and adding this without supporting all underlying libraries is confusing for end users |
@attack68 Thank you for the comment, and sorry for our miss interpretation, I've adjusted the description of the PR accordingly.
That's indeed a point we noticed once we were testing the code, however, we didn't know what would be the right way to approach this, we'll await further instructions from core developers as you suggested. |
@WillAyd Thank you for having a look at our pull request. We acknowledge that introducing a feature without support for all underlying libraries may not appear ideal. Initially, we did not foresee this would be a problem. However, after a detailed examination of the source code from the relevant libraries, we discovered that many are no longer actively developed and/or have limited functionality, Despite these limitations, implementing this feature in pandas will enable support for notes in Excel formats such as We leave links to the libraries so you can see for yourself. |
But this PR is still modifying the core
While
From this, it seems unlikely for calamine to gain the ability to read notes. If I'm understanding this PR correctly, only openpyxl and xlrd support reading comments. Coupled with the fact that the API is, in my opinion, somewhat awkward (passing in a DataFrame to store the result), I am rather negative on supporting reading notes as well. Somewhat of a separate question - are there engines which are no longer maintained and they provide no additional functionality from libraries that are maintained? These could be deprecated - xlrd comes to mind, but I'm not sure if calamine completely replaces it. |
No, to_excel is not modified, the notes are kept as an attribute of ExcelFormatter (defaulted to None) that is then received by the writers.
Reading notes was initially added for testing purposes as an afterthought. It's a more intrusive method and is supported by fewer libraries. Now that writing notes has demonstrated its intended functionality, this feature could be removed.
Based on our research, the writers xlswriter and openpyxl both write to .xlsx files (I believe Pandas uses both in parallel due to some compatibility issues). The odswriter writes to .ods files. |
Rebased to address a conflict with this commit |
@attack68 - you're good with just writing notes instead of having reading as well? |
Yes that sounds good. I think the writing mode is a good feature and relatively speaking it is a fairly neglible extension to the overall API. I think this should be accepted to merge. I understand that the reading was added probably as a 'round trip' test whcih is sensible but reading is a separate process and this could (and should) be added in a separate, atomic PR, albeit that seems like lesser chance of acceptance. |
We are also ok with this. @rhshadrach should we now delete the reading from this pr and open a separate one? |
@diogomsmiranda - modifying this PR would be preferable. |
2913e11
to
7ed8fb5
Compare
@rhshadrach We've modified the PR, however we are having a weird issue with the docstrings validation, any idea what might be causing this problem? |
3db4686
to
4e40bf5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put remarks on one of the writers - I think this applies to all writers. This also needs tests. While pandas does not have the ability to read the notes, you can use one of the Excel engines that does to read them directly.
What happens in the case of merged cells?
83896cf
to
f57b587
Compare
I've updated this in all writers, except the first example, which I didn't understand, could you better explain how do you want us to change that logic?
I've brought all the tests from the original issue except now the reading is handled directly by the library
The comment writing occurs after all the content is written, so if the notes are passed in mind with cell merging that's how the writing will proceed. Also some |
pandas/io/excel/_odswriter.py
Outdated
raise NotImplementedError( | ||
""" | ||
Notes are not supported by the odswriter engine, | ||
see https://github.com/mmulqueen/odswriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the reference to the linked repo. pandas uses https://github.com/eea/odfpy for writing ODS files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I might be wrong but I believe we went to pandas documentation when searching for all the engines pandas currently uses and found this for ExcelWriters, in here explicitly links mmulqueen/odswriter as the engine to write ods files.
https://pandas.pydata.org/docs/reference/api/pandas.ExcelWriter.html,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, odfpy
is indeed used but for reading Open-Document type files (https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas-read-excel). The writing operation is taken care by that odswriter
as my colleague referred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @diogomsmiranda @Dacops. I can see why now. The docstring in pd.ExcelWriter
is incorrect (looks like was added in #47568).
-
If you check the
engine_kwargs
section, it mentionsodf.opendocument.OpenDocumentSpreadsheet(**engine_kwargs)
. None of this construct is available in the linkedodswriter
library, but is available in theodfpy
library instead https://github.com/eea/odfpy/blob/574f0fafad73a15a5b11b115d94821623274b4b0/odf/opendocument.py#L842. -
If you check what extra libraries pandas installs when you run
pip install pandas[excel]
, you don't see odswriter listed there.Line 69 in dd87dd3
excel = ['odfpy>=1.4.1', 'openpyxl>=3.1.0', 'python-calamine>=0.1.7', 'pyxlsb>=1.0.10', 'xlrd>=2.0.1', 'xlsxwriter>=3.0.5'] -
grepping the codebase for
import odswriter
gives 0 hits
cc @rhshadrach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, I'm glad it could be useful to find the mistake. We'll change the link to the correct one, or even better check if odfpy
has compatibility with notes, and adjust accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking through the codebase for mentions of comments or notes in cells I didn't find anything, although I did find a couple discussion about the current state of the project, mentioned as "dead".
In the mean time I've fixed the exception to link the right library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current state of the project, mentioned as "dead".
I think that's fair. The last commit on that repo seems to be 3 years ago.
f57b587
to
7d3558c
Compare
ab68132
to
9ac1ea4
Compare
This looks a lot cleaner now, more targeted and the tests are written to detect the notes. Is there much outstanding other than fixing any failing tests? |
Nope, I believe this is done, the failing tests were something that was failing globally a while back (32bit test suites were suffering from overflows) but I believe this is fixed now. I'll rebase the PR to address the conflict at the |
…#58070) Co-Authored-By: diogomsmiranda <[email protected]>
Hey, @rhshadrach for some reason Github is still showing that you requested changes, but I believe we already addressed them. Can you confirm everything is ok? |
I should have some time in the next few days. As a side note, this would be easier on my end if there wasn't force-pushes involved, as then I could just view the diff from the last time I reviewed this PR. However, because there are force-pushes, I need to review the entire PR again. |
Thanks for the feedback and sorry for the inconvenience, it's noted for any future contributions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks well structured and good to me.
@rhshadrach can you take a second look when possible to avoid this stalling whilst its green? |
/preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run
notes = DataFrame({"a": 3 * ["note 1"], "b": 3 * ["note 2"]})
df = DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
df.style.set_tooltips(notes).to_excel("test.xlsx", sheet_name="A")
The index label for column a
is missing in the Excel sheet (column b
is there).
In addition, the following modifications are also broken.
- When
df
has MultiIndex (hierarchical) columns startrow
is nonzero- Specifying columns as a subset of all columns (e.g.
columns=["b"]
)
""" | ||
Notes are not supported by the odswriter engine, | ||
see https://github.com/eea/odfpy | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a regular string, "Notes are not..."
, not a triple quoted string.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the PR, but it appears to have gone stale. If you could address the comments/requests above and we'd be more than happy to reopen! |
doc/source/whatsnew/v3.0.0.rst
file if fixing a bug or adding a new feature.Me and my colleague @Dacops developed this feature with the use case issued in mind.
We believe the solution can be better understood when looking at the examples we provide in the documentation but also in the tests we made.
Unfortunately, not every engine supported by pandas had the capability to handle notes (calamine, odfreader, pyxlsb, odswriter), on the remaining ones we took the liberty to not only add writing capability as requested in #58070 but also add the complementary operation of reading notes.
In summary, now you can write notes to an excel (using either openpyxl or xlswriter) by doing:
The complementary of this (reading using either xlrd or openpyxl) would look something like:
We will be available and ready to answer any doubts related to the pr.
Hope everything is to your liking.