-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: add calamine excel reader (close #50395) #54998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice looking change. Are you expecting to submit a conda recipe as well? Would be ideal to have this downloadable there instead of just via pip (though not a blocker)
bdf286b
to
b6701f0
Compare
doc/source/user_guide/io.rst
Outdated
files can be read using ``pyxlsb``. Also, all this formats can be read using ``python-calamine``, | ||
but this library has some different behavior from other libraries (mostly for ``.ods``). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there documentation that gives more details on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below this, we document which function is used by the various engines so users can determine which engine_kwargs
they pass. Can you add a line about Calamine there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think, now differences are not too large for explain it in documentation and I removed it:
- calamine can return timedelta;
- some difference from ODFReader, because calamine follows specification and extracts value from xml attributes, not from text (time and boolean, for example).
- added.
I added small chapter for Calamine.
30288dc
to
fe2f6de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice work. Kudos
@@ -66,7 +66,7 @@ computation = ['scipy>=1.8.1', 'xarray>=2022.03.0'] | |||
fss = ['fsspec>=2022.05.0'] | |||
aws = ['s3fs>=2022.05.0'] | |||
gcp = ['gcsfs>=2022.05.0', 'pandas-gbq>=0.17.5'] | |||
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.10', 'pyxlsb>=1.0.9', 'xlrd>=2.0.1', 'xlsxwriter>=3.0.3'] | |||
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.10', 'python-calamine>=0.1.6', 'pyxlsb>=1.0.9', 'xlrd>=2.0.1', 'xlsxwriter>=3.0.3'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we usually wait a few cycles for a library before adding a new library to extras, but @mroeschke would know best
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to add this to the extra right away
a0cef5b
to
863c068
Compare
Co-author: Kostya Farber (pandas-dev#50581)
df8b523
to
d6002fe
Compare
This makes is quite more time consuming to review. |
Do you mean about force-push instead of merging "main" and adding new commits? Sorry. |
Correct - not a big problem but wanted you to be aware. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor comments on the docs, but the rest looks great. Only remaining issue I think is whether we're adding it to the extras just yet (#54998 (comment)).
Thanks, I fixed it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm; @WillAyd - you good here?
Thanks @dimastbk - impressive work on this |
* Extract ReadEngine and WriteEngine types for Excel * Add calamine Excel reader engine pandas-dev/pandas#54998 * Add test for read_excel engine
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Based on #50581, which was closed as stale and in favor of #53005, which was rejected. Now, calamine support datetime and sheet types, that more than pyxlsb supports, for example.
Now fail a few tests with ods:
office:time-value
in specification);My English is too bad for writing documentation and I will be happy any help with it.
cc @kostyafarber