Skip to content

ENH: add calamine excel reader (close #50395) #54998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 12, 2023

Conversation

dimastbk
Copy link
Contributor

@dimastbk dimastbk commented Sep 4, 2023

Based on #50581, which was closed as stale and in favor of #53005, which was rejected. Now, calamine support datetime and sheet types, that more than pyxlsb supports, for example.

Now fail a few tests with ods:

My English is too bad for writing documentation and I will be happy any help with it.

cc @kostyafarber

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice looking change. Are you expecting to submit a conda recipe as well? Would be ideal to have this downloadable there instead of just via pip (though not a blocker)

Comment on lines 3456 to 3457
files can be read using ``pyxlsb``. Also, all this formats can be read using ``python-calamine``,
but this library has some different behavior from other libraries (mostly for ``.ods``).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there documentation that gives more details on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below this, we document which function is used by the various engines so users can determine which engine_kwargs they pass. Can you add a line about Calamine there.

Copy link
Contributor Author

@dimastbk dimastbk Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think, now differences are not too large for explain it in documentation and I removed it:
  • calamine can return timedelta;
  • some difference from ODFReader, because calamine follows specification and extracts value from xml attributes, not from text (time and boolean, for example).
  1. added.

I added small chapter for Calamine.

@dimastbk dimastbk force-pushed the issue-50395 branch 3 times, most recently from 30288dc to fe2f6de Compare September 6, 2023 10:22
@dimastbk
Copy link
Contributor Author

dimastbk commented Sep 6, 2023

@WillAyd recipe was added to conda-forge.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice work. Kudos

@@ -66,7 +66,7 @@ computation = ['scipy>=1.8.1', 'xarray>=2022.03.0']
fss = ['fsspec>=2022.05.0']
aws = ['s3fs>=2022.05.0']
gcp = ['gcsfs>=2022.05.0', 'pandas-gbq>=0.17.5']
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.10', 'pyxlsb>=1.0.9', 'xlrd>=2.0.1', 'xlsxwriter>=3.0.3']
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.10', 'python-calamine>=0.1.6', 'pyxlsb>=1.0.9', 'xlrd>=2.0.1', 'xlsxwriter>=3.0.3']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we usually wait a few cycles for a library before adding a new library to extras, but @mroeschke would know best

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to add this to the extra right away

@dimastbk dimastbk force-pushed the issue-50395 branch 4 times, most recently from a0cef5b to 863c068 Compare September 7, 2023 11:32
@dimastbk dimastbk requested a review from rhshadrach September 7, 2023 13:38
@rhshadrach
Copy link
Member

dimastbk force-pushed the issue-50395 branch from df8b523 to d6002fe
4 hours ago

This makes is quite more time consuming to review.

@dimastbk
Copy link
Contributor Author

dimastbk commented Sep 9, 2023

Do you mean about force-push instead of merging "main" and adding new commits? Sorry.

@rhshadrach
Copy link
Member

Do you mean about force-push instead of merging "main" and adding new commits? Sorry.

Correct - not a big problem but wanted you to be aware.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments on the docs, but the rest looks great. Only remaining issue I think is whether we're adding it to the extras just yet (#54998 (comment)).

@dimastbk
Copy link
Contributor Author

Thanks, I fixed it.

@rhshadrach rhshadrach added this to the 2.2 milestone Sep 12, 2023
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; @WillAyd - you good here?

@WillAyd WillAyd merged commit 79067a7 into pandas-dev:main Sep 12, 2023
@WillAyd
Copy link
Member

WillAyd commented Sep 12, 2023

Thanks @dimastbk - impressive work on this

@dimastbk dimastbk deleted the issue-50395 branch September 13, 2023 00:58
davetapley added a commit to JEFuller/pandas-stubs that referenced this pull request Mar 31, 2024
davetapley added a commit to JEFuller/pandas-stubs that referenced this pull request Apr 2, 2024
davetapley added a commit to JEFuller/pandas-stubs that referenced this pull request Apr 4, 2024
davetapley added a commit to JEFuller/pandas-stubs that referenced this pull request Apr 5, 2024
Dr-Irv pushed a commit to pandas-dev/pandas-stubs that referenced this pull request Apr 9, 2024
* Extract ReadEngine and WriteEngine types for Excel

* Add calamine Excel reader engine

pandas-dev/pandas#54998

* Add test for read_excel engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Adding support for calamine as Excel reader engine
5 participants