ENH: Add calamite engine to read_excel
#50581
Closed
Commits (48):
30da9a4 ENH: add calamite excel reader and modify test to include engine (kostyafarber)
a47d3fb Merge branch 'main' into issue-50395 (kostyafarber)
6c1dd87 Merge branch 'main' into issue-50395 (kostyafarber)
fd06ad9 fix deps for python-calamine (kostyafarber)
8b6200a Merge branch 'main' into issue-50395 (kostyafarber)
6a8d822 fix deps for python-calamine, add as pip package (kostyafarber)
efcb2fc ENH: fix typo in engine declaration, add import_optional_dependency, … (kostyafarber)
e1105de Merge branch 'main' into issue-50395 (kostyafarber)
6b50e0c calamite -> calamine, updated some tests for calamine (dimastbk)
0784733 calamine excel engine: skip tests with datetime (dimastbk)
5971199 Merge branch 'main' into issue-50395 (kostyafarber)
655318b Merge branch 'issue-50395' into issue-50395-dima (dimastbk)
cc049cf Merge branch 'main' into issue-50395 (kostyafarber)
038133e ENH: change reader filename match library, fix typo in engine name in… (kostyafarber)
52c2cbd Merge branch 'issue-50395' into issue-50395-dima (kostyafarber)
2dc5e02 Merge pull request #1 from dimastbk/issue-50395-dima (kostyafarber)
2076e11 Merge pull request #2 from dimastbk/issue-50395-dima-skip-tests (kostyafarber)
6b0a7ac Merge branch 'main' into issue-50395 (kostyafarber)
a614089 ENH: add back get_sheet_by_index (kostyafarber)
256f9f9 Merge branch 'main' into issue-50395 (kostyafarber)
eee8b4e Merge branch 'main' into issue-50395 (kostyafarber)
9fc2209 ENH: fix mypy and trailing whitespace (kostyafarber)
cf1268a Merge branch 'main' into issue-50395 (kostyafarber)
bebfec5 Merge branch 'main' into issue-50395 (kostyafarber)
9019904 added conversion date/time/float, support file_rows_needed, fixed sup… (dimastbk)
08a5616 Update test_readers.py (dimastbk)
677a224 Merge pull request #3 from dimastbk/issue-50395 (kostyafarber)
8c55e5d Merge branch 'main' into issue-50395 (kostyafarber)
d817999 update python-calamine to 0.0.7 (dimastbk)
12aaf19 Merge branch 'main' into issue-50395 (kostyafarber)
255e8fb Merge pull request #4 from dimastbk/issue-50395 (kostyafarber)
500fa9f Merge branch 'main' into issue-50395 (kostyafarber)
5d94728 fix review: use CalamineReader/CalamineSheet (dimastbk)
15874c3 fixed pyright, fixed docs in __init__ (dimastbk)
89ae49e Merge pull request #5 from dimastbk/issue-50395 (kostyafarber)
33e5b7e Merge branch 'main' into issue-50395 (kostyafarber)
85d31ec Merge branch 'main' into issue-50395 (kostyafarber)
a0d4193 Merge branch 'main' into issue-50395 (kostyafarber)
a6b6fb2 bump python-calamine to 0.1.0 (dimastbk)
0a431c5 _ValueT -> _CellValueT (dimastbk)
745cd09 Merge pull request #6 from dimastbk/issue-50395 (kostyafarber)
942a16a Merge branch 'main' into issue-50395 (kostyafarber)
8803ca9 Merge branch 'main' into issue-50395 (kostyafarber)
2f5ffba added xfail to tests, small fixes (dimastbk)
b8b1a9a Merge pull request #7 from dimastbk/issue-50395 (kostyafarber)
f5ab40d Merge branch 'main' into issue-50395 (kostyafarber)
02c2e7f bump calamine to 0.1.1, update tests (472 passed, 75 xfailed), update… (dimastbk)
74a3e70 Merge pull request #8 from dimastbk/issue-50395 (kostyafarber)
@@ -55,4 +55,5 @@ dependencies:
   - zstandard>=0.15.2

   - pip:
+    - python-calamine
     - tzdata>=2022.1
@@ -55,4 +55,5 @@ dependencies:
   - zstandard>=0.15.2

   - pip:
+    - python-calamine>=0.1.1
     - tzdata>=2022.1
@@ -70,4 +70,5 @@ dependencies:
   - py

   - pip:
+    - python-calamine
     - tzdata>=2022.1
@@ -59,4 +59,5 @@ dependencies:

   - pip:
     - pyqt5==5.15.1
+    - python-calamine==0.1.1
     - tzdata==2022.1
@@ -55,4 +55,5 @@ dependencies:
   - zstandard>=0.15.2

   - pip:
+    - python-calamine
     - tzdata>=2022.1
@@ -55,4 +55,5 @@ dependencies:
   - zstandard>=0.15.2

   - pip:
+    - python-calamine
     - tzdata>=2022.1
@@ -54,3 +54,6 @@ dependencies:
   - xlrd>=2.0.1
   - xlsxwriter>=1.4.3
   - zstandard>=0.15.2
+
+  - pip:
+    - python-calamine
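
The hunks above add python-calamine as a pip-installed dependency, and the reader in this PR guards its use with `import_optional_dependency("python_calamine")`. As a rough illustration of what such a guard checks, here is a minimal standard-library sketch. It is not pandas' implementation; the helper name `optional_dependency_version` is hypothetical, and only the module name `python_calamine` and the distribution name `python-calamine` come from the diff.

```python
from __future__ import annotations

import importlib.util
from importlib import metadata


def optional_dependency_version(
    module_name: str, dist_name: str | None = None
) -> str | None:
    """Return the installed version of an optional dependency.

    Hypothetical helper: returns None when the module cannot be imported,
    and "unknown" when it is importable but has no distribution metadata.
    """
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution under that name
        # (for example, a standard-library module).
        return "unknown"


# The import name and the pip name differ, so both are passed explicitly.
calamine_version = optional_dependency_version("python_calamine", "python-calamine")
if calamine_version is None:
    print("python-calamine is not installed; the calamine engine is unavailable")
else:
    print(f"python-calamine {calamine_version} found")
```

A caller could compare the returned string against a minimum such as the `0.1.1` pinned in one of the environment files, although how pandas itself enforces minimum versions is not shown in this diff.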
@@ -0,0 +1,99 @@
from __future__ import annotations

from datetime import (
    date,
    datetime,
    time,
)
from typing import (
    TYPE_CHECKING,
    Union,
)

from pandas.compat._optional import import_optional_dependency
from pandas.util._decorators import doc

import pandas as pd
from pandas.core.shared_docs import _shared_docs

from pandas.io.excel._base import BaseExcelReader

if TYPE_CHECKING:
    from pandas._typing import (
        FilePath,
        ReadBuffer,
        Scalar,
        StorageOptions,
    )

_CellValueT = Union[int, float, str, bool, time, date, datetime]


class CalamineReader(BaseExcelReader):
    @doc(storage_options=_shared_docs["storage_options"])
    def __init__(
        self,
        filepath_or_buffer: FilePath | ReadBuffer[bytes],
        storage_options: StorageOptions = None,
    ) -> None:
        """
        Reader using calamine engine (xlsx/xls/xlsb/ods).

        Parameters
        ----------
        filepath_or_buffer : str, path to be parsed or
            an open readable stream.
        {storage_options}
        """
        import_optional_dependency("python_calamine")
        super().__init__(filepath_or_buffer, storage_options=storage_options)

    @property
    def _workbook_class(self):
        from python_calamine import CalamineWorkbook

        return CalamineWorkbook

    def load_workbook(self, filepath_or_buffer: FilePath | ReadBuffer[bytes]):
        from python_calamine import load_workbook

        return load_workbook(filepath_or_buffer)  # type: ignore[arg-type]

    @property
    def sheet_names(self) -> list[str]:
        return self.book.sheet_names  # pyright: ignore

    def get_sheet_by_name(self, name: str):
        self.raise_if_bad_sheet_by_name(name)
        return self.book.get_sheet_by_name(name)  # pyright: ignore

    def get_sheet_by_index(self, index: int):
        self.raise_if_bad_sheet_by_index(index)
        return self.book.get_sheet_by_index(index)  # pyright: ignore

    def get_sheet_data(
        self, sheet, file_rows_needed: int | None = None
    ) -> list[list[Scalar]]:
        def _convert_cell(value: _CellValueT) -> Scalar:
            if isinstance(value, float):
                val = int(value)
                if val == value:
                    return val
                else:
                    return value
            elif isinstance(value, date):
                return pd.Timestamp(value)
            elif isinstance(value, time):
                return value.isoformat()

            return value

        rows: list[list[_CellValueT]] = sheet.to_python(skip_empty_area=False)
        data: list[list[Scalar]] = []

        for row in rows:
            data.append([_convert_cell(cell) for cell in row])
            if file_rows_needed is not None and len(data) >= file_rows_needed:
                break

        return data
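
To make the cell-conversion rules in `get_sheet_data` easier to follow, here is a small standalone sketch that mirrors the branch order of `_convert_cell` and the `file_rows_needed` early exit. It uses only the standard library: where the diff returns `pd.Timestamp(value)`, this sketch substitutes a plain `datetime` as a stand-in, and the names `convert_cell` and `take_rows` are hypothetical, not part of the PR.

```python
from __future__ import annotations

from datetime import date, datetime, time
from typing import Union

CellValue = Union[int, float, str, bool, time, date, datetime]


def convert_cell(value: CellValue) -> CellValue:
    """Mirror the branch order of _convert_cell from the diff."""
    if isinstance(value, float):
        # Whole-number floats are narrowed to int; other floats pass through.
        as_int = int(value)
        return as_int if as_int == value else value
    if isinstance(value, date):
        # datetime is a subclass of date, so both types land in this branch.
        # Assumption: datetime stands in for the pd.Timestamp used in the PR.
        if isinstance(value, datetime):
            return value
        return datetime(value.year, value.month, value.day)
    if isinstance(value, time):
        # Bare times are returned as ISO-format strings.
        return value.isoformat()
    # int, str and bool are returned unchanged.
    return value


def take_rows(
    rows: list[list[CellValue]], file_rows_needed: int | None = None
) -> list[list[CellValue]]:
    """Convert rows one at a time, stopping once enough rows are collected."""
    data: list[list[CellValue]] = []
    for row in rows:
        data.append([convert_cell(cell) for cell in row])
        if file_rows_needed is not None and len(data) >= file_rows_needed:
            break
    return data


rows = [
    [1.0, 2.5, "a"],
    [date(2023, 1, 2), time(3, 4, 5), True],
    [3.0, 4.0, "not converted when only two rows are needed"],
]
converted = take_rows(rows, file_rows_needed=2)
print(converted)
```

Note that in the diff `sheet.to_python(skip_empty_area=False)` still materializes every row before the loop starts, so `file_rows_needed` limits only the conversion work, not the read itself.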
Is the datetime issue the only limitation? If so, we can probably be more explicit and say something like:

"python-calamine can be used to read all formats, but specifically does not support reading datetimes from .xls and .xlsb formats"

Datetime is the main limitation, but there are a few more bugs; see #50581 (comment). I suppressed them all with pytest.xfail, but should I write about them in the documentation?