Commit 745cd09

Merge pull request pandas-dev#6 from dimastbk/issue-50395
bump python-calamine to 0.1.0
2 parents a0d4193 + 0a431c5 commit 745cd09

18 files changed, +54 -57 lines changed

ci/deps/actions-310.yaml

+1-1
Original file line number | Diff line number | Diff line change
@@ -55,5 +55,5 @@ dependencies:
5555
- zstandard>=0.15.2
5656

5757
- pip:
58-
- tzdata>=2022.1
5958
- python-calamine
59+
- tzdata>=2022.1

ci/deps/actions-311.yaml

+1
@@ -55,4 +55,5 @@ dependencies:
5555
- zstandard>=0.15.2
5656

5757
- pip:
58+
- python-calamine>=0.1.0
5859
- tzdata>=2022.1

ci/deps/actions-38-minimum_versions.yaml

+1-1
@@ -59,5 +59,5 @@ dependencies:
5959

6060
- pip:
6161
- pyqt5==5.15.1
62-
- python-calamine==0.0.8
62+
- python-calamine==0.1.0
6363
- tzdata==2022.1

ci/deps/actions-38.yaml

+1-1
@@ -55,5 +55,5 @@ dependencies:
5555
- zstandard>=0.15.2
5656

5757
- pip:
58+
- python-calamine
5859
- tzdata>=2022.1
59-
- python-calamine

ci/deps/actions-39.yaml

+1-1
@@ -55,5 +55,5 @@ dependencies:
5555
- zstandard>=0.15.2
5656

5757
- pip:
58+
- python-calamine
5859
- tzdata>=2022.1
59-
- python-calamine

ci/deps/circle-38-arm64.yaml

+3-3
@@ -54,6 +54,6 @@ dependencies:
5454
- xlrd>=2.0.1
5555
- xlsxwriter>=1.4.3
5656
- zstandard>=0.15.2
57-
58-
- pip:
59-
- python-calamine
57+
58+
- pip:
59+
- python-calamine

doc/source/getting_started/install.rst

+1
@@ -345,6 +345,7 @@ xlrd 2.0.1 excel Reading Excel
345345
xlsxwriter 1.4.3 excel Writing Excel
346346
openpyxl 3.0.7 excel Reading / writing for xlsx files
347347
pyxlsb 1.0.8 excel Reading for xlsb files
348+
python-calamine 0.1.0 excel Reading for xls/xlsx/xlsb/ods files
348349
========================= ================== =============== =============================================================
349350

350351
HTML

doc/source/user_guide/io.rst

+2-1
@@ -3420,7 +3420,8 @@ Excel files
34203420
The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
34213421
using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
34223422
can be read using ``xlrd``. Binary Excel (``.xlsb``)
3423-
files can be read using ``pyxlsb``.
3423+
files can be read using ``pyxlsb``. All of these formats can also be read using ``python-calamine``,
3424+
but this library has some limitations; for example, it cannot detect dates in most formats.
34243425
The :meth:`~DataFrame.to_excel` instance method is used for
34253426
saving a ``DataFrame`` to Excel. Generally the semantics are
34263427
similar to working with :ref:`csv<io.read_csv_table>` data.
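
The documentation change above can be illustrated with a minimal sketch. It assumes a pandas build that includes this commit and an installed ``python-calamine``; the helper name `read_with_calamine` and the `CALAMINE_SUFFIXES` set are hypothetical and exist only for this example. The suffix list mirrors the formats named in the `read_excel` docstring changed by this commit.

```python
from pathlib import Path

# Extensions that this commit's docstring lists for the "calamine" engine.
CALAMINE_SUFFIXES = {".xls", ".xlsx", ".xlsm", ".xlsb", ".ods"}


def read_with_calamine(path, **kwargs):
    """Hypothetical helper: read a spreadsheet with engine="calamine"."""
    suffix = Path(path).suffix.lower()
    if suffix not in CALAMINE_SUFFIXES:
        raise ValueError(f"calamine is not documented to read {suffix!r} files")

    # Deferred import so the suffix check works even without pandas installed.
    import pandas as pd

    # Per the note above, date cells may not be detected in most formats,
    # so callers may need to convert such columns themselves afterwards.
    return pd.read_excel(path, engine="calamine", **kwargs)
```

Any extra keyword arguments are passed through to `read_excel` unchanged, so existing options such as `sheet_name` should keep working under that assumption.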

doc/source/whatsnew/v2.0.0.rst

-1
@@ -275,7 +275,6 @@ Other enhancements
275275
- Improved error message in :func:`to_datetime` for non-ISO8601 formats, informing users about the position of the first error (:issue:`50361`)
276276
- Improved error message when trying to align :class:`DataFrame` objects (for example, in :func:`DataFrame.compare`) to clarify that "identically labelled" refers to both index and columns (:issue:`50083`)
277277
- Performance improvement in :func:`to_datetime` when format is given or can be inferred (:issue:`50465`)
278-
- Added ``calamine`` as an engine to ``read_excel`` (:issue: ``50395``)
279278
- Added support for :meth:`Index.min` and :meth:`Index.max` for pyarrow string dtypes (:issue:`51397`)
280279
- Added :meth:`DatetimeIndex.as_unit` and :meth:`TimedeltaIndex.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`50616`)
281280
- Added :meth:`Series.dt.unit` and :meth:`Series.dt.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`51223`)

doc/source/whatsnew/v2.1.0.rst

+1
@@ -37,6 +37,7 @@ Other enhancements
3737
- Improve error message when having incompatible columns using :meth:`DataFrame.merge` (:issue:`51861`)
3838
- Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns. (:issue:`52084`)
3939
- :meth:`arrays.SparseArray.map` now supports ``na_action`` (:issue:`52096`).
40+
- Added ``calamine`` as an engine to ``read_excel`` (:issue:`50395`)
4041

4142
.. ---------------------------------------------------------------------------
4243
.. _whatsnew_210.notable_bug_fixes:

pandas/compat/_optional.py

+1-1
@@ -37,7 +37,7 @@
3737
"pyarrow": "7.0.0",
3838
"pyreadstat": "1.1.2",
3939
"pytest": "7.0.0",
40-
"python-calamine": "0.0.8",
40+
"python-calamine": "0.1.0",
4141
"pyxlsb": "1.0.8",
4242
"s3fs": "2021.08.0",
4343
"scipy": "1.7.1",
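
To show what raising this minimum means in practice, here is a simplified sketch of a minimum-version check. It is not pandas' actual implementation: `parse_version`, `meets_minimum` and the `MINIMUM` table are hypothetical names, and the parser only handles plain dotted numeric versions.

```python
# Hypothetical table; only the entry changed by this commit is shown.
MINIMUM = {"python-calamine": "0.1.0"}


def parse_version(text):
    """Turn a dotted numeric version such as "0.1.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(name, installed):
    """Return True if the installed version satisfies the recorded minimum."""
    return parse_version(installed) >= parse_version(MINIMUM[name])


print(meets_minimum("python-calamine", "0.0.8"))  # False: the old pin is now too old
print(meets_minimum("python-calamine", "0.1.0"))  # True
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" would sort before "0.9.0".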

pandas/core/config_init.py

+5-5
@@ -503,11 +503,11 @@ def use_inf_as_na_cb(key) -> None:
503503
auto, {others}.
504504
"""
505505

506-
_xls_options = ["xlrd"]
507-
_xlsm_options = ["xlrd", "openpyxl"]
508-
_xlsx_options = ["xlrd", "openpyxl"]
509-
_ods_options = ["odf"]
510-
_xlsb_options = ["pyxlsb"]
506+
_xls_options = ["xlrd", "calamine"]
507+
_xlsm_options = ["xlrd", "openpyxl", "calamine"]
508+
_xlsx_options = ["xlrd", "openpyxl", "calamine"]
509+
_ods_options = ["odf", "calamine"]
510+
_xlsb_options = ["pyxlsb", "calamine"]
511511

512512

513513
with cf.config_prefix("io.excel.xls"):
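
The option lists above can be read as a per-extension allow-list of reader engines. The sketch below models that idea with a plain dictionary; `READER_OPTIONS` and `validate_reader` are hypothetical names for illustration and do not reproduce how pandas registers its config options.

```python
# Reader engines per file extension, copied from the lists added in this hunk.
READER_OPTIONS = {
    "xls": ["xlrd", "calamine"],
    "xlsm": ["xlrd", "openpyxl", "calamine"],
    "xlsx": ["xlrd", "openpyxl", "calamine"],
    "ods": ["odf", "calamine"],
    "xlsb": ["pyxlsb", "calamine"],
}


def validate_reader(ext, engine):
    """Accept "auto" or any engine listed for the given extension."""
    if engine == "auto":
        return engine
    allowed = READER_OPTIONS[ext]
    if engine not in allowed:
        raise ValueError(f"{engine!r} is not a valid reader for .{ext}; use one of {allowed}")
    return engine


# Engines accepted for every extension after this commit.
common = set.intersection(*(set(v) for v in READER_OPTIONS.values()))
print(common)  # {'calamine'}
```

After this change, ``calamine`` is the only engine that appears in every list, which matches the docstring's claim that it covers all five formats.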

pandas/io/excel/_base.py

+9-5
@@ -149,13 +149,15 @@
149149
of dtype conversion.
150150
engine : str, default None
151151
If io is not a buffer or path, this must be set to identify io.
152-
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb".
152+
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb", "calamine".
153153
Engine compatibility :
154154
155155
- "xlrd" supports old-style Excel files (.xls).
156156
- "openpyxl" supports newer Excel file formats.
157157
- "odf" supports OpenDocument file formats (.odf, .ods, .odt).
158158
- "pyxlsb" supports Binary Excel files.
159+
- "calamine" supports Excel (.xls, .xlsx, .xlsm, .xlsb)
160+
and OpenDocument (.ods) file formats.
159161
160162
.. versionchanged:: 1.2.0
161163
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
@@ -375,7 +377,7 @@ def read_excel(
375377
| Callable[[str], bool]
376378
| None = ...,
377379
dtype: DtypeArg | None = ...,
378-
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb"] | None = ...,
380+
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"] | None = ...,
379381
converters: dict[str, Callable] | dict[int, Callable] | None = ...,
380382
true_values: Iterable[Hashable] | None = ...,
381383
false_values: Iterable[Hashable] | None = ...,
@@ -414,7 +416,7 @@ def read_excel(
414416
| Callable[[str], bool]
415417
| None = ...,
416418
dtype: DtypeArg | None = ...,
417-
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb"] | None = ...,
419+
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"] | None = ...,
418420
converters: dict[str, Callable] | dict[int, Callable] | None = ...,
419421
true_values: Iterable[Hashable] | None = ...,
420422
false_values: Iterable[Hashable] | None = ...,
@@ -453,7 +455,7 @@ def read_excel(
453455
| Callable[[str], bool]
454456
| None = None,
455457
dtype: DtypeArg | None = None,
456-
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb"] | None = None,
458+
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"] | None = None,
457459
converters: dict[str, Callable] | dict[int, Callable] | None = None,
458460
true_values: Iterable[Hashable] | None = None,
459461
false_values: Iterable[Hashable] | None = None,
@@ -1418,13 +1420,15 @@ class ExcelFile:
14181420
.xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file.
14191421
engine : str, default None
14201422
If io is not a buffer or path, this must be set to identify io.
1421-
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``
1423+
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``, ``calamine``
14221424
Engine compatibility :
14231425
14241426
- ``xlrd`` supports old-style Excel files (.xls).
14251427
- ``openpyxl`` supports newer Excel file formats.
14261428
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
14271429
- ``pyxlsb`` supports Binary Excel files.
1430+
- ``calamine`` supports Excel (.xls, .xlsx, .xlsm, .xlsb)
1431+
and OpenDocument (.ods) file formats.
14281432
14291433
.. versionchanged:: 1.2.0
14301434

pandas/io/excel/_calaminereader.py

+17-34
@@ -5,36 +5,31 @@
55
datetime,
66
time,
77
)
8-
from tempfile import NamedTemporaryFile
98
from typing import (
9+
TYPE_CHECKING,
1010
Union,
11-
cast,
1211
)
1312

14-
from pandas._typing import (
15-
FilePath,
16-
ReadBuffer,
17-
Scalar,
18-
StorageOptions,
19-
)
2013
from pandas.compat._optional import import_optional_dependency
2114
from pandas.util._decorators import doc
2215

2316
import pandas as pd
2417
from pandas.core.shared_docs import _shared_docs
2518

26-
from pandas.io.common import stringify_path
27-
from pandas.io.excel._base import (
28-
BaseExcelReader,
29-
inspect_excel_format,
30-
)
19+
from pandas.io.excel._base import BaseExcelReader
3120

32-
ValueT = Union[int, float, str, bool, time, date, datetime]
21+
if TYPE_CHECKING:
22+
from pandas._typing import (
23+
FilePath,
24+
ReadBuffer,
25+
Scalar,
26+
StorageOptions,
27+
)
3328

29+
_CellValueT = Union[int, float, str, bool, time, date, datetime]
3430

35-
class CalamineExcelReader(BaseExcelReader):
36-
_sheet_names: list[str] | None = None
3731

32+
class CalamineExcelReader(BaseExcelReader):
3833
@doc(storage_options=_shared_docs["storage_options"])
3934
def __init__(
4035
self,
@@ -55,26 +50,14 @@ def __init__(
5550

5651
@property
5752
def _workbook_class(self):
58-
from python_calamine import CalamineReader
53+
from python_calamine import CalamineWorkbook
5954

60-
return CalamineReader
55+
return CalamineWorkbook
6156

6257
def load_workbook(self, filepath_or_buffer: FilePath | ReadBuffer[bytes]):
63-
if hasattr(filepath_or_buffer, "read") and hasattr(filepath_or_buffer, "seek"):
64-
filepath_or_buffer = cast(ReadBuffer, filepath_or_buffer)
65-
ext = inspect_excel_format(filepath_or_buffer)
66-
with NamedTemporaryFile(suffix=f".{ext}", delete=False) as tmp_file:
67-
filepath_or_buffer.seek(0)
68-
tmp_file.write(filepath_or_buffer.read())
69-
filepath_or_buffer = tmp_file.name
70-
else:
71-
filepath_or_buffer = stringify_path(filepath_or_buffer)
72-
73-
assert isinstance(filepath_or_buffer, str)
74-
75-
from python_calamine import CalamineReader
58+
from python_calamine import load_workbook
7659

77-
return CalamineReader.from_path(filepath_or_buffer)
60+
return load_workbook(filepath_or_buffer) # type: ignore[arg-type]
7861

7962
@property
8063
def sheet_names(self) -> list[str]:
@@ -91,7 +74,7 @@ def get_sheet_by_index(self, index: int):
9174
def get_sheet_data(
9275
self, sheet, file_rows_needed: int | None = None
9376
) -> list[list[Scalar]]:
94-
def _convert_cell(value: ValueT) -> Scalar:
77+
def _convert_cell(value: _CellValueT) -> Scalar:
9578
if isinstance(value, float):
9679
val = int(value)
9780
if val == value:
@@ -105,7 +88,7 @@ def _convert_cell(value: ValueT) -> Scalar:
10588

10689
return value
10790

108-
rows: list[list[ValueT]] = sheet.to_python(skip_empty_area=False)
91+
rows: list[list[_CellValueT]] = sheet.to_python(skip_empty_area=False)
10992
data: list[list[Scalar]] = []
11093

11194
for row in rows:
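
The `_convert_cell` helper in this hunk is only partly visible in the diff. The sketch below shows the visible idea, collapsing floats that hold whole numbers into ints, as a standalone function. The branches hidden by the diff are not reproduced; the `math.isfinite` guard and the names `convert_cell` and `convert_rows` are assumptions added for this example.

```python
import math
from datetime import date, datetime, time
from typing import Union

# Same union as _CellValueT in the diff, renamed for this standalone sketch.
CellValue = Union[int, float, str, bool, time, date, datetime]


def convert_cell(value: CellValue) -> CellValue:
    """Return whole-number floats as ints; pass every other value through."""
    # Assumption: skip NaN and infinities, since int() would raise on them.
    if isinstance(value, float) and math.isfinite(value):
        as_int = int(value)
        if as_int == value:
            return as_int
    return value


def convert_rows(rows):
    """Apply convert_cell to every cell of a list-of-lists sheet."""
    return [[convert_cell(cell) for cell in row] for row in rows]


print(convert_rows([[1.0, 2.5, "a"], [3.0, True, 4.25]]))
# [[1, 2.5, 'a'], [3, True, 4.25]]
```

Spreadsheet formats often store every number as a float, so this kind of normalization keeps integer-looking columns from being read back as floats.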

pyproject.toml

+2-2
@@ -62,7 +62,7 @@ computation = ['scipy>=1.7.1', 'xarray>=0.21.0']
6262
fss = ['fsspec>=2021.07.0']
6363
aws = ['s3fs>=2021.08.0']
6464
gcp = ['gcsfs>=2021.07.0', 'pandas-gbq>=0.15.0']
65-
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
65+
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'python-calamine>=0.1.0', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
6666
parquet = ['pyarrow>=7.0.0']
6767
feather = ['pyarrow>=7.0.0']
6868
hdf5 = [# blosc only available on conda (https://github.com/Blosc/python-blosc/issues/297)
@@ -104,7 +104,7 @@ all = ['beautifulsoup4>=4.9.3',
104104
'pytest>=7.0.0',
105105
'pytest-xdist>=2.2.0',
106106
'pytest-asyncio>=0.17.0',
107-
'python-calamine>=0.0.8',
107+
'python-calamine>=0.1.0',
108108
'python-snappy>=0.6.0',
109109
'pyxlsb>=1.0.8',
110110
'qtpy>=2.2.0',

scripts/tests/data/deps_expected_random.yaml

+3
@@ -55,3 +55,6 @@ dependencies:
5555
- xlrd>=2.0.1
5656
- xlsxwriter>=1.4.3
5757
- zstandard>=0.15.2
58+
59+
- pip:
60+
- python-calamine>=0.1.0

scripts/tests/data/deps_minimum.toml

+2-1
@@ -62,7 +62,7 @@ computation = ['scipy>=1.7.1', 'xarray>=0.21.0']
6262
fss = ['fsspec>=2021.07.0']
6363
aws = ['s3fs>=2021.08.0']
6464
gcp = ['gcsfs>=2021.07.0', 'pandas-gbq>=0.15.0']
65-
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
65+
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'python-calamine>=0.1.0', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
6666
parquet = ['pyarrow>=7.0.0']
6767
feather = ['pyarrow>=7.0.0']
6868
hdf5 = [# blosc only available on conda (https://github.com/Blosc/python-blosc/issues/297)
@@ -104,6 +104,7 @@ all = ['beautifulsoup4>=5.9.3',
104104
'pytest>=7.0.0',
105105
'pytest-xdist>=2.2.0',
106106
'pytest-asyncio>=0.17.0',
107+
'python-calamine>=0.1.0',
107108
'python-snappy>=0.6.0',
108109
'pyxlsb>=1.0.8',
109110
'qtpy>=2.2.0',

scripts/tests/data/deps_unmodified_random.yaml

+3
@@ -55,3 +55,6 @@ dependencies:
5555
- xlrd>=2.0.1
5656
- xlsxwriter>=1.4.3
5757
- zstandard>=0.15.2
58+
59+
- pip:
60+
- python-calamine>=0.1.0
