Commit e63b5d8

amznero authored and JulianWgs committed
BUG: Fix pd.read_orc raising AttributeError (pandas-dev#40970)
1 parent 88fdc75 commit e63b5d8

5 files changed, +29 -6 lines changed

doc/source/getting_started/install.rst

+15
@@ -362,6 +362,21 @@ pyarrow 0.15.0 Parquet, ORC, and feather reading /
 pyreadstat                                   SPSS files (.sav) reading
 ========================= ================== =============================================================
 
+.. _install.warn_orc:
+
+.. warning::
+
+    * If you want to use :func:`~pandas.read_orc`, it is highly recommended to install pyarrow using conda.
+      The following is a summary of the environment in which :func:`~pandas.read_orc` can work.
+
+      ========================= ================== =============================================================
+      System                    Conda              PyPI
+      ========================= ================== =============================================================
+      Linux                     Successful         Failed(pyarrow==3.0 Successful)
+      macOS                     Successful         Failed
+      Windows                   Failed             Failed
+      ========================= ================== =============================================================
+
 Access data in the cloud
 ^^^^^^^^^^^^^^^^^^^^^^^^

doc/source/user_guide/io.rst

+5
@@ -5443,6 +5443,11 @@ Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <https://orc.
 for data frames. It is designed to make reading data frames efficient. pandas provides *only* a reader for the
 ORC format, :func:`~pandas.read_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.
 
+.. warning::
+
+   * It is *highly recommended* to install pyarrow using conda due to some issues occurred by pyarrow.
+   * :func:`~pandas.read_orc` is not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
+
 .. _io.sql:
 
 SQL queries
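
The warning added to io.rst above says that `read_orc` depends on a working pyarrow ORC module and is not supported on Windows yet. A caller could guard against both conditions before trying to read a file. The following is only a minimal sketch under those assumptions: `orc_reader_available` is a hypothetical helper name, not part of pandas or of this commit.

```python
import importlib
import platform


def orc_reader_available() -> bool:
    """Best-effort check that pyarrow's ORC module can be imported.

    Hypothetical helper for illustration; pandas does not ship it.
    """
    # The documentation warning states read_orc is not supported on Windows yet.
    if platform.system() == "Windows":
        return False
    try:
        # Import the submodule itself: a pyarrow build without ORC support
        # fails here even though the top-level package imports fine.
        importlib.import_module("pyarrow.orc")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("ORC reading available:", orc_reader_available())
```

The check imports the module instead of only locating it, because a pyarrow wheel can be installed and still lack a usable ORC extension.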

doc/source/whatsnew/v1.3.0.rst

+1
@@ -791,6 +791,7 @@ I/O
 - Bug in :func:`read_sas` raising ``ValueError`` when ``datetimes`` were null (:issue:`39725`)
 - Bug in :func:`read_excel` dropping empty values from single-column spreadsheets (:issue:`39808`)
 - Bug in :meth:`DataFrame.to_string` misplacing the truncation column when ``index=False`` (:issue:`40907`)
+- Bug in :func:`read_orc` always raising ``AttributeError`` (:issue:`40918`)
 
 Period
 ^^^^^^

pandas/io/orc.py

+8 -5
@@ -1,10 +1,10 @@
 """ orc compat """
 from __future__ import annotations
 
-import distutils
 from typing import TYPE_CHECKING
 
 from pandas._typing import FilePathOrBuffer
+from pandas.compat._optional import import_optional_dependency
 
 from pandas.io.common import get_handle
 
@@ -42,13 +42,16 @@ def read_orc(
     Returns
     -------
     DataFrame
+
+    Notes
+    -------
+    Before using this function you should read the :ref:`user guide about ORC <io.orc>`
+    and :ref:`install optional dependencies <install.warn_orc>`.
     """
     # we require a newer version of pyarrow than we support for parquet
-    import pyarrow
 
-    if distutils.version.LooseVersion(pyarrow.__version__) < "0.13.0":
-        raise ImportError("pyarrow must be >= 0.13.0 for read_orc")
+    orc = import_optional_dependency("pyarrow.orc")
 
     with get_handle(path, "rb", is_text=False) as handles:
-        orc_file = pyarrow.orc.ORCFile(handles.handle)
+        orc_file = orc.ORCFile(handles.handle)
         return orc_file.read(columns=columns, **kwargs).to_pandas()
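
The hunk above replaces attribute access on bare top-level imports (`distutils.version`, `pyarrow.orc`) with a single call that imports the dotted module by name. The general pattern can be sketched with the standard library alone. This is an illustration only: `import_optional` and its `extra` parameter are hypothetical names, and the body is not pandas' `import_optional_dependency` implementation.

```python
import importlib
from types import ModuleType


def import_optional(name: str, extra: str = "") -> ModuleType:
    """Import ``name`` (may be dotted) or raise ImportError with a hint.

    Hypothetical stand-in for illustration; not pandas' implementation.
    """
    try:
        # import_module loads the submodule itself, so callers never rely on
        # a parent package having imported it as a side effect.
        return importlib.import_module(name)
    except ImportError as err:
        top_level = name.split(".")[0]
        msg = f"Missing optional dependency '{top_level}'. {extra}".strip()
        raise ImportError(msg) from err


# Usage with a stdlib submodule so the sketch runs without pyarrow installed.
decoder = import_optional("json.decoder")
print(decoder.JSONDecoder.__name__)  # prints "JSONDecoder"
```

Returning the module object lets the caller bind it to a local name, as the patched `read_orc` does with `orc`, instead of reaching through the parent package's attributes.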

pandas/tests/io/test_orc.py

-1
@@ -9,7 +9,6 @@
 from pandas import read_orc
 import pandas._testing as tm
 
-pytest.importorskip("pyarrow", minversion="0.13.0")
 pytest.importorskip("pyarrow.orc")
 
 pytestmark = pytest.mark.filterwarnings(
