
Commit 6b9318c

ingwinlu authored and jorisvandenbossche committed
BUG: Deprecate nthreads argument (#23112)
The nthreads argument is no longer supported since pyarrow 0.11.0 and was replaced with use_threads. Hence we deprecate the argument now as well so we can remove it in the future.

This commit also:
- removes feather-format as a dependency and replaces it with direct usage of pyarrow.
- sets CI dependencies to respect the changes above.

We test backwards compatibility with pyarrow 0.9.0, as conda does not provide pyarrow 0.10.0 and the conda-forge version has compatibility issues with the rest of the installed packages.

Resolves #23053.
Resolves #21639.
1 parent 9019582 commit 6b9318c
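The deprecation described above relies on renaming a keyword while still accepting the old one. As a rough sketch of that idea, using only the standard library: `rename_kwarg` and `read_table` below are hypothetical names for illustration and are not the pandas `deprecate_kwarg` implementation, although the warning text is modelled on the message the tests in this commit expect.

```python
import functools
import warnings


def rename_kwarg(old_name, new_name):
    """Hypothetical stand-in for a keyword-renaming deprecation decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        "pass only one of {!r} and {!r}".format(
                            old_name, new_name))
                # Warn the caller, then forward the value under the new name.
                warnings.warn(
                    "the {!r} keyword is deprecated, use {!r} instead".format(
                        old_name, new_name),
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rename_kwarg("nthreads", "use_threads")
def read_table(path, use_threads=True):
    # Placeholder reader: returns its arguments so the forwarding is visible.
    return path, use_threads
```

Calling `read_table("data.feather", nthreads=2)` emits a `FutureWarning` and forwards the value as `use_threads=2`; callers that already pass `use_threads` see no warning.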

12 files changed: +63 -68 lines changed

ci/azure-windows-36.yaml

-1
@@ -7,7 +7,6 @@ dependencies:
   - bottleneck
   - boost-cpp<1.67
   - fastparquet
-  - feather-format
   - matplotlib
   - numexpr
   - numpy=1.14*

ci/requirements-optional-conda.txt

+1 -2
@@ -2,7 +2,6 @@ beautifulsoup4>=4.2.1
 blosc
 bottleneck>=1.2.0
 fastparquet
-feather-format
 gcsfs
 html5lib
 ipython>=5.6.0
@@ -13,7 +12,7 @@ matplotlib>=2.0.0
 nbsphinx
 numexpr>=2.6.1
 openpyxl
-pyarrow
+pyarrow>=0.4.1
 pymysql
 pytables>=3.4.2
 pytest-cov

ci/requirements-optional-pip.txt

+2 -3
@@ -4,7 +4,6 @@ beautifulsoup4>=4.2.1
 blosc
 bottleneck>=1.2.0
 fastparquet
-feather-format
 gcsfs
 html5lib
 ipython>=5.6.0
@@ -15,7 +14,7 @@ matplotlib>=2.0.0
 nbsphinx
 numexpr>=2.6.1
 openpyxl
-pyarrow
+pyarrow>=0.4.1
 pymysql
 tables
 pytest-cov
@@ -28,4 +27,4 @@ statsmodels
 xarray
 xlrd
 xlsxwriter
-xlwt
+xlwt

ci/travis-27.yaml

-1
@@ -7,7 +7,6 @@ dependencies:
   - bottleneck
   - cython=0.28.2
   - fastparquet
-  - feather-format
   - gcsfs
   - html5lib
   - ipython

ci/travis-36-doc.yaml

+1 -1
@@ -8,7 +8,6 @@ dependencies:
   - bottleneck
   - cython>=0.28.2
   - fastparquet
-  - feather-format
   - html5lib
   - hypothesis>=3.58.0
   - ipykernel
@@ -24,6 +23,7 @@ dependencies:
   - numpy=1.13*
   - openpyxl
   - pandoc
+  - pyarrow
   - pyqt
   - pytables
   - python-dateutil

ci/travis-36.yaml

+1 -2
@@ -7,7 +7,6 @@ dependencies:
   - cython>=0.28.2
   - dask
   - fastparquet
-  - feather-format
   - flake8>=3.5
   - flake8-comprehensions
   - gcsfs
@@ -23,7 +22,7 @@ dependencies:
   - numpy
   - openpyxl
   - psycopg2
-  - pyarrow
+  - pyarrow=0.9.0
   - pymysql
   - pytables
   - python-snappy

ci/travis-37.yaml

+1
@@ -9,6 +9,7 @@ dependencies:
   - numpy
   - python-dateutil
   - nomkl
+  - pyarrow
   - pytz
   - pytest
   - pytest-xdist

doc/source/install.rst

+1 -1
@@ -258,7 +258,7 @@ Optional Dependencies
 * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions, Version 0.18.1 or higher
 * `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
 * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage, Version 3.4.2 or higher
-* `Feather Format <https://github.com/wesm/feather>`__: necessary for feather-based storage, version 0.3.1 or higher.
+* `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.4.1): necessary for feather-based storage.
 * `Apache Parquet <https://parquet.apache.org/>`__, either `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.4.1) or `fastparquet <https://fastparquet.readthedocs.io/en/latest>`__ (>= 0.0.6) for parquet-based storage. The `snappy <https://pypi.org/project/python-snappy>`__ and `brotli <https://pypi.org/project/brotlipy>`__ are available for compression support.
 * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:

doc/source/whatsnew/v0.24.0.txt

+5
@@ -269,6 +269,9 @@ If installed, we now require:
 | scipy           | 0.18.1          |          |
 +-----------------+-----------------+----------+

+Additionally we no longer depend on `feather-format` for feather based storage
+and replaced it with references to `pyarrow` (:issue:`21639` and :issue:`23053`).
+
 .. _whatsnew_0240.api_breaking.csv_line_terminator:

 `os.linesep` is used for ``line_terminator`` of ``DataFrame.to_csv``
@@ -954,6 +957,8 @@ Deprecations
 - The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
 - :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
 - The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
+- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
+  `use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)

 .. _whatsnew_0240.deprecations.datetimelike_int_ops:

pandas/io/feather_format.py

+28 -22
@@ -3,38 +3,35 @@
 from distutils.version import LooseVersion

 from pandas.compat import range
+from pandas.util._decorators import deprecate_kwarg

 from pandas import DataFrame, Int64Index, RangeIndex

 from pandas.io.common import _stringify_path


 def _try_import():
-    # since pandas is a dependency of feather
+    # since pandas is a dependency of pyarrow
     # we need to import on first use
-
     try:
-        import feather
+        import pyarrow
+        from pyarrow import feather
     except ImportError:
-
         # give a nice error message
-        raise ImportError("the feather-format library is not installed\n"
+        raise ImportError("pyarrow is not installed\n\n"
                           "you can install via conda\n"
-                          "conda install feather-format -c conda-forge\n"
+                          "conda install pyarrow -c conda-forge\n"
                           "or via pip\n"
-                          "pip install -U feather-format\n")
+                          "pip install -U pyarrow\n")

-    try:
-        LooseVersion(feather.__version__) >= LooseVersion('0.3.1')
-    except AttributeError:
-        raise ImportError("the feather-format library must be >= "
-                          "version 0.3.1\n"
+    if LooseVersion(pyarrow.__version__) < LooseVersion('0.4.1'):
+        raise ImportError("pyarrow >= 0.4.1 required for feather support\n\n"
                           "you can install via conda\n"
-                          "conda install feather-format -c conda-forge"
+                          "conda install pyarrow -c conda-forge"
                           "or via pip\n"
-                          "pip install -U feather-format\n")
+                          "pip install -U pyarrow\n")

-    return feather
+    return feather, pyarrow


 def to_feather(df, path):
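The `_try_import` hunk above combines two checks: import the optional dependency on first use, then reject versions that are too old. A minimal standard-library sketch of that pattern follows. `require_module` and `parse_version` are hypothetical helpers, not pandas or pyarrow functions, and the version parser is deliberately simplified (it only reads leading numeric components).

```python
import importlib


def parse_version(text):
    # Simplified parser: keeps leading numeric components only,
    # e.g. "0.11.0" -> (0, 11, 0) and "0.4.1.dev3" -> (0, 4, 1).
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def require_module(name, min_version):
    """Import `name` on first use and enforce a minimum version (hypothetical)."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        raise ImportError(
            "{0} is not installed\n\n"
            "you can install via conda\n"
            "conda install {0} -c conda-forge\n"
            "or via pip\n"
            "pip install -U {0}\n".format(name))
    found = getattr(module, "__version__", "0")
    if parse_version(found) < parse_version(min_version):
        raise ImportError(
            "{} >= {} required, found {}".format(name, min_version, found))
    return module
```

Comparing parsed tuples rather than raw strings matters here: as strings, "0.9.0" sorts after "0.11.0", which would get the 0.11.0 threshold in this commit wrong.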
@@ -51,7 +48,7 @@ def to_feather(df, path):
     if not isinstance(df, DataFrame):
         raise ValueError("feather only support IO with DataFrames")

-    feather = _try_import()
+    feather = _try_import()[0]
     valid_types = {'string', 'unicode'}

     # validate index
@@ -83,10 +80,11 @@ def to_feather(df, path):
     if df.columns.inferred_type not in valid_types:
         raise ValueError("feather must have string column names")

-    feather.write_dataframe(df, path)
+    feather.write_feather(df, path)


-def read_feather(path, nthreads=1):
+@deprecate_kwarg(old_arg_name='nthreads', new_arg_name='use_threads')
+def read_feather(path, use_threads=True):
     """
     Load a feather-format object from the file path

@@ -99,17 +97,25 @@ def read_feather(path, nthreads=1):
         Number of CPU threads to use when reading to pandas.DataFrame

        .. versionadded 0.21.0
+       .. deprecated 0.24.0
+    use_threads: bool, default True
+       Whether to parallelize reading using multiple threads
+
+       .. versionadded 0.24.0

     Returns
     -------
     type of object stored in file

     """

-    feather = _try_import()
+    feather, pyarrow = _try_import()
     path = _stringify_path(path)

-    if LooseVersion(feather.__version__) < LooseVersion('0.4.0'):
-        return feather.read_dataframe(path)
+    if LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0'):
+        int_use_threads = int(use_threads)
+        if int_use_threads < 1:
+            int_use_threads = 1
+        return feather.read_feather(path, nthreads=int_use_threads)

-    return feather.read_dataframe(path, nthreads=nthreads)
+    return feather.read_feather(path, use_threads=bool(use_threads))
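The final hunk maps one user-facing argument onto two different pyarrow keywords depending on the installed version. The sketch below isolates that mapping so it can be checked without pyarrow installed. `feather_thread_kwargs` is a hypothetical helper written for illustration, and it assumes a purely numeric version string such as "0.9.0".

```python
def feather_thread_kwargs(pyarrow_version, use_threads=True):
    """Pick the threading keyword a given pyarrow release accepts (hypothetical)."""
    # Assumption: the version string has only numeric components.
    release = tuple(int(p) for p in pyarrow_version.split(".")[:3])
    if release < (0, 11, 0):
        # Older releases take a thread count, so clamp the value to at least 1.
        return {"nthreads": max(int(use_threads), 1)}
    # Newer releases take a boolean flag.
    return {"use_threads": bool(use_threads)}
```

One consequence of this mapping: a legacy call with `nthreads=2` still requests two threads on pyarrow older than 0.11.0, but collapses to `use_threads=True` on newer releases, where the thread count is no longer chosen through this argument.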

pandas/tests/io/test_common.py

+2 -7
@@ -135,9 +135,7 @@ def test_iterator(self):
         (pd.read_csv, 'os', FileNotFoundError, 'csv'),
         (pd.read_fwf, 'os', FileNotFoundError, 'txt'),
         (pd.read_excel, 'xlrd', FileNotFoundError, 'xlsx'),
-        pytest.param(
-            pd.read_feather, 'feather', Exception, 'feather',
-            marks=pytest.mark.xfail(reason="failing for pyarrow < 0.11.0")),
+        (pd.read_feather, 'feather', Exception, 'feather'),
         (pd.read_hdf, 'tables', FileNotFoundError, 'h5'),
         (pd.read_stata, 'os', FileNotFoundError, 'dta'),
         (pd.read_sas, 'os', FileNotFoundError, 'sas7bdat'),
@@ -162,10 +160,7 @@ def test_read_non_existant_read_table(self):
         (pd.read_csv, 'os', ('io', 'data', 'iris.csv')),
         (pd.read_fwf, 'os', ('io', 'data', 'fixed_width_format.txt')),
         (pd.read_excel, 'xlrd', ('io', 'data', 'test1.xlsx')),
-        pytest.param(
-            pd.read_feather, 'feather',
-            ('io', 'data', 'feather-0_3_1.feather'),
-            marks=pytest.mark.xfail(reason="failing for pyarrow < 0.11.0")),
+        (pd.read_feather, 'feather', ('io', 'data', 'feather-0_3_1.feather')),
         (pd.read_hdf, 'tables', ('io', 'data', 'legacy_hdf',
                                  'datetimetz_object.h5')),
         (pd.read_stata, 'os', ('io', 'data', 'stata10_115.dta')),

pandas/tests/io/test_feather.py

+21 -28
@@ -1,6 +1,5 @@
 """ test feather-format compat """
 from distutils.version import LooseVersion
-from warnings import catch_warnings

 import numpy as np

@@ -9,15 +8,13 @@
 from pandas.util.testing import assert_frame_equal, ensure_clean

 import pytest
-feather = pytest.importorskip('feather')
-from feather import FeatherError  # noqa:E402
+pyarrow = pytest.importorskip('pyarrow')

 from pandas.io.feather_format import to_feather, read_feather  # noqa:E402

-fv = LooseVersion(feather.__version__)
+pyarrow_version = LooseVersion(pyarrow.__version__)


-@pytest.mark.xfail(reason="failing for pyarrow < 0.11.0")
 @pytest.mark.single
 class TestFeather(object):

@@ -34,8 +31,7 @@ def check_round_trip(self, df, **kwargs):
         with ensure_clean() as path:
             to_feather(df, path)

-            with catch_warnings(record=True):
-                result = read_feather(path, **kwargs)
+            result = read_feather(path, **kwargs)
             assert_frame_equal(result, df)

     def test_error(self):
@@ -65,13 +61,6 @@ def test_basic(self):
         assert df.dttz.dtype.tz.zone == 'US/Eastern'
         self.check_round_trip(df)

-    @pytest.mark.skipif(fv >= LooseVersion('0.4.0'), reason='fixed in 0.4.0')
-    def test_strided_data_issues(self):
-
-        # strided data issuehttps://github.com/wesm/feather/issues/97
-        df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc'))
-        self.check_error_on_write(df, FeatherError)
-
     def test_duplicate_columns(self):

         # https://github.com/wesm/feather/issues/53
@@ -85,29 +74,33 @@ def test_stringify_columns(self):
         df = pd.DataFrame(np.arange(12).reshape(4, 3)).copy()
         self.check_error_on_write(df, ValueError)

-    @pytest.mark.skipif(fv >= LooseVersion('0.4.0'), reason='fixed in 0.4.0')
-    def test_unsupported(self):
-
-        # timedelta
-        df = pd.DataFrame({'a': pd.timedelta_range('1 day', periods=3)})
-        self.check_error_on_write(df, FeatherError)
-
-        # non-strings
-        df = pd.DataFrame({'a': ['a', 1, 2.0]})
-        self.check_error_on_write(df, ValueError)
-
     def test_unsupported_other(self):

         # period
         df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)})
         # Some versions raise ValueError, others raise ArrowInvalid.
         self.check_error_on_write(df, Exception)

-    @pytest.mark.skipif(fv < LooseVersion('0.4.0'), reason='new in 0.4.0')
     def test_rw_nthreads(self):
-
         df = pd.DataFrame({'A': np.arange(100000)})
-        self.check_round_trip(df, nthreads=2)
+        expected_warning = (
+            "the 'nthreads' keyword is deprecated, "
+            "use 'use_threads' instead"
+        )
+        with tm.assert_produces_warning(FutureWarning) as w:
+            self.check_round_trip(df, nthreads=2)
+        assert len(w) == 1
+        assert expected_warning in str(w[0])
+
+        with tm.assert_produces_warning(FutureWarning) as w:
+            self.check_round_trip(df, nthreads=1)
+        assert len(w) == 1
+        assert expected_warning in str(w[0])
+
+    def test_rw_use_threads(self):
+        df = pd.DataFrame({'A': np.arange(100000)})
+        self.check_round_trip(df, use_threads=True)
+        self.check_round_trip(df, use_threads=False)

     def test_write_with_index(self):
