Commit e6d27ec

TomAugspurger authored and WillAyd committed
REF: Consistent optional dependency handling (#26802)
1 parent 429078b commit e6d27ec

21 files changed: 367 additions, 347 deletions

doc/source/development/contributing.rst

+15
@@ -499,6 +499,21 @@ as possible to avoid mass breakages.
499499
Additional standards are outlined on the `code style wiki
500500
page <https://github.com/pandas-dev/pandas/wiki/Code-Style-and-Conventions>`_.
501501

502+
Optional dependencies
503+
---------------------
504+
505+
Optional dependencies (e.g. matplotlib) should be imported with the private helper
506+
``pandas.compat._optional.import_optional_dependency``. This ensures a
507+
consistent error message when the dependency is not met.
508+
509+
All methods using an optional dependency should include a test asserting that an
510+
``ImportError`` is raised when the optional dependency is not found. This test
511+
should be skipped if the library is present.
512+
513+
All optional dependencies should be documented in
514+
:ref:`install.optional_dependencies` and the minimum required version should be
515+
set in the ``pandas.compat._optional.VERSIONS`` dict.
516+
502517
C (cpplint)
503518
~~~~~~~~~~~
504519
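
Editorial sketch for the "Optional dependencies" guideline added above: a minimal, standard-library-only illustration of the test it asks for, asserting that an ``ImportError`` is raised when the dependency cannot be imported. The helper below is a reduced stand-in for ``pandas.compat._optional.import_optional_dependency`` (missing-module case only), and ``plot_something`` and the module name ``hypothetical_plotting_lib_xyz`` are assumptions invented for illustration, not part of this commit.

```python
import importlib


def import_optional_dependency(name, extra=""):
    # Stand-in for the private pandas helper, reduced to the
    # missing-module case so the sketch runs without pandas.
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(
            "Missing optional dependency '{}'. {} "
            "Use pip or conda to install {}.".format(name, extra, name)
        ) from None


def plot_something(backend="hypothetical_plotting_lib_xyz"):
    # Hypothetical method that needs an optional dependency.
    return import_optional_dependency(backend)


def check_raises_when_missing():
    # Mirrors the guideline: the call must raise ImportError when the
    # library is absent. A real test would be skipped if it is installed.
    try:
        plot_something()
    except ImportError as err:
        return str(err)
    raise AssertionError("expected ImportError")


error_text = check_raises_when_missing()
```

In a real test suite the try/except would normally be replaced by the test framework's own "expect this exception" and "skip if installed" facilities; the plain version is used here only to keep the sketch dependency-free.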

doc/source/install.rst

+62 -80
@@ -252,87 +252,69 @@ Recommended Dependencies
252252
Optional Dependencies
253253
~~~~~~~~~~~~~~~~~~~~~
254254

255-
* `Cython <http://www.cython.org>`__: Only necessary to build development
256-
version. Version 0.28.2 or higher.
257-
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions, Version 0.19.0 or higher
258-
* `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims. Version 0.8.2 or higher is recommended.
259-
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage, Version 3.4.2 or higher
260-
* `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.9.0): necessary for feather-based storage.
261-
* `Apache Parquet <https://parquet.apache.org/>`__, either `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.9.0) or `fastparquet <https://fastparquet.readthedocs.io/en/latest>`__ (>= 0.2.1) for parquet-based storage. The `snappy <https://pypi.org/project/python-snappy>`__ and `brotli <https://pypi.org/project/brotlipy>`__ are available for compression support.
262-
* `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 1.1.4 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:
263-
264-
* `psycopg2 <http://initd.org/psycopg/>`__: for PostgreSQL
265-
* `pymysql <https://github.com/PyMySQL/PyMySQL>`__: for MySQL.
266-
* `SQLite <https://docs.python.org/3/library/sqlite3.html>`__: for SQLite, this is included in Python's standard library by default.
267-
268-
* `matplotlib <http://matplotlib.org/>`__: for plotting, Version 2.2.2 or higher.
269-
* For Excel I/O:
270-
271-
* `xlrd/xlwt <http://www.python-excel.org/>`__: Excel reading (xlrd), version 1.0.0 or higher required, and writing (xlwt)
272-
* `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`__: openpyxl version 2.4.0
273-
for writing .xlsx files (xlrd >= 1.0.0)
274-
* `XlsxWriter <https://pypi.org/project/XlsxWriter>`__: Alternative Excel writer
275-
276-
* `Jinja2 <http://jinja.pocoo.org/>`__: Template engine for conditional HTML formatting.
277-
* `s3fs <http://s3fs.readthedocs.io/>`__: necessary for Amazon S3 access (s3fs >= 0.0.8).
278-
* `blosc <https://pypi.org/project/blosc>`__: for msgpack compression using ``blosc``
279-
* `gcsfs <http://gcsfs.readthedocs.io/>`__: necessary for Google Cloud Storage access (gcsfs >= 0.1.0).
280-
* One of
281-
`qtpy <https://github.com/spyder-ide/qtpy>`__ (requires PyQt or PySide),
282-
`PyQt5 <https://www.riverbankcomputing.com/software/pyqt/download5>`__,
283-
`PyQt4 <http://www.riverbankcomputing.com/software/pyqt/download>`__,
284-
`xsel <http://www.vergenet.net/~conrad/software/xsel/>`__, or
285-
`xclip <https://github.com/astrand/xclip/>`__: necessary to use
286-
:func:`~pandas.read_clipboard`. Most package managers on Linux distributions will have ``xclip`` and/or ``xsel`` immediately available for installation.
287-
* `pandas-gbq
288-
<https://pandas-gbq.readthedocs.io/en/latest/install.html#dependencies>`__:
289-
for Google BigQuery I/O. (pandas-gbq >= 0.8.0)
290-
291-
* One of the following combinations of libraries is needed to use the
292-
top-level :func:`~pandas.read_html` function:
293-
294-
.. versionchanged:: 0.23.0
295-
296-
.. note::
297-
298-
If using BeautifulSoup4 a minimum version of 4.4.1 is required
299-
300-
* `BeautifulSoup4`_ and `html5lib`_ (Any recent version of `html5lib`_ is
301-
okay.)
302-
* `BeautifulSoup4`_ and `lxml`_
303-
* `BeautifulSoup4`_ and `html5lib`_ and `lxml`_
304-
* Only `lxml`_, although see :ref:`HTML Table Parsing <io.html.gotchas>`
305-
for reasons as to why you should probably **not** take this approach.
306-
307-
.. warning::
308-
309-
* if you install `BeautifulSoup4`_ you must install either
310-
`lxml`_ or `html5lib`_ or both.
311-
:func:`~pandas.read_html` will **not** work with *only*
312-
`BeautifulSoup4`_ installed.
313-
* You are highly encouraged to read :ref:`HTML Table Parsing gotchas <io.html.gotchas>`.
314-
It explains issues surrounding the installation and
315-
usage of the above three libraries.
316-
317-
.. note::
318-
319-
* if you're on a system with ``apt-get`` you can do
320-
321-
.. code-block:: sh
322-
323-
sudo apt-get build-dep python-lxml
324-
325-
to get the necessary dependencies for installation of `lxml`_. This
326-
will prevent further headaches down the line.
327-
255+
Pandas has many optional dependencies that are only used for specific methods.
256+
For example, :func:`pandas.read_hdf` requires the ``pytables`` package. If the
257+
optional dependency is not installed, pandas will raise an ``ImportError`` when
258+
the method requiring that dependency is called.
259+
260+
========================= ================== =============================================================
261+
Dependency                Minimum Version    Notes
262+
========================= ================== =============================================================
263+
BeautifulSoup4            4.4.1              HTML parser for read_html (see :ref:`note <optional_html>`)
264+
Jinja2                                       Conditional formatting with DataFrame.style
265+
PyQt4                                        Clipboard I/O
266+
PyQt5                                        Clipboard I/O
267+
PyTables                  3.4.2              HDF5-based reading / writing
268+
SQLAlchemy                1.1.4              SQL support for databases other than sqlite
269+
SciPy                     0.19.0             Miscellaneous statistical functions
270+
XLsxWriter                                   Excel writing
271+
blosc                                        Compression for msgpack
272+
fastparquet               0.2.1              Parquet reading / writing
273+
gcsfs                     0.1.0              Google Cloud Storage access
274+
html5lib                                     HTML parser for read_html (see :ref:`note <optional_html>`)
275+
lxml                                         HTML parser for read_html (see :ref:`note <optional_html>`)
276+
matplotlib                2.2.2              Visualization
277+
openpyxl                  2.4.0              Reading / writing for xlsx files
278+
pandas-gbq                0.8.0              Google Big Query access
279+
psycopg2                                     PostgreSQL engine for sqlalchemy
280+
pyarrow                   0.9.0              Parquet and feather reading / writing
281+
pymysql                                      MySQL engine for sqlalchemy
282+
qtpy                                         Clipboard I/O
283+
s3fs                      0.0.8              Amazon S3 access
284+
xarray                    0.8.2              pandas-like API for N-dimensional data
285+
xclip                                        Clipboard I/O on linux
286+
xlrd                      1.0.0              Excel reading
287+
xlwt                      2.4.0              Excel writing
288+
xsel                                         Clipboard I/O on linux
289+
zlib                                         Compression for msgpack
290+
========================= ================== =============================================================
291+
292+
.. _optional_html:
293+
294+
Optional Dependencies for Parsing HTML
295+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
296+
297+
One of the following combinations of libraries is needed to use the
298+
top-level :func:`~pandas.read_html` function:
299+
300+
.. versionchanged:: 0.23.0
301+
302+
* `BeautifulSoup4`_ and `html5lib`_
303+
* `BeautifulSoup4`_ and `lxml`_
304+
* `BeautifulSoup4`_ and `html5lib`_ and `lxml`_
305+
* Only `lxml`_, although see :ref:`HTML Table Parsing <io.html.gotchas>`
306+
for reasons as to why you should probably **not** take this approach.
307+
308+
.. warning::
309+
310+
* if you install `BeautifulSoup4`_ you must install either
311+
`lxml`_ or `html5lib`_ or both.
312+
:func:`~pandas.read_html` will **not** work with *only*
313+
`BeautifulSoup4`_ installed.
314+
* You are highly encouraged to read :ref:`HTML Table Parsing gotchas <io.html.gotchas>`.
315+
It explains issues surrounding the installation and
316+
usage of the above three libraries.
328317

329318
.. _html5lib: https://github.com/html5lib/html5lib-python
330319
.. _BeautifulSoup4: http://www.crummy.com/software/BeautifulSoup
331320
.. _lxml: http://lxml.de
332-
333-
.. note::
334-
335-
Without the optional dependencies, many useful features will not
336-
work. Hence, it is highly recommended that you install these. A packaged
337-
distribution like `Anaconda <http://docs.continuum.io/anaconda/>`__, `ActivePython <https://www.activestate.com/activepython/downloads>`__ (version 2.7 or 3.5), or `Enthought Canopy
338-
<http://enthought.com/products/canopy>`__ may be worth considering.

pandas/compat/_optional.py

+115
@@ -0,0 +1,115 @@
1+
import distutils.version
2+
import importlib
3+
import types
4+
from typing import Optional
5+
import warnings
6+
7+
# Update install.rst when updating versions!
8+
9+
VERSIONS = {
10+
    "bs4": "4.4.1",
11+
    "bottleneck": "1.2.1",
12+
    "fastparquet": "0.2.1",
13+
    "gcsfs": "0.1.0",
14+
    "matplotlib": "2.2.2",
15+
    "numexpr": "2.6.2",
16+
    "openpyxl": "2.4.0",
17+
    "pandas_gbq": "0.8.0",
18+
    "pyarrow": "0.9.0",
19+
    "pytables": "3.4.2",
20+
    "s3fs": "0.0.8",
21+
    "scipy": "0.19.0",
22+
    "sqlalchemy": "1.1.4",
23+
    "xarray": "0.8.2",
24+
    "xlrd": "1.0.0",
25+
    "xlwt": "2.4.0",
26+
}
27+
28+
message = (
29+
    "Missing optional dependency '{name}'. {extra} "
30+
    "Use pip or conda to install {name}."
31+
)
32+
version_message = (
33+
    "Pandas requires version '{minimum_version}' or newer of '{name}' "
34+
    "(version '{actual_version}' currently installed)."
35+
)
36+
37+
38+
def _get_version(module: types.ModuleType) -> str:
39+
    version = getattr(module, '__version__', None)
40+
    if version is None:
41+
        # xlrd uses a capitalized attribute name
42+
        version = getattr(module, '__VERSION__', None)
43+
44+
    if version is None:
45+
        raise ImportError(
46+
            "Can't determine version for {}".format(module.__name__)
47+
        )
48+
    return version
49+
50+
51+
def import_optional_dependency(
52+
    name: str,
53+
    extra: str = "",
54+
    raise_on_missing: bool = True,
55+
    on_version: str = "raise",
56+
) -> Optional[types.ModuleType]:
57+
    """
58+
    Import an optional dependency.
59+
60+
    By default, if a dependency is missing an ImportError with a nice
61+
    message will be raised. If a dependency is present, but too old,
62+
    we raise.
63+
64+
    Parameters
65+
    ----------
66+
    name : str
67+
        The module name. This should be top-level only, so that the
68+
        version may be checked.
69+
    extra : str
70+
        Additional text to include in the ImportError message.
71+
    raise_on_missing : bool, default True
72+
        Whether to raise if the optional dependency is not found.
73+
        When False and the module is not present, None is returned.
74+
    on_version : str {'raise', 'warn', 'ignore'}
75+
        What to do when a dependency's version is too old.
76+
77+
        * raise : Raise an ImportError
78+
        * warn : Warn that the version is too old. Returns None
79+
        * ignore: Return the module, even if the version is too old.
80+
          It's expected that users validate the version locally when
81+
          using ``on_version="ignore"`` (see ``io/html.py``)
82+
83+
    Returns
84+
    -------
85+
    maybe_module : Optional[ModuleType]
86+
        The imported module, when found and the version is correct.
87+
        None is returned when the package is not found and `raise_on_missing`
88+
        is False, or when the package's version is too old and `on_version`
89+
        is ``'warn'``.
90+
    """
91+
    try:
92+
        module = importlib.import_module(name)
93+
    except ImportError:
94+
        if raise_on_missing:
95+
            raise ImportError(message.format(name=name, extra=extra)) from None
96+
        else:
97+
            return None
98+
99+
    minimum_version = VERSIONS.get(name)
100+
    if minimum_version:
101+
        version = _get_version(module)
102+
        if distutils.version.LooseVersion(version) < minimum_version:
103+
            assert on_version in {"warn", "raise", "ignore"}
104+
            msg = version_message.format(
105+
                minimum_version=minimum_version,
106+
                name=name,
107+
                actual_version=version,
108+
            )
109+
            if on_version == "warn":
110+
                warnings.warn(msg, UserWarning)
111+
                return None
112+
            elif on_version == "raise":
113+
                raise ImportError(msg)
114+
115+
    return module
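
Editorial sketch of the behaviours documented in the docstring above (missing module, and a too-old module under ``on_version`` set to ``"raise"``, ``"warn"`` or ``"ignore"``). It is a simplified re-implementation, not the committed code: ``import_optional``, ``_as_tuple``, the ``fakedep`` module and its ``VERSIONS`` entry are hypothetical, and versions are compared as plain integer tuples instead of ``distutils.version.LooseVersion`` so that the sketch does not depend on ``distutils``.

```python
import importlib
import sys
import types
import warnings

VERSIONS = {"fakedep": "1.2.0"}  # hypothetical minimum version


def _as_tuple(version):
    # Simplified numeric comparison; not equivalent to LooseVersion.
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def import_optional(name, raise_on_missing=True, on_version="raise"):
    try:
        module = importlib.import_module(name)
    except ImportError:
        if raise_on_missing:
            raise ImportError(
                "Missing optional dependency '{}'.".format(name)
            ) from None
        return None

    minimum = VERSIONS.get(name)
    if minimum and _as_tuple(module.__version__) < _as_tuple(minimum):
        msg = "requires version '{}' or newer of '{}'".format(minimum, name)
        if on_version == "warn":
            warnings.warn(msg, UserWarning)
            return None
        if on_version == "raise":
            raise ImportError(msg)
        # on_version == "ignore": fall through and return the module.
    return module


# Register a fake, too-old module so no third-party package is needed.
fake = types.ModuleType("fakedep")
fake.__version__ = "1.0.0"
sys.modules["fakedep"] = fake

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = import_optional("fakedep", on_version="warn")

ignored = import_optional("fakedep", on_version="ignore")
missing = import_optional("no_such_module_xyz", raise_on_missing=False)

try:
    import_optional("fakedep")
    raised = False
except ImportError:
    raised = True
```

With the fake module at version 1.0.0 against a 1.2.0 minimum, ``"warn"`` yields ``None`` plus a ``UserWarning``, ``"ignore"`` returns the module unchanged, and the default raises ``ImportError``.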

pandas/core/arrays/sparse.py

+3 -4
@@ -15,6 +15,7 @@
1515
from pandas._libs.sparse import BlockIndex, IntIndex, SparseIndex
1616
from pandas._libs.tslibs import NaT
1717
import pandas.compat as compat
18+
from pandas.compat._optional import import_optional_dependency
1819
from pandas.compat.numpy import function as nv
1920
from pandas.errors import PerformanceWarning
2021

@@ -2205,10 +2206,8 @@ def to_coo(self):
22052206
        float32. By numpy.find_common_type convention, mixing int64 and
22062207
        and uint64 will result in a float64 dtype.
22072208
        """
2208-
        try:
2209-
            from scipy.sparse import coo_matrix
2210-
        except ImportError:
2211-
            raise ImportError('Scipy is not installed')
2209+
        import_optional_dependency("scipy")
2210+
        from scipy.sparse import coo_matrix
22122211

22132212
        dtype = find_common_type(self._parent.dtypes)
22142213
        if isinstance(dtype, SparseDtype):

pandas/core/computation/check.py

+9 -22
@@ -1,24 +1,11 @@
1-
from distutils.version import LooseVersion
2-
import warnings
3-
4-
_NUMEXPR_INSTALLED = False
5-
_MIN_NUMEXPR_VERSION = "2.6.2"
6-
_NUMEXPR_VERSION = None
7-
8-
try:
9-
    import numexpr as ne
10-
    ver = LooseVersion(ne.__version__)
11-
    _NUMEXPR_INSTALLED = ver >= LooseVersion(_MIN_NUMEXPR_VERSION)
12-
    _NUMEXPR_VERSION = ver
13-
14-
    if not _NUMEXPR_INSTALLED:
15-
        warnings.warn(
16-
            "The installed version of numexpr {ver} is not supported "
17-
            "in pandas and will be not be used\nThe minimum supported "
18-
            "version is {min_ver}\n".format(
19-
                ver=ver, min_ver=_MIN_NUMEXPR_VERSION), UserWarning)
20-
21-
except ImportError:  # pragma: no cover
22-
    pass
1+
from pandas.compat._optional import import_optional_dependency
2+
3+
ne = import_optional_dependency("numexpr", raise_on_missing=False,
4+
                                on_version="warn")
5+
_NUMEXPR_INSTALLED = ne is not None
6+
if _NUMEXPR_INSTALLED:
7+
    _NUMEXPR_VERSION = ne.__version__
8+
else:
9+
    _NUMEXPR_VERSION = None
2310

2411
__all__ = ['_NUMEXPR_INSTALLED', '_NUMEXPR_VERSION']

pandas/core/generic.py

+2 -9
@@ -16,6 +16,7 @@
1616

1717
from pandas._libs import Timestamp, iNaT, properties
1818
from pandas.compat import set_function_name
19+
from pandas.compat._optional import import_optional_dependency
1920
from pandas.compat.numpy import function as nv
2021
from pandas.errors import AbstractMethodError
2122
from pandas.util._decorators import (
@@ -2750,15 +2751,7 @@ class (index) object 'bird' 'bird' 'mammal' 'mammal'
27502751
Data variables:
27512752
speed (date, animal) int64 350 18 361 15
27522753
        """
2753-
        try:
2754-
            import xarray
2755-
        except ImportError:
2756-
            # Give a nice error message
2757-
            raise ImportError("the xarray library is not installed\n"
2758-
                              "you can install via conda\n"
2759-
                              "conda install xarray\n"
2760-
                              "or via pip\n"
2761-
                              "pip install xarray\n")
2754+
        xarray = import_optional_dependency("xarray")
27622755

27632756
        if self.ndim == 1:
27642757
            return xarray.DataArray.from_series(self)
