From c1732612715e389dba2544396a0717c8e8c940d9 Mon Sep 17 00:00:00 2001 From: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> Date: Tue, 22 Nov 2022 18:32:55 -0800 Subject: [PATCH 1/2] DOC/REF: Clarify pip extras dependencies & cleanups --- doc/source/getting_started/install.rst | 122 +++++++++++++------------ doc/source/whatsnew/v2.0.0.rst | 14 +-- 2 files changed, 71 insertions(+), 65 deletions(-) diff --git a/doc/source/getting_started/install.rst b/doc/source/getting_started/install.rst index d4d7ee5efcbb0..2ce88bdcd750f 100644 --- a/doc/source/getting_started/install.rst +++ b/doc/source/getting_started/install.rst @@ -139,6 +139,16 @@ pandas can be installed via pip from pip install pandas +pandas can also be installed with sets of optional dependencies to enable certain functionality. For example, +to install pandas with the optional dependencies to read Excel files. + +:: + + pip install pandas[excel] + + +The full list of extras that can be installed can be found in the :ref:`dependency section.` + Installing with ActivePython ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -232,6 +242,13 @@ This is just an example of what information is shown. You might see a slightly d Dependencies ------------ +.. _install.required_dependencies: + +Required dependencies +~~~~~~~~~~~~~~~~~~~~~ + +pandas requires the following dependencies. + ================================================================ ========================== Package Minimum supported version ================================================================ ========================== @@ -240,56 +257,48 @@ Package Minimum support `pytz `__ 2020.1 ================================================================ ========================== -.. _install.recommended_dependencies: +.. 
_install.optional_dependencies: -Performance dependencies (recommended) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Optional dependencies +~~~~~~~~~~~~~~~~~~~~~ -pandas recommends the following optional dependencies for performance gains. These dependencies can be specifically -installed with ``pandas[performance]`` (i.e. add as optional_extra to the pandas requirement) +pandas has many optional dependencies that are only used for specific methods. +For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while +:meth:`DataFrame.to_markdown` requires the ``tabulate`` package. If the +optional dependency is not installed, pandas will raise an ``ImportError`` when +the method requiring that dependency is called. -* `numexpr `__: for accelerating certain numerical operations. - ``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups. - If installed, must be Version 2.7.3 or higher. +If using pip, optional pandas dependencies can be installed or managed in a file (e.g., ``requirements.txt`` or ``pyproject.toml``) +as optional extras (e.g., ``pandas[performance, aws]>=1.5.0``). All optional dependencies can be installed with ``pandas[all]``, +and specific sets of dependencies are listed in the sections below. -* `bottleneck `__: for accelerating certain types of ``nan`` - evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed, - must be Version 1.3.2 or higher. +.. _install.recommended_dependencies: -* `numba `__: alternative execution engine for operations that accept `engine="numba" - argument (eg. apply). ``numba`` is a JIT compiler that translates Python functions to optimized machine code using - the LLVM compiler library. If installed, must be Version 0.53.1 or higher. +Performance dependencies (recommended) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
note:: You are highly encouraged to install these libraries, as they provide speed improvements, especially when working with large data sets. +Installable with ``pip install pandas[performance]`` -.. _install.optional_dependencies: - -Optional dependencies -~~~~~~~~~~~~~~~~~~~~~ - -pandas has many optional dependencies that are only used for specific methods. -For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while -:meth:`DataFrame.to_markdown` requires the ``tabulate`` package. If the -optional dependency is not installed, pandas will raise an ``ImportError`` when -the method requiring that dependency is called. - -Optional pandas dependencies can be managed as optional extras (e.g.,``pandas[performance, aws]>=1.5.0``) -in a requirements.txt, setup, or pyproject.toml file. -Available optional dependencies are ``[all, performance, computation, aws, -gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, -plot, output_formatting, compression, test]`` +===================================================== ================== ================== =================================================================================================================================================================================== +Dependency Minimum Version pip extra Notes +===================================================== ================== ================== =================================================================================================================================================================================== +`numexpr `__ 2.7.3 performance Accelerates certain numerical operations by using multiple cores as well as smart chunking and caching to achieve large speedups. +`bottleneck `__ 1.3.2 performance Accelerates certain types of ``nan`` evaluations by using specialized Cython routines to achieve large speedups.
+`numba `__ 0.53.1 performance Alternative execution engine for operations that accept ``engine="numba"`` using a JIT compiler that translates Python functions to optimized machine code using the LLVM compiler. +===================================================== ================== ================== =================================================================================================================================================================================== Timezones ^^^^^^^^^ -Can be managed as optional_extra with ``pandas[timezone]``. +Installable with ``pip install pandas[timezone]`` ========================= ========================= =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ========================= =============== ============================================================= tzdata 2022.1(pypi)/ timezone Allows the use of ``zoneinfo`` timezones with pandas. 2022a(for system tzdata) **Note**: You only need to install the pypi package if your @@ -305,10 +314,10 @@ tzdata 2022.1(pypi)/ timezone Allows the u Visualization ^^^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[plot, output_formatting]``, depending on the required functionality. +Installable with ``pip install pandas[plot, output_formatting]``. 
========================= ================== ================== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== ================== ============================================================= matplotlib 3.6.1 plot Plotting library Jinja2 3.0.0 output_formatting Conditional formatting with DataFrame.style @@ -318,10 +327,10 @@ tabulate 0.8.9 output_formatting Printing in Mark Computation ^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[computation]``. +Installable with ``pip install pandas[computation]``. ========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= SciPy 1.7.1 computation Miscellaneous statistical functions xarray 0.19.0 computation pandas-like API for N-dimensional data @@ -330,10 +339,10 @@ xarray 0.19.0 computation pandas-like API for Excel files ^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[excel]``. +Installable with ``pip install pandas[excel]``. ========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= xlrd 2.0.1 excel Reading Excel xlsxwriter 1.4.3 excel Writing Excel @@ -344,10 +353,10 @@ pyxlsb 1.0.8 excel Reading for xlsb fi HTML ^^^^ -These dependencies can be specifically installed with ``pandas[html]``. +Installable with ``pip install pandas[html]``. 
========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= BeautifulSoup4 4.9.3 html HTML parser for read_html html5lib 1.1 html HTML parser for read_html @@ -381,10 +390,10 @@ top-level :func:`~pandas.read_html` function: XML ^^^ -Can be managed as optional_extra with ``pandas[xml]``. +Installable with ``pip install pandas[xml]``. ========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= lxml 4.6.3 xml XML parser for read_xml and tree builder for to_xml ========================= ================== =============== ============================================================= @@ -392,11 +401,10 @@ lxml 4.6.3 xml XML parser for read SQL databases ^^^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[postgresql, mysql, sql-other]``, -depending on required sql compatibility. +Installable with ``pip install pandas[postgresql, mysql, sql-other]``. 
========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= SQLAlchemy 1.4.16 postgresql, SQL support for databases other than sqlite mysql, @@ -408,11 +416,10 @@ pymysql 1.0.2 mysql MySQL engine for sq Other data sources ^^^^^^^^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[hdf5, parquet, feather, spss, excel]``, -depending on required compatibility. +Installable with ``pip install pandas[hdf5, parquet, feather, spss, excel]`` ========================= ================== ================ ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== ================ ============================================================= PyTables 3.6.1 hdf5 HDF5-based reading / writing blosc 1.21.0 hdf5 Compression for HDF5; only available on ``conda`` @@ -441,10 +448,10 @@ odfpy 1.4.1 excel Open document form Access data in the cloud ^^^^^^^^^^^^^^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[fss, aws, gcp]``, depending on required compatibility. +Installable with ``pip install pandas[fss, aws, gcp]`` ========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= fsspec 2021.7.0 fss, gcp, aws Handling files aside from simple local and HTTP (required dependency of s3fs, gcsfs). 
@@ -456,29 +463,28 @@ s3fs 2021.08.0 aws Amazon S3 access Clipboard ^^^^^^^^^ -Can be managed as optional_extra with ``pandas[clipboard]``. However, depending on operating system, system-level -packages may need to installed. +Installable with ``pip install pandas[clipboard]``. ========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= -PyQt4/PyQt5 5.15.1 Clipboard I/O -qtpy 2.2.0 Clipboard I/O +PyQt4/PyQt5 5.15.1 clipboard Clipboard I/O +qtpy 2.2.0 clipboard Clipboard I/O ========================= ================== =============== ============================================================= .. note:: + Depending on the operating system, system-level packages may need to be installed. For clipboard to operate on Linux one of the CLI tools ``xclip`` or ``xsel`` must be installed on your system. Compression ^^^^^^^^^^^ -Can be managed as optional_extra with ``pandas[compression]``. -If only one specific compression lib is required, please request it as an independent requirement.
+Installable with ``pip install pandas[compression]`` ========================= ================== =============== ============================================================= -Dependency Minimum Version optional_extra Notes +Dependency Minimum Version pip extra Notes ========================= ================== =============== ============================================================= brotli 0.7.0 compression Brotli compression python-snappy 0.6.0 compression Snappy compression diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index b08dec8e8b1ef..c70014e860223 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -14,17 +14,17 @@ including other versions of pandas. Enhancements ~~~~~~~~~~~~ -.. _whatsnew_200.enhancements.optional_dependency_management: +.. _whatsnew_200.enhancements.optional_dependency_management_pip: -Optional dependencies version management -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Optional pandas dependencies can be managed as extras in a requirements/setup file, for example: +Installing optional dependencies with pip extras +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras. -.. code-block:: python +.. code-block:: bash - pandas[performance, aws]>=2.0.0 + pip install pandas[performance, aws]>=2.0.0 -Available optional dependencies (listed in order of appearance at `install guide `_) are +The available extras, found in the :ref:`installation guide`, are ``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`). 
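The sketch below shows how the pip extras described in this patch map onto importable modules. It checks which packages from a few extras are present in the current environment, using only the standard library's `importlib.util.find_spec`. The `EXTRAS` mapping and the `missing_for_extra` helper are hypothetical. They were transcribed by hand from some of the tables in this patch rather than read from pandas' packaging metadata, so they may drift from what `pip install "pandas[...]"` actually pulls in.

```python
from __future__ import annotations

import importlib.util

# Hypothetical mapping, transcribed by hand from a few of the tables in this
# patch; it is NOT read from pandas' packaging metadata. Keys are pip extras,
# values are the *import* names of the packages listed for that extra
# (e.g. BeautifulSoup4 is imported as ``bs4``).
EXTRAS: dict[str, list[str]] = {
    "performance": ["numexpr", "bottleneck", "numba"],
    "html": ["bs4", "html5lib"],
    "xml": ["lxml"],
}


def missing_for_extra(extra: str) -> list[str]:
    """Return the import names listed for ``extra`` that cannot be located.

    ``importlib.util.find_spec`` only searches for a top-level module;
    it does not import it.
    """
    return [name for name in EXTRAS[extra] if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for extra in EXTRAS:
        missing = missing_for_extra(extra)
        if missing:
            print(f'{extra}: missing {missing}; try: pip install "pandas[{extra}]"')
        else:
            print(f"{extra}: all listed packages found")
```

Because `find_spec` only locates modules, this check is cheap. pandas itself still raises ``ImportError`` lazily, only when the method that needs the dependency is called.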
From 3510b4df312df5045c933426c0152121c947ddf5 Mon Sep 17 00:00:00 2001 From: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> Date: Wed, 23 Nov 2022 11:46:40 -0800 Subject: [PATCH 2/2] quote the install --- doc/source/getting_started/install.rst | 26 +++++++++++++------------- doc/source/whatsnew/v2.0.0.rst | 2 +- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/source/getting_started/install.rst b/doc/source/getting_started/install.rst index 2ce88bdcd750f..8e8f61c1d503f 100644 --- a/doc/source/getting_started/install.rst +++ b/doc/source/getting_started/install.rst @@ -144,7 +144,7 @@ to install pandas with the optional dependencies to read Excel files. :: - pip install pandas[excel] + pip install "pandas[excel]" The full list of extras that can be installed can be found in the :ref:`dependency section.` @@ -282,7 +282,7 @@ Performance dependencies (recommended) You are highly encouraged to install these libraries, as they provide speed improvements, especially when working with large data sets. -Installable with ``pip install pandas[performance]`` +Installable with ``pip install "pandas[performance]"`` ===================================================== ================== ================== =================================================================================================================================================================================== Dependency Minimum Version pip extra Notes @@ -295,7 +295,7 @@ Dependency Minimum Version pip ext Timezones ^^^^^^^^^ -Installable with ``pip install pandas[timezone]`` +Installable with ``pip install "pandas[timezone]"`` ========================= ========================= =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -314,7 +314,7 @@ tzdata 2022.1(pypi)/ timezone Allows the u Visualization ^^^^^^^^^^^^^ -Installable with ``pip install pandas[plot, output_formatting]``. 
+Installable with ``pip install "pandas[plot, output_formatting]"``. ========================= ================== ================== ============================================================= Dependency Minimum Version pip extra Notes @@ -327,7 +327,7 @@ tabulate 0.8.9 output_formatting Printing in Mark Computation ^^^^^^^^^^^ -Installable with ``pip install pandas[computation]``. +Installable with ``pip install "pandas[computation]"``. ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -339,7 +339,7 @@ xarray 0.19.0 computation pandas-like API for Excel files ^^^^^^^^^^^ -Installable with ``pip install pandas[excel]``. +Installable with ``pip install "pandas[excel]"``. ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -353,7 +353,7 @@ pyxlsb 1.0.8 excel Reading for xlsb fi HTML ^^^^ -Installable with ``pip install pandas[html]``. +Installable with ``pip install "pandas[html]"``. ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -390,7 +390,7 @@ top-level :func:`~pandas.read_html` function: XML ^^^ -Installable with ``pip install pandas[xml]``. +Installable with ``pip install "pandas[xml]"``. ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -401,7 +401,7 @@ lxml 4.6.3 xml XML parser for read SQL databases ^^^^^^^^^^^^^ -Installable with ``pip install pandas[postgresql, mysql, sql-other]``. +Installable with ``pip install "pandas[postgresql, mysql, sql-other]"``. 
========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -416,7 +416,7 @@ pymysql 1.0.2 mysql MySQL engine for sq Other data sources ^^^^^^^^^^^^^^^^^^ -Installable with ``pip install pandas[hdf5, parquet, feather, spss, excel]`` +Installable with ``pip install "pandas[hdf5, parquet, feather, spss, excel]"`` ========================= ================== ================ ============================================================= Dependency Minimum Version pip extra Notes @@ -448,7 +448,7 @@ odfpy 1.4.1 excel Open document form Access data in the cloud ^^^^^^^^^^^^^^^^^^^^^^^^ -Installable with ``pip install pandas[fss, aws, gcp]`` +Installable with ``pip install "pandas[fss, aws, gcp]"`` ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -463,7 +463,7 @@ s3fs 2021.08.0 aws Amazon S3 access Clipboard ^^^^^^^^^ -Installable with ``pip install pandas[clipboard]``. +Installable with ``pip install "pandas[clipboard]"``. ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes @@ -481,7 +481,7 @@ qtpy 2.2.0 clipboard Clipboard I/O Compression ^^^^^^^^^^^ -Installable with ``pip install pandas[compression]`` +Installable with ``pip install "pandas[compression]"`` ========================= ================== =============== ============================================================= Dependency Minimum Version pip extra Notes diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index c2424bd67f280..1dbd668836780 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -22,7 +22,7 @@ When installing pandas using pip, sets of optional dependencies can also be inst .. 
code-block:: bash - pip install pandas[performance, aws]>=2.0.0 + pip install "pandas[performance, aws]>=2.0.0" The available extras, found in the :ref:`installation guide`, are ``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,