DOC restructure contributing environment guide #50145

Merged
merged 7 commits into from
Dec 16, 2022
Changes from 4 commits
137 changes: 50 additions & 87 deletions doc/source/development/contributing_environment.rst
@@ -15,24 +15,11 @@ locally before pushing your changes. It's recommended to also install the :ref:`
.. contents:: Table of contents:
:local:

Step 1: install a C compiler
----------------------------

Option 1: creating an environment without Docker
------------------------------------------------

Installing a C compiler
~~~~~~~~~~~~~~~~~~~~~~~

pandas uses C extensions (mostly written using Cython) to speed up certain
operations. To install pandas from source, you need to compile these C
extensions, which means you need a C compiler. This process depends on which
platform you're using.

If you have set up your environment using :ref:`mamba <contributing.mamba>`, the packages ``c-compiler``
and ``cxx-compiler`` will install a fitting compiler for your platform that is
compatible with the remaining mamba packages. On Windows and macOS, you will
also need to install the SDKs as they have to be distributed separately.
These packages will automatically be installed by using the ``pandas``
``environment.yml`` file.
How to do this will depend on your platform. If you choose to use ``Docker``
in the next step, then you can skip this step.

**Windows**

@@ -48,6 +35,9 @@ You will need `Build Tools for Visual Studio 2022
Alternatively, you can install the necessary components on the command line using
`vs_BuildTools.exe <https://learn.microsoft.com/en-us/visualstudio/install/use-command-line-parameters-to-install-visual-studio?source=recommendations&view=vs-2022>`_

Alternatively, you could use the `WSL <https://learn.microsoft.com/en-us/windows/wsl/install>`_
and consult the ``Linux`` instructions below.

**macOS**

To use the :ref:`mamba <contributing.mamba>`-based compilers, you will need to install the
@@ -71,67 +61,40 @@ which compilers (and versions) are installed on your system::

`GCC (GNU Compiler Collection) <https://gcc.gnu.org/>`_ is a widely used
compiler that supports C and a number of other languages. If GCC is listed
as an installed compiler nothing more is required. If no C compiler is
installed (or you wish to install a newer version) you can install a compiler
(GCC in the example code below) with::
as an installed compiler nothing more is required.

# for recent Debian/Ubuntu:
sudo apt install build-essential
# for Red Hat/RHEL/CentOS/Fedora
yum groupinstall "Development Tools"

For other Linux distributions, consult your favorite search engine for
compiler installation instructions.
If no C compiler is installed, or you wish to upgrade, or you're using a different
Linux distribution, consult your favorite search engine for compiler installation/update
instructions.
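
As a cross-platform complement to the check described above, the following is a minimal sketch that looks for common C compiler executables on ``PATH``. The function name ``find_c_compilers`` and the candidate list (``gcc``, ``clang``, ``cc``) are assumptions made for illustration; they are not part of the pandas tooling.

```python
import shutil


def find_c_compilers(candidates=("gcc", "clang", "cc")):
    """Return a dict mapping each candidate found on PATH to its full path.

    The candidate names are illustrative; adjust them for your platform.
    """
    found = {}
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    compilers = find_c_compilers()
    if compilers:
        for name, path in compilers.items():
            print(f"{name}: {path}")
    else:
        print("No C compiler found on PATH")
```

An empty result only means that none of the listed names is on ``PATH``; a compiler installed under another name, or one provided by the mamba environment, would not be detected until that environment is activated.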

Let us know if you have any difficulties by opening an issue or reaching out on our contributor
community :ref:`Slack <community.slack>`.

.. _contributing.mamba:

Option 1a: using mamba (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 2: create an isolated environment
----------------------------------------

Now create an isolated pandas development environment:
Before we begin, please:

* Install `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_
* Make sure your mamba is up to date (``mamba update mamba``)
* Make sure that you have :any:`cloned the repository <contributing.forking>`
* ``cd`` to the pandas source directory

We'll now kick off a three-step process:
.. _contributing.mamba:

Option 1: using mamba (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Install the build dependencies
2. Build and install pandas
3. Install the optional dependencies
* Install `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_
* Make sure your mamba is up to date (``mamba update mamba``)

.. code-block:: none

# Create and activate the build environment
mamba env create --file environment.yml
mamba activate pandas-dev

# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

At this point you should be able to import pandas from your locally built version::

$ python
>>> import pandas
>>> print(pandas.__version__) # note: the exact output may differ
1.5.0.dev0+1355.ge65a30e3eb.dirty

This will create the new environment, and not touch any of your existing environments,
nor any existing Python installation.

To return to your root environment::

mamba deactivate

Option 1b: using pip
~~~~~~~~~~~~~~~~~~~~
Option 2: using pip
~~~~~~~~~~~~~~~~~~~

If you aren't using mamba for your development environment, follow these instructions.
You'll need to have at least the :ref:`minimum Python version <install.version>` that pandas supports.
You also need to have ``setuptools`` 51.0.0 or later to build pandas.

@@ -150,10 +113,6 @@ You also need to have ``setuptools`` 51.0.0 or later to build pandas.
# Install the build dependencies
python -m pip install -r requirements-dev.txt

# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

**Unix**/**macOS with pyenv**

Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
@@ -162,7 +121,6 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.

# Create a virtual environment
# Use an ENV_DIR of your choice. We'll use ~/Users/<yourname>/.pyenv/versions/pandas-dev

pyenv virtualenv <version> <name-to-give-it>

# For instance:
@@ -174,19 +132,15 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
# Now install the build dependencies in the cloned pandas repo
python -m pip install -r requirements-dev.txt

# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

**Windows**

Below is a brief overview of how to set up a virtual environment with PowerShell
under Windows. For details, please refer to the
`official virtualenv user guide <https://virtualenv.pypa.io/en/latest/user_guide.html#activators>`__.

Use an ENV_DIR of your choice. We'll use ~\\virtualenvs\\pandas-dev where
'~' is the folder pointed to by either $env:USERPROFILE (Powershell) or
%USERPROFILE% (cmd.exe) environment variable. Any parent directories
Use an ENV_DIR of your choice. We'll use ``~\virtualenvs\pandas-dev`` where
``~`` is the folder pointed to by either the ``$env:USERPROFILE`` (PowerShell) or
``%USERPROFILE%`` (cmd.exe) environment variable. Any parent directories
should already exist.

.. code-block:: powershell
@@ -200,16 +154,10 @@ should already exist.
# Install the build dependencies
python -m pip install -r requirements-dev.txt

# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

Option 2: creating an environment using Docker
----------------------------------------------
Option 3: using Docker
~~~~~~~~~~~~~~~~~~~~~~

Instead of manually setting up a development environment, you can use `Docker
<https://docs.docker.com/get-docker/>`_ to automatically create the environment with just several
commands. pandas provides a ``DockerFile`` in the root directory to build a Docker image
pandas provides a ``DockerFile`` in the root directory to build a Docker image
with a full pandas development environment.

**Docker Commands**
@@ -226,13 +174,6 @@ Run Container::
# but if not alter ${PWD} to match your local repo path
docker run -it --rm -v ${PWD}:/home/pandas pandas-dev

When inside the running container you can build and install pandas the same way as the other methods

.. code-block:: bash

python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

*Even easier, you can integrate Docker with the following IDEs:*

**Visual Studio Code**
@@ -246,3 +187,25 @@ See https://code.visualstudio.com/docs/remote/containers for details.
Enable Docker support and use the Services tool window to build and manage images as well as
run and interact with containers.
See https://www.jetbrains.com/help/pycharm/docker.html for details.

Step 3: build and install pandas
--------------------------------

You can now run::

# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

At this point you should be able to import pandas from your locally built version::

$ python
>>> import pandas
>>> print(pandas.__version__) # note: the exact output may differ
2.0.0.dev0+880.g2b9e661fbb.dirty
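
Beyond the version string, one way to confirm that the import resolves to the local checkout rather than to a released copy elsewhere in the environment is to compare ``pandas.__file__`` with the repository path. The sketch below assumes it is run from the pandas source directory; the helper name ``is_imported_from`` is hypothetical.

```python
from pathlib import Path


def is_imported_from(module_file, repo_root):
    """Return True if module_file is located inside repo_root."""
    module_path = Path(module_file).resolve()
    root = Path(repo_root).resolve()
    return root == module_path or root in module_path.parents


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not importable in this environment")
    else:
        # Assumes the current directory is the cloned pandas repository.
        local = is_imported_from(pandas.__file__, Path.cwd())
        print(f"{pandas.__file__} (local checkout: {local})")
```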

This will create the new environment, and not touch any of your existing environments,
nor any existing Python installation.

Note that you will need to repeat this step each time the C extensions change, for example
if you modified them or if you did a fetch and merge from ``upstream/main``.
Member

Suggested change
if you modified them or if you did a fetch and merge from ``upstream/main``.
if you modified any file in ``pandas/_libs`` or if you did a fetch and merge from ``upstream/main``.

Member Author

there's also some in pandas/io though

Member

Oh I didn't know there's some sas cython code there (I think that should be moved to pandas/_libs at some point).

Mainly wanted to clarify here what files constitute C extensions (pyx files)

Member Author

OK I've gone with your suggestion (it does say "for example", so it's fine if it's not exhaustive, especially if the sas files should be moved to pandas/_libs too)
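
The question raised in this thread, namely which directories hold Cython sources, can be checked directly against a checkout. The following is a minimal sketch assuming it is run from the repository root; the function name ``cython_source_dirs`` is hypothetical, and it only looks for ``.pyx`` files, so other compiled sources (for example plain C files) are not covered.

```python
from pathlib import Path


def cython_source_dirs(root):
    """Return the sorted relative directories under root that contain .pyx files."""
    root = Path(root)
    return sorted(
        {path.parent.relative_to(root).as_posix() for path in root.rglob("*.pyx")}
    )


if __name__ == "__main__":
    # Assumes the current directory is the cloned repository.
    for directory in cython_source_dirs(Path.cwd()):
        print(directory)
```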