From 5807e5b043b38512ace7131b906a87095d895abb Mon Sep 17 00:00:00 2001 From: Matthew Roeschke Date: Sat, 29 Jan 2022 16:09:36 -0800 Subject: [PATCH 1/3] DOC: Improve pull request template and contributing guide --- .github/PULL_REQUEST_TEMPLATE.md | 8 +- .../development/contributing_codebase.rst | 206 ++---------------- 2 files changed, 25 insertions(+), 189 deletions(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 42017db8a05b1..a75a613ab2bf1 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,4 +1,4 @@ -- [ ] closes #xxxx -- [ ] tests added / passed -- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit) for how to run them -- [ ] whatsnew entry +- [ ] closes #xxxx (Replace xxxx with the Github issue number) +- [ ] [Tests added and passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature +- [ ] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit). +- [ ] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature. diff --git a/doc/source/development/contributing_codebase.rst b/doc/source/development/contributing_codebase.rst index 4826921d4866b..66e50ab195af0 100644 --- a/doc/source/development/contributing_codebase.rst +++ b/doc/source/development/contributing_codebase.rst @@ -44,9 +44,10 @@ Additional standards are outlined on the :ref:`pandas code style guide `_ instead -to automatically run ``black``, ``flake8``, ``isort`` when you make a git commit. This +Additionally, :ref:`Continuous Integration ` will run code formatting checks +like ``black``, ``flake8``, ``isort``, and ``cpplint`` and more using `pre-commit hooks `_ +Any warnings from these checks will cause the :ref:`Continuous Integration ` to fail; therefore, +it is helpful to run the check yourself before submitting code.This can be done by installing ``pre-commit``:: pip install pre-commit @@ -99,157 +100,8 @@ All optional dependencies should be documented in :ref:`install.optional_dependencies` and the minimum required version should be set in the ``pandas.compat._optional.VERSIONS`` dict. -C (cpplint) -~~~~~~~~~~~ - -pandas uses the `Google `_ -standard. Google provides an open source style checker called ``cpplint``, but we -use a fork of it that can be found `here `__. -Here are *some* of the more common ``cpplint`` issues: - -* we restrict line-length to 80 characters to promote readability -* every header file must include a header guard to avoid name collisions if re-included - -:ref:`Continuous Integration ` will run the -`cpplint `_ tool -and report any stylistic errors in your code. Therefore, it is helpful before -submitting code to run the check yourself:: - - cpplint --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir modified-c-file - -You can also run this command on an entire directory if necessary:: - - cpplint --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive modified-c-directory - -To make your commits compliant with this standard, you can install the -`ClangFormat `_ tool, which can be -downloaded `here `__. To configure, in your home directory, -run the following command:: - - clang-format style=google -dump-config > .clang-format - -Then modify the file to ensure that any indentation width parameters are at least four. -Once configured, you can run the tool as follows:: - - clang-format modified-c-file - -This will output what your file will look like if the changes are made, and to apply -them, run the following command:: - - clang-format -i modified-c-file - -To run the tool on an entire directory, you can run the following analogous commands:: - - clang-format modified-c-directory/*.c modified-c-directory/*.h - clang-format -i modified-c-directory/*.c modified-c-directory/*.h - -Do note that this tool is best-effort, meaning that it will try to correct as -many errors as possible, but it may not correct *all* of them. Thus, it is -recommended that you run ``cpplint`` to double check and make any other style -fixes manually. - -.. _contributing.code-formatting: - -Python (PEP8 / black) -~~~~~~~~~~~~~~~~~~~~~ - -pandas follows the `PEP8 `_ standard -and uses `Black `_ and -`Flake8 `_ to ensure a consistent code -format throughout the project. We encourage you to use :ref:`pre-commit `. - -:ref:`Continuous Integration ` will run those tools and -report any stylistic errors in your code. Therefore, it is helpful before -submitting code to run the check yourself:: - - black pandas - git diff upstream/main -u -- "*.py" | flake8 --diff - -to auto-format your code. Additionally, many editors have plugins that will -apply ``black`` as you edit files. - -You should use a ``black`` version 21.5b2 as previous versions are not compatible -with the pandas codebase. - -One caveat about ``git diff upstream/main -u -- "*.py" | flake8 --diff``: this -command will catch any stylistic errors in your changes specifically, but -be beware it may not catch all of them. For example, if you delete the only -usage of an imported function, it is stylistically incorrect to import an -unused function. However, style-checking the diff will not catch this because -the actual import is not part of the diff. Thus, for completeness, you should -run this command, though it may take longer:: - - git diff upstream/main --name-only -- "*.py" | xargs -r flake8 - -Note that on macOS, the ``-r`` flag is not available, so you have to omit it and -run this slightly modified command:: - - git diff upstream/main --name-only -- "*.py" | xargs flake8 - -Windows does not support the ``xargs`` command (unless installed for example -via the `MinGW `__ toolchain), but one can imitate the -behaviour as follows:: - - for /f %i in ('git diff upstream/main --name-only -- "*.py"') do flake8 %i - -This will get all the files being changed by the PR (and ending with ``.py``), -and run ``flake8`` on them, one after the other. - -Note that these commands can be run analogously with ``black``. - -.. _contributing.import-formatting: - -Import formatting -~~~~~~~~~~~~~~~~~ -pandas uses `isort `__ to standardise import -formatting across the codebase. - -A guide to import layout as per pep8 can be found `here `__. - -A summary of our current import sections ( in order ): - -* Future -* Python Standard Library -* Third Party -* ``pandas._libs``, ``pandas.compat``, ``pandas.util._*``, ``pandas.errors`` (largely not dependent on ``pandas.core``) -* ``pandas.core.dtypes`` (largely not dependent on the rest of ``pandas.core``) -* Rest of ``pandas.core.*`` -* Non-core ``pandas.io``, ``pandas.plotting``, ``pandas.tseries`` -* Local application/library specific imports - -Imports are alphabetically sorted within these sections. - -As part of :ref:`Continuous Integration ` checks we run:: - - isort --check-only pandas - -to check that imports are correctly formatted as per the ``setup.cfg``. - -If you see output like the below in :ref:`Continuous Integration ` checks: - -.. code-block:: shell - - Check import format using isort - ERROR: /home/travis/build/pandas-dev/pandas/pandas/io/pytables.py Imports are incorrectly sorted - Check import format using isort DONE - The command "ci/code_checks.sh" exited with 1 - -You should run:: - - isort pandas/io/pytables.py - -to automatically format imports correctly. This will modify your local copy of the files. - -Alternatively, you can run a command similar to what was suggested for ``black`` and ``flake8`` :ref:`right above `:: - - git diff upstream/main --name-only -- "*.py" | xargs -r isort - -Where similar caveats apply if you are on macOS or Windows. - -You can then verify the changes look ok, then git :any:`commit ` and :any:`push `. - Backwards compatibility -~~~~~~~~~~~~~~~~~~~~~~~ +----------------------- Please try to maintain backward compatibility. pandas has lots of users with lots of existing code, so don't break it if at all possible. If you think breakage is required, @@ -409,12 +261,7 @@ pandas uses `mypy `_ and `pyright =1.21.0) is required for type validation. @@ -475,14 +322,7 @@ use cases and writing corresponding tests. Adding tests is one of the most common requests after code is pushed to pandas. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue. -Like many packages, pandas uses `pytest -`_ and the convenient -extensions in `numpy.testing -`_. - -.. note:: - - The earliest supported pytest version is 5.0.1. +Like many packages, pandas uses `pytest`_. Writing tests ~~~~~~~~~~~~~ @@ -499,6 +339,7 @@ explicitly construct the result you expect, then compare the actual result to the expected correct result:: def test_pivot(self): + # GH data = { 'index' : ['A', 'B', 'C', 'C', 'B', 'A'], 'columns' : ['One', 'One', 'One', 'Two', 'Two', 'Two'], @@ -516,10 +357,9 @@ the expected correct result:: assert_frame_equal(pivoted, expected) Please remember to add the Github Issue Number as a comment to a new test. -E.g. "# brief comment, see GH#28907" -Transitioning to ``pytest`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Using ``pytest`` +~~~~~~~~~~~~~~~~ pandas existing test structure is *mostly* class-based, meaning that you will typically find tests wrapped in a class. @@ -536,19 +376,16 @@ framework that will facilitate testing and developing. Thus, instead of writing def test_really_cool_feature(): pass -Using ``pytest`` -~~~~~~~~~~~~~~~~ - Here is an example of a self-contained set of tests that illustrate multiple features that we like to use. -* functional style: tests are like ``test_*`` and *only* take arguments that are either fixtures or parameters -* ``pytest.mark`` can be used to set metadata on test functions, e.g. ``skip`` or ``xfail``. -* using ``parametrize``: allow testing of multiple cases -* to set a mark on a parameter, ``pytest.param(..., marks=...)`` syntax should be used -* ``fixture``, code for object construction, on a per-test basis -* using bare ``assert`` for scalars and truth-testing -* ``tm.assert_series_equal`` (and its counter part ``tm.assert_frame_equal``), for pandas object comparisons. -* the typical pattern of constructing an ``expected`` and comparing versus the ``result`` +* Functional tests named ``def test_*`` and *only* take arguments that are either fixtures or parameters. +* Use `@pytest.mark.parameterize `__ when testing multiple cases. +* Use `pytest.mark.xfail `__ when a test case is expected to fail. +* Use `pytest.mark.skip `__ when a test case is never expected to pass. +* Use `pytest.param `__ when a test case needs a particular mark. +* Use `@pytest.fixture `__ if multiple tests can share a setup object. +* Use a bare ``assert`` for testing scalars and truth-testing +* Use ``tm.assert_series_equal(result, expected)`` and ``tm.assert_frame_equal(result, expected)``) for comparing :class:`Series` and :class:`DataFrame` results respectively. We would name this file ``test_cool_feature.py`` and put in an appropriate place in the ``pandas/tests/`` structure. @@ -630,7 +467,7 @@ Tests that we have ``parametrized`` are now accessible via the test name, for ex Using ``hypothesis`` ~~~~~~~~~~~~~~~~~~~~ -Hypothesis is a library for property-based testing. Instead of explicitly +Hypothesis is a library for property-based testing. Instead of explicitly parametrizing a test, you can describe *all* valid inputs and let Hypothesis try to find a failing input. Even better, no matter how many random examples it tries, Hypothesis always reports a single minimal counterexample to your @@ -670,7 +507,7 @@ options or subtle interactions to test (or think of!) all of them. Testing warnings ~~~~~~~~~~~~~~~~ -By default, one of pandas CI workers will fail if any unhandled warnings are emitted. +By default, the :ref:`Continuous Integration ` will fail if any unhandled warnings are emitted. If your change involves checking that a warning is actually emitted, use ``tm.assert_produces_warning(ExpectedWarning)``. @@ -727,8 +564,7 @@ install pandas) by typing:: pytest pandas -The tests suite is exhaustive and takes around 20 minutes to run. Often it is -worth running only a subset of tests first around your changes before running the +Often it is worth running only a subset of tests first around your changes before running the entire suite. The easiest way to do this is with:: From 6177493cbd8244902e34f31fbd19f7666f793c5c Mon Sep 17 00:00:00 2001 From: Matthew Roeschke Date: Sat, 29 Jan 2022 19:15:57 -0800 Subject: [PATCH 2/3] Move some information into contributing --- doc/source/development/code_style.rst | 49 ----------- .../development/contributing_codebase.rst | 81 +++++++++++-------- 2 files changed, 47 insertions(+), 83 deletions(-) diff --git a/doc/source/development/code_style.rst b/doc/source/development/code_style.rst index 7bbfc010fbfb2..b15898c623aec 100644 --- a/doc/source/development/code_style.rst +++ b/doc/source/development/code_style.rst @@ -9,61 +9,12 @@ pandas code style guide .. contents:: Table of contents: :local: -pandas follows the `PEP8 `_ -standard and uses `Black `_ -and `Flake8 `_ to ensure a -consistent code format throughout the project. We encourage you to use -:ref:`pre-commit ` to automatically run ``black``, -``flake8``, ``isort``, and related code checks when you make a git commit. - Patterns ======== We use a ``flake8`` plugin, `pandas-dev-flaker `_, to check our codebase for unwanted patterns. See its ``README`` for the up-to-date list of rules we enforce. -Testing -======= - -Failing tests --------------- - -See https://docs.pytest.org/en/latest/how-to/skipping.html for background. - -Do not use ``pytest.xfail`` ---------------------------- - -Do not use this method. It has the same behavior as ``pytest.skip``, namely -it immediately stops the test and does not check if the test will fail. If -this is the behavior you desire, use ``pytest.skip`` instead. - -Using ``pytest.mark.xfail`` ---------------------------- - -Use this method if a test is known to fail but the manner in which it fails -is not meant to be captured. It is common to use this method for a test that -exhibits buggy behavior or a non-implemented feature. If -the failing test has flaky behavior, use the argument ``strict=False``. This -will make it so pytest does not fail if the test happens to pass. - -Prefer the decorator ``@pytest.mark.xfail`` and the argument ``pytest.param`` -over usage within a test so that the test is appropriately marked during the -collection phase of pytest. For xfailing a test that involves multiple -parameters, a fixture, or a combination of these, it is only possible to -xfail during the testing phase. To do so, use the ``request`` fixture: - -.. code-block:: python - - import pytest - - def test_xfail(request): - mark = pytest.mark.xfail(raises=TypeError, reason="Indicate why here") - request.node.add_marker(mark) - -xfail is not to be used for tests involving failure due to invalid user arguments. -For these tests, we need to verify the correct exception type and error message -is being raised, using ``pytest.raises`` instead. - Miscellaneous ============= diff --git a/doc/source/development/contributing_codebase.rst b/doc/source/development/contributing_codebase.rst index 66e50ab195af0..2ce88ca6fe496 100644 --- a/doc/source/development/contributing_codebase.rst +++ b/doc/source/development/contributing_codebase.rst @@ -322,45 +322,20 @@ use cases and writing corresponding tests. Adding tests is one of the most common requests after code is pushed to pandas. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue. -Like many packages, pandas uses `pytest`_. - Writing tests ~~~~~~~~~~~~~ All tests should go into the ``tests`` subdirectory of the specific package. This folder contains many current examples of tests, and we suggest looking to these for -inspiration. If your test requires working with files or -network connectivity, there is more information on the :wiki:`Testing` of the wiki. - -The ``pandas._testing`` module has many special ``assert`` functions that -make it easier to make statements about whether Series or DataFrame objects are -equivalent. The easiest way to verify that your code is correct is to -explicitly construct the result you expect, then compare the actual result to -the expected correct result:: - - def test_pivot(self): - # GH - data = { - 'index' : ['A', 'B', 'C', 'C', 'B', 'A'], - 'columns' : ['One', 'One', 'One', 'Two', 'Two', 'Two'], - 'values' : [1., 2., 3., 3., 2., 1.] - } - - frame = DataFrame(data) - pivoted = frame.pivot(index='index', columns='columns', values='values') - - expected = DataFrame({ - 'One' : {'A' : 1., 'B' : 2., 'C' : 3.}, - 'Two' : {'A' : 1., 'B' : 2., 'C' : 3.} - }) - - assert_frame_equal(pivoted, expected) - -Please remember to add the Github Issue Number as a comment to a new test. +inspiration. Please reference our :ref:`testing location guide ` if you are unsure +where to place a new unit test. Using ``pytest`` ~~~~~~~~~~~~~~~~ +Test structure +^^^^^^^^^^^^^^ + pandas existing test structure is *mostly* class-based, meaning that you will typically find tests wrapped in a class. .. code-block:: python @@ -376,18 +351,55 @@ framework that will facilitate testing and developing. Thus, instead of writing def test_really_cool_feature(): pass -Here is an example of a self-contained set of tests that illustrate multiple features that we like to use. +Preferred idioms +^^^^^^^^^^^^^^^^ * Functional tests named ``def test_*`` and *only* take arguments that are either fixtures or parameters. +* Use a bare ``assert`` for testing scalars and truth-testing +* Use ``tm.assert_series_equal(result, expected)`` and ``tm.assert_frame_equal(result, expected)``) for comparing :class:`Series` and :class:`DataFrame` results respectively. * Use `@pytest.mark.parameterize `__ when testing multiple cases. * Use `pytest.mark.xfail `__ when a test case is expected to fail. * Use `pytest.mark.skip `__ when a test case is never expected to pass. * Use `pytest.param `__ when a test case needs a particular mark. * Use `@pytest.fixture `__ if multiple tests can share a setup object. -* Use a bare ``assert`` for testing scalars and truth-testing -* Use ``tm.assert_series_equal(result, expected)`` and ``tm.assert_frame_equal(result, expected)``) for comparing :class:`Series` and :class:`DataFrame` results respectively. -We would name this file ``test_cool_feature.py`` and put in an appropriate place in the ``pandas/tests/`` structure. +.. warning:: + + Do not use ``pytest.xfail`` (which is different than ``pytest.mark.xfail``) since it immediately stops the + test and does not check if the test will fail. If this is the behavior you desire, use ``pytest.skip`` instead. + +If a test is known to fail but the manner in which it fails +is not meant to be captured, use ``pytest.mark.xfail`` It is common to use this method for a test that +exhibits buggy behavior or a non-implemented feature. If +the failing test has flaky behavior, use the argument ``strict=False``. This +will make it so pytest does not fail if the test happens to pass. + +Prefer the decorator ``@pytest.mark.xfail`` and the argument ``pytest.param`` +over usage within a test so that the test is appropriately marked during the +collection phase of pytest. For xfailing a test that involves multiple +parameters, a fixture, or a combination of these, it is only possible to +xfail during the testing phase. To do so, use the ``request`` fixture: + +.. code-block:: python + + def test_xfail(request): + mark = pytest.mark.xfail(raises=TypeError, reason="Indicate why here") + request.node.add_marker(mark) + +xfail is not to be used for tests involving failure due to invalid user arguments. +For these tests, we need to verify the correct exception type and error message +is being raised, using ``pytest.raises`` instead. + +If your test requires working with files or +network connectivity, there is more information on the :wiki:`Testing` of the wiki. + + +Example +^^^^^^^ + +Here is an example of a self-contained set of tests in a file ``pandas/tests/test_cool_feature.py`` +that illustrate multiple features that we like to use. Please remember to add the Github Issue Number +as a comment to a new test. .. code-block:: python @@ -420,6 +432,7 @@ We would name this file ``test_cool_feature.py`` and put in an appropriate place def test_series(series, dtype): + # GH result = series.astype(dtype) assert result.dtype == dtype From 0b3e8925f5d87ddbfda9e4f04ded4c8260062169 Mon Sep 17 00:00:00 2001 From: Matthew Roeschke Date: Sat, 29 Jan 2022 20:14:54 -0800 Subject: [PATCH 3/3] Fix some typos --- doc/source/development/contributing_codebase.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/development/contributing_codebase.rst b/doc/source/development/contributing_codebase.rst index 2ce88ca6fe496..61e3bcd44bea8 100644 --- a/doc/source/development/contributing_codebase.rst +++ b/doc/source/development/contributing_codebase.rst @@ -47,7 +47,7 @@ Pre-commit Additionally, :ref:`Continuous Integration ` will run code formatting checks like ``black``, ``flake8``, ``isort``, and ``cpplint`` and more using `pre-commit hooks `_ Any warnings from these checks will cause the :ref:`Continuous Integration ` to fail; therefore, -it is helpful to run the check yourself before submitting code.This +it is helpful to run the check yourself before submitting code. This can be done by installing ``pre-commit``:: pip install pre-commit @@ -356,7 +356,7 @@ Preferred idioms * Functional tests named ``def test_*`` and *only* take arguments that are either fixtures or parameters. * Use a bare ``assert`` for testing scalars and truth-testing -* Use ``tm.assert_series_equal(result, expected)`` and ``tm.assert_frame_equal(result, expected)``) for comparing :class:`Series` and :class:`DataFrame` results respectively. +* Use ``tm.assert_series_equal(result, expected)`` and ``tm.assert_frame_equal(result, expected)`` for comparing :class:`Series` and :class:`DataFrame` results respectively. * Use `@pytest.mark.parameterize `__ when testing multiple cases. * Use `pytest.mark.xfail `__ when a test case is expected to fail. * Use `pytest.mark.skip `__ when a test case is never expected to pass.