Commit 1774eb5

MarcoGorelli committed
Merge remote-tracking branch 'upstream/main' into pr/jorisvandenbossche/ruff
2 parents 407e962 + 3a0db10 commit 1774eb5

90 files changed, +2428 -500 lines changed

.github/workflows/codeql.yml

+3
@@ -8,6 +8,9 @@ concurrency:
   group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
   cancel-in-progress: true

+permissions:
+  contents: read
+
 jobs:
   analyze:
     runs-on: ubuntu-22.04

.github/workflows/wheels.yml

+3
@@ -30,6 +30,9 @@ concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
   cancel-in-progress: true

+permissions:
+  contents: read
+
 jobs:
   build_wheels:
     name: Build wheel for ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

.pre-commit-config.yaml

+128 -6
@@ -38,7 +38,7 @@ repos:
     types_or: [python, rst, markdown]
     additional_dependencies: [tomli]
 - repo: https://github.com/MarcoGorelli/cython-lint
-  rev: v0.9.1
+  rev: v0.10.1
   hooks:
   - id: cython-lint
   - id: double-quote-cython-strings
@@ -71,12 +71,12 @@ repos:
       '--filter=-readability/casting,-runtime/int,-build/include_subdir,-readability/fn_size'
       ]
 - repo: https://github.com/pycqa/pylint
-  rev: v2.15.6
+  rev: v2.15.9
   hooks:
   - id: pylint
     stages: [manual]
 - repo: https://github.com/pycqa/pylint
-  rev: v2.15.6
+  rev: v2.15.9
   hooks:
   - id: pylint
     alias: redefined-outer-name
@@ -89,15 +89,14 @@ repos:
       |^pandas/util/_test_decorators\.py # keep excluded
       |^pandas/_version\.py # keep excluded
       |^pandas/conftest\.py # keep excluded
-      |^pandas/core/generic\.py
     args: [--disable=all, --enable=redefined-outer-name]
     stages: [manual]
 - repo: https://github.com/PyCQA/isort
-  rev: 5.10.1
+  rev: 5.11.4
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v3.2.2
+  rev: v3.3.1
   hooks:
   - id: pyupgrade
     args: [--py38-plus]
@@ -172,6 +171,21 @@ repos:
     types: [rst]
     args: [--filename=*.rst]
     additional_dependencies: [flake8-rst==0.7.0, flake8==3.7.9]
+  - id: inconsistent-namespace-usage
+    name: 'Check for inconsistent use of pandas namespace'
+    entry: python scripts/check_for_inconsistent_pandas_namespace.py
+    exclude: ^pandas/core/interchange/
+    language: python
+    types: [python]
+  - id: no-os-remove
+    name: Check code for instances of os.remove
+    entry: os\.remove
+    language: pygrep
+    types: [python]
+    files: ^pandas/tests/
+    exclude: |
+      (?x)^
+      pandas/tests/io/pytables/test_store\.py$
   - id: unwanted-patterns
     name: Unwanted patterns
     language: pygrep
@@ -181,6 +195,20 @@ repos:
       \#\ type:\ (?!ignore)
       |\#\ type:\s?ignore(?!\[)

+      # foo._class__ instead of type(foo)
+      |\.__class__
+
+      # np.bool/np.object instead of np.bool_/np.object_
+      |np\.bool[^_8`]
+      |np\.object[^_8`]
+
+      # imports from collections.abc instead of `from collections import abc`
+      |from\ collections\.abc\ import
+
+      # Numpy
+      |from\ numpy\ import\ random
+      |from\ numpy\.random\ import
+
       # Incorrect code-block / IPython directives
       |\.\.\ code-block\ ::
       |\.\.\ ipython\ ::
@@ -189,7 +217,17 @@ repos:

       # Check for deprecated messages without sphinx directive
       |(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)
+
+      # {foo!r} instead of {repr(foo)}
+      |!r}
+
+      # builtin filter function
+      |(?<!def)[\(\s]filter\(
+
+      # exec
+      |[^a-zA-Z0-9_]exec\(
     types_or: [python, cython, rst]
+    exclude: ^doc/source/development/code_style\.rst # contains examples of patterns to avoid
   - id: cython-casting
     name: Check Cython casting is `<type>obj`, not `<type> obj`
     language: pygrep
@@ -220,6 +258,58 @@ repos:
     files: ^pandas/tests/extension/base
     types: [python]
     exclude: ^pandas/tests/extension/base/base\.py
+  - id: unwanted-patterns-in-tests
+    name: Unwanted patterns in tests
+    language: pygrep
+    entry: |
+      (?x)
+      # pytest.xfail instead of pytest.mark.xfail
+      pytest\.xfail
+
+      # imports from pandas._testing instead of `import pandas._testing as tm`
+      |from\ pandas\._testing\ import
+      |from\ pandas\ import\ _testing\ as\ tm
+
+      # No direct imports from conftest
+      |conftest\ import
+      |import\ conftest
+
+      # pandas.testing instead of tm
+      |pd\.testing\.
+
+      # pd.api.types instead of from pandas.api.types import ...
+      |(pd|pandas)\.api\.types\.
+
+      # np.testing, np.array_equal
+      |(numpy|np)(\.testing|\.array_equal)
+
+      # unittest.mock (use pytest builtin monkeypatch fixture instead)
+      |(unittest(\.| import )mock|mock\.Mock\(\)|mock\.patch)
+
+      # pytest raises without context
+      |\s\ pytest.raises
+
+      # pytest.warns (use tm.assert_produces_warning instead)
+      |pytest\.warns
+    files: ^pandas/tests/
+    types_or: [python, cython, rst]
+  - id: unwanted-patterns-in-ea-tests
+    name: Unwanted patterns in EA tests
+    language: pygrep
+    entry: |
+      (?x)
+      tm.assert_(series|frame)_equal
+    files: ^pandas/tests/extension/base/
+    exclude: ^pandas/tests/extension/base/base\.py$
+    types_or: [python, cython, rst]
+  - id: unwanted-patterns-in-cython
+    name: Unwanted patterns in Cython code
+    language: pygrep
+    entry: |
+      (?x)
+      # `<type>obj` as opposed to `<type> obj`
+      [a-zA-Z0-9*]>[ ]
+    types: [cython]
   - id: pip-to-conda
     name: Generate pip dependency from conda
     language: python
@@ -233,6 +323,38 @@ repos:
     language: python
     types: [rst]
     files: ^doc/source/(development|reference)/
+  - id: unwanted-patterns-bare-pytest-raises
+    name: Check for use of bare pytest raises
+    language: python
+    entry: python scripts/validate_unwanted_patterns.py --validation-type="bare_pytest_raises"
+    types: [python]
+    files: ^pandas/tests/
+    exclude: ^pandas/tests/extension/
+  - id: unwanted-patterns-private-function-across-module
+    name: Check for use of private functions across modules
+    language: python
+    entry: python scripts/validate_unwanted_patterns.py --validation-type="private_function_across_module"
+    types: [python]
+    exclude: ^(asv_bench|pandas/tests|doc)/
+  - id: unwanted-patterns-private-import-across-module
+    name: Check for import of private attributes across modules
+    language: python
+    entry: python scripts/validate_unwanted_patterns.py --validation-type="private_import_across_module"
+    types: [python]
+    exclude: |
+      (?x)
+      ^(asv_bench|pandas/tests|doc)/
+      |scripts/validate_min_versions_in_sync\.py$
+  - id: unwanted-patterns-strings-to-concatenate
+    name: Check for use of not concatenated strings
+    language: python
+    entry: python scripts/validate_unwanted_patterns.py --validation-type="strings_to_concatenate"
+    types_or: [python, cython]
+  - id: unwanted-patterns-strings-with-misplaced-whitespace
+    name: Check for strings with misplaced spaces
+    language: python
+    entry: python scripts/validate_unwanted_patterns.py --validation-type="strings_with_wrong_placed_whitespace"
+    types_or: [python, cython]
   - id: use-pd_array-in-core
     name: Import pandas.array as pd_array in core
     language: python
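
The `pygrep` entries added to `unwanted-patterns` in this file are Python regular expressions written in verbose mode (`(?x)`), so the comments and line breaks inside each `entry` are ignored by the regex engine. As a rough sketch of how a few of those alternatives behave, the snippet below applies a subset of them line by line with the standard `re` module. The helper `flagged_lines` and the `SAMPLE` lines are illustrative assumptions and are not part of the commit.

```python
import re

# Subset of the alternatives added to the `unwanted-patterns` entry.
# (?x) enables verbose mode: unescaped whitespace and "#" comments are ignored.
UNWANTED = re.compile(
    r"""(?x)
    # foo.__class__ instead of type(foo)
    \.__class__

    # np.bool/np.object instead of np.bool_/np.object_
    |np\.bool[^_8`]
    |np\.object[^_8`]

    # imports from collections.abc instead of `from collections import abc`
    |from\ collections\.abc\ import

    # builtin filter function (but not a method definition named filter)
    |(?<!def)[\(\s]filter\(

    # exec
    |[^a-zA-Z0-9_]exec\(
    """
)


def flagged_lines(source):
    """Return the 1-based numbers of the lines that match an unwanted pattern."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if UNWANTED.search(line)
    ]


# Hypothetical input: odd-numbered lines use a flagged construct,
# even-numbered lines use the accepted alternative.
SAMPLE = "\n".join(
    [
        "kind = obj.__class__",
        "kind = type(obj)",
        "arr = arr.astype(np.bool)",
        "arr = arr.astype(np.bool_)",
        "from collections.abc import Iterable",
        "from collections import abc",
        "evens = list(filter(is_even, xs))",
        "def filter(self, items=None):",
    ]
)

print(flagged_lines(SAMPLE))
```

The negative lookbehind `(?<!def)` is what keeps a method definition such as `def filter(...)` from being reported, while a call like `list(filter(...))` still matches.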

asv_bench/benchmarks/pandas_vb_common.py

+1 -1
@@ -70,7 +70,7 @@ class BaseIO:
     def remove(self, f):
         """Remove created files"""
         try:
-            os.remove(f) # noqa: PDF008
+            os.remove(f)
         except OSError:
             # On Windows, attempting to remove a file that is in use
             # causes an exception to be raised

doc/scripts/eval_performance.py

+1 -2
@@ -6,8 +6,7 @@
 from pandas import DataFrame

 setup_common = """from pandas import DataFrame
-from numpy.random import randn
-df = DataFrame(randn(%d, 3), columns=list('abc'))
+df = DataFrame(np.random.randn(%d, 3), columns=list('abc'))
 %s"""

 setup_with = "s = 'a + b * (c ** 2 + b ** 2 - a) / (a * c) ** 3'"

doc/source/reference/arrays.rst

+31
@@ -60,6 +60,37 @@ is an :class:`ArrowDtype`.
 `Pyarrow <https://arrow.apache.org/docs/python/index.html>`__ provides similar array and `data type <https://arrow.apache.org/docs/python/api/datatypes.html>`__
 support as NumPy including first-class nullability support for all data types, immutability and more.

+The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+
+=============================================== ========================== ===================
+PyArrow type                                    pandas extension type      NumPy type
+=============================================== ========================== ===================
+:external+pyarrow:py:func:`pyarrow.bool_`       :class:`BooleanDtype`      ``np.bool_``
+:external+pyarrow:py:func:`pyarrow.int8`        :class:`Int8Dtype`         ``np.int8``
+:external+pyarrow:py:func:`pyarrow.int16`       :class:`Int16Dtype`        ``np.int16``
+:external+pyarrow:py:func:`pyarrow.int32`       :class:`Int32Dtype`        ``np.int32``
+:external+pyarrow:py:func:`pyarrow.int64`       :class:`Int64Dtype`        ``np.int64``
+:external+pyarrow:py:func:`pyarrow.uint8`       :class:`UInt8Dtype`        ``np.uint8``
+:external+pyarrow:py:func:`pyarrow.uint16`      :class:`UInt16Dtype`       ``np.uint16``
+:external+pyarrow:py:func:`pyarrow.uint32`      :class:`UInt32Dtype`       ``np.uint32``
+:external+pyarrow:py:func:`pyarrow.uint64`      :class:`UInt64Dtype`       ``np.uint64``
+:external+pyarrow:py:func:`pyarrow.float32`     :class:`Float32Dtype`      ``np.float32``
+:external+pyarrow:py:func:`pyarrow.float64`     :class:`Float64Dtype`      ``np.float64``
+:external+pyarrow:py:func:`pyarrow.time32`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.time64`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.timestamp`   :class:`DatetimeTZDtype`   ``np.datetime64``
+:external+pyarrow:py:func:`pyarrow.date32`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.date64`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.duration`    (none)                     ``np.timedelta64``
+:external+pyarrow:py:func:`pyarrow.binary`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.string`      :class:`StringDtype`       ``np.str_``
+:external+pyarrow:py:func:`pyarrow.decimal128`  (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.list_`       (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.map_`        (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.dictionary`  :class:`CategoricalDtype`  (none)
+=============================================== ========================== ===================
+
 .. note::

    For string types (``pyarrow.string()``, ``string[pyarrow]``), PyArrow support is still facilitated

doc/source/user_guide/io.rst

+15
@@ -1255,6 +1255,21 @@ The bad line will be a list of strings that was split by the ``sep``:

 .. versionadded:: 1.4.0

+Note that the callable function will handle only a line with too many fields.
+Bad lines caused by other errors will be silently skipped.
+
+For example:
+
+.. code-block:: ipython
+
+    def bad_lines_func(line):
+        print(line)
+
+    data = 'name,type\nname a,a is of type a\nname b,"b\" is of type b"'
+    data
+    pd.read_csv(data, on_bad_lines=bad_lines_func, engine="python")
+
+The line was not processed in this case, as a "bad line" here is caused by an escape character.

 You can also use the ``usecols`` parameter to eliminate extraneous column
 data that appear in some lines but not others:
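
The added `io.rst` text says the callable only sees lines with too many fields. A minimal sketch of that case follows, assuming pandas 1.4 or later. Unlike the snippet in the diff, it wraps the CSV text in `StringIO`, since `read_csv` would otherwise treat a plain string as a path. The names `keep_first_two`, `seen_bad_lines`, and `csv_text` are hypothetical and chosen only for illustration.

```python
from io import StringIO

import pandas as pd

# Rows that the callable was asked to handle (illustrative bookkeeping).
seen_bad_lines = []


def keep_first_two(bad_line):
    """Record a row with too many fields and keep only its first two values."""
    seen_bad_lines.append(bad_line)
    return bad_line[:2]


# The header has two fields; the second data row has three.
csv_text = (
    "name,type\n"
    "name a,a is of type a\n"
    "name b,b is of type b,extra\n"
    "name c,c is of type c\n"
)

# A callable for on_bad_lines is only supported by the Python engine.
df = pd.read_csv(StringIO(csv_text), on_bad_lines=keep_first_two, engine="python")
print(df)
print(seen_bad_lines)
```

Returning `None` from the callable instead would drop the over-long row rather than truncate it.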

doc/source/whatsnew/v1.4.0.rst

+1 -1
@@ -320,7 +320,7 @@ Null-values are no longer coerced to NaN-value in value_counts and mode
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 :meth:`Series.value_counts` and :meth:`Series.mode` no longer coerce ``None``,
-``NaT`` and other null-values to a NaN-value for ``np.object``-dtype. This
+``NaT`` and other null-values to a NaN-value for ``np.object_``-dtype. This
 behavior is now consistent with ``unique``, ``isin`` and others
 (:issue:`42688`).

doc/source/whatsnew/v1.5.0.rst

+46
@@ -290,6 +290,52 @@ and attributes without holding entire tree in memory (:issue:`45442`).
 .. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk
 .. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse

+.. _whatsnew_150.enhancements.copy_on_write:
+
+Copy on Write
+^^^^^^^^^^^^^
+
+A new feature ``copy_on_write`` was added (:issue:`46958`). Copy on write ensures that
+any DataFrame or Series derived from another in any way always behaves as a copy.
+Copy on write disallows updating any other object than the object the method
+was applied to.
+
+Copy on write can be enabled through:
+
+.. code-block:: python
+
+    pd.set_option("mode.copy_on_write", True)
+    pd.options.mode.copy_on_write = True
+
+Alternatively, copy on write can be enabled locally through:
+
+.. code-block:: python
+
+    with pd.option_context("mode.copy_on_write", True):
+        ...
+
+Without copy on write, the parent :class:`DataFrame` is updated when updating a child
+:class:`DataFrame` that was derived from this :class:`DataFrame`.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
+    view = df["foo"]
+    view.iloc[0]
+    df
+
+With copy on write enabled, df won't be updated anymore:
+
+.. ipython:: python
+
+    with pd.option_context("mode.copy_on_write", True):
+        df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
+        view = df["foo"]
+        view.iloc[0]
+        df
+
+A more detailed explanation can be found `here <https://phofl.github.io/cow-introduction.html>`_.
+
 .. _whatsnew_150.enhancements.other:

 Other enhancements
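
The "Copy on Write" snippets in the whatsnew entry read `view.iloc[0]` without assigning to it, so they do not actually modify anything. The sketch below assumes an assignment was intended and shows the copy-on-write case only, since the behaviour without the option depends on the pandas version. It assumes pandas 1.5 or later; the names `parent` and `child` are illustrative.

```python
import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    parent = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
    child = parent["foo"]

    # With copy on write enabled, this write affects the child only.
    child.iloc[0] = 100

    parent_first = int(parent["foo"].iloc[0])
    child_first = int(child.iloc[0])

print(parent_first, child_first)
```

Under copy on write the derived Series behaves as a copy, so the parent keeps its original value in the first row while the child holds the new one.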
