Commit 43d12ae

Merge branch 'main' into iso8601-duration-parsing-feature
2 parents 9ce3faa + 900ffa3 commit 43d12ae

197 files changed, +3635 -2812 lines changed

.github/workflows/package-checks.yml

+1-1
Original file line number | Diff line number | Diff line change
@@ -20,7 +20,7 @@ jobs:
2020
runs-on: ubuntu-22.04
2121
strategy:
2222
matrix:
23-
extra: ["test", "performance", "timezone", "computation", "fss", "aws", "gcp", "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql", "sql-other", "html", "xml", "plot", "output_formatting", "clipboard", "compression", "all"]
23+
extra: ["test", "performance", "computation", "fss", "aws", "gcp", "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql", "sql-other", "html", "xml", "plot", "output_formatting", "clipboard", "compression", "all"]
2424
fail-fast: false
2525
name: Install Extras - ${{ matrix.extra }}
2626
concurrency:

.github/workflows/wheels.yml

+2-2
@@ -173,8 +173,8 @@ jobs:
173173
pip install hypothesis>=6.34.2 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17
174174
cd .. # Not a good idea to test within the src tree
175175
python -c "import pandas; print(pandas.__version__);
176-
pandas.test(extra_args=['-m not clipboard and not single_cpu and not slow and not network and not db', '-n 2']);
177-
pandas.test(extra_args=['-m not clipboard and single_cpu and not slow and not network and not db'])"
176+
pandas.test(extra_args=['-m not clipboard and not single_cpu and not slow and not network and not db', '-n 2', '--no-strict-data-files']);
177+
pandas.test(extra_args=['-m not clipboard and single_cpu and not slow and not network and not db', '--no-strict-data-files'])"
178178
- uses: actions/upload-artifact@v3
179179
with:
180180
name: sdist

.gitignore

+3
@@ -53,6 +53,9 @@ dist
5353
# type checkers
5454
pandas/py.typed
5555

56+
# pyenv
57+
.python-version
58+
5659
# tox testing tool
5760
.tox
5861
# rope

.pre-commit-config.yaml

+1-1
@@ -28,7 +28,7 @@ repos:
2828
types_or: [python, pyi]
2929
additional_dependencies: [black==23.1.0]
3030
- repo: https://github.com/charliermarsh/ruff-pre-commit
31-
rev: v0.0.253
31+
rev: v0.0.255
3232
hooks:
3333
- id: ruff
3434
args: [--exit-non-zero-on-fix]

ci/code_checks.sh

+4-2
@@ -97,6 +97,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
9797
pandas.Series.is_monotonic_increasing \
9898
pandas.Series.is_monotonic_decreasing \
9999
pandas.Series.backfill \
100+
pandas.Series.bfill \
101+
pandas.Series.ffill \
100102
pandas.Series.pad \
101103
pandas.Series.argsort \
102104
pandas.Series.reorder_levels \
@@ -541,14 +543,14 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
541543
pandas.DataFrame.iterrows \
542544
pandas.DataFrame.pipe \
543545
pandas.DataFrame.backfill \
546+
pandas.DataFrame.bfill \
547+
pandas.DataFrame.ffill \
544548
pandas.DataFrame.pad \
545549
pandas.DataFrame.swapaxes \
546550
pandas.DataFrame.first_valid_index \
547551
pandas.DataFrame.last_valid_index \
548552
pandas.DataFrame.attrs \
549553
pandas.DataFrame.plot \
550-
pandas.DataFrame.sparse.density \
551-
pandas.DataFrame.sparse.to_coo \
552554
pandas.DataFrame.to_gbq \
553555
pandas.DataFrame.style \
554556
pandas.DataFrame.__dataframe__

ci/deps/actions-310-numpydev.yaml

+2
@@ -18,9 +18,11 @@ dependencies:
1818
- python-dateutil
1919
- pytz
2020
- pip
21+
2122
- pip:
2223
- "cython"
2324
- "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
2425
- "--pre"
2526
- "numpy"
2627
- "scipy"
28+
- "tzdata>=2022.1"

ci/deps/actions-310.yaml

+3-1
@@ -49,8 +49,10 @@ dependencies:
4949
- scipy>=1.7.1
5050
- sqlalchemy>=1.4.16
5151
- tabulate>=0.8.9
52-
- tzdata>=2022a
5352
- xarray>=0.21.0
5453
- xlrd>=2.0.1
5554
- xlsxwriter>=1.4.3
5655
- zstandard>=0.15.2
56+
57+
- pip:
58+
- tzdata>=2022.1

ci/deps/actions-311.yaml

+3-1
@@ -49,8 +49,10 @@ dependencies:
4949
- scipy>=1.7.1
5050
- sqlalchemy>=1.4.16
5151
- tabulate>=0.8.9
52-
- tzdata>=2022a
5352
- xarray>=0.21.0
5453
- xlrd>=2.0.1
5554
- xlsxwriter>=1.4.3
5655
- zstandard>=0.15.2
56+
57+
- pip:
58+
- tzdata>=2022.1

ci/deps/actions-38-downstream_compat.yaml

+3
@@ -68,3 +68,6 @@ dependencies:
6868
- pandas-gbq>=0.15.0
6969
- pyyaml
7070
- py
71+
72+
- pip:
73+
- tzdata>=2022.1

ci/deps/actions-38-minimum_versions.yaml

+1-1
@@ -52,11 +52,11 @@ dependencies:
5252
- scipy=1.7.1
5353
- sqlalchemy=1.4.16
5454
- tabulate=0.8.9
55-
- tzdata=2022a
5655
- xarray=0.21.0
5756
- xlrd=2.0.1
5857
- xlsxwriter=1.4.3
5958
- zstandard=0.15.2
6059

6160
- pip:
6261
- pyqt5==5.15.1
62+
- tzdata==2022.1

ci/deps/actions-38.yaml

+3
@@ -53,3 +53,6 @@ dependencies:
5353
- xlrd>=2.0.1
5454
- xlsxwriter>=1.4.3
5555
- zstandard>=0.15.2
56+
57+
- pip:
58+
- tzdata>=2022.1

ci/deps/actions-39.yaml

+3-1
@@ -49,8 +49,10 @@ dependencies:
4949
- scipy>=1.7.1
5050
- sqlalchemy>=1.4.16
5151
- tabulate>=0.8.9
52-
- tzdata>=2022a
5352
- xarray>=0.21.0
5453
- xlrd>=2.0.1
5554
- xlsxwriter>=1.4.3
5655
- zstandard>=0.15.2
56+
57+
- pip:
58+
- tzdata>=2022.1

ci/deps/actions-pypy-38.yaml

+3
@@ -22,3 +22,6 @@ dependencies:
2222
- numpy
2323
- python-dateutil
2424
- pytz
25+
26+
- pip:
27+
- tzdata>=2022.1

ci/test_wheels.py

+2
@@ -41,10 +41,12 @@
4141
multi_args = [
4242
"-m not clipboard and not single_cpu and not slow and not network and not db",
4343
"-n 2",
44+
"--no-strict-data-files",
4445
]
4546
pd.test(extra_args=multi_args)
4647
pd.test(
4748
extra_args=[
4849
"-m not clipboard and single_cpu and not slow and not network and not db",
50+
"--no-strict-data-files",
4951
]
5052
)

ci/test_wheels_windows.bat

+3-3
@@ -1,9 +1,9 @@
11
set test_command=import pandas as pd; print(pd.__version__); ^
2-
pd.test(extra_args=['-m not clipboard and not single_cpu and not slow and not network and not db', '-n 2']); ^
3-
pd.test(extra_args=['-m not clipboard and single_cpu and not slow and not network and not db'])
2+
pd.test(extra_args=['-m not clipboard and not single_cpu and not slow and not network and not db', '--no-strict-data-files', '-n=2']); ^
3+
pd.test(extra_args=['-m not clipboard and single_cpu and not slow and not network and not db', '--no-strict-data-files'])
44

55
python --version
6-
pip install pytz six numpy python-dateutil
6+
pip install pytz six numpy python-dateutil tzdata>=2022.1
77
pip install hypothesis>=6.34.2 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17
88
pip install --find-links=pandas/dist --no-index pandas
99
python -c "%test_command%"

doc/source/conf.py

+6-6
@@ -101,20 +101,20 @@
101101
reldir = os.path.relpath(dirname, source_path)
102102
for fname in fnames:
103103
if os.path.splitext(fname)[-1] in (".rst", ".ipynb"):
104-
fname = os.path.relpath(os.path.join(dirname, fname), source_path)
104+
rel_fname = os.path.relpath(os.path.join(dirname, fname), source_path)
105105

106-
if fname == "index.rst" and os.path.abspath(dirname) == source_path:
106+
if rel_fname == "index.rst" and os.path.abspath(dirname) == source_path:
107107
continue
108108
if pattern == "-api" and reldir.startswith("reference"):
109-
exclude_patterns.append(fname)
109+
exclude_patterns.append(rel_fname)
110110
elif (
111111
pattern == "whatsnew"
112112
and not reldir.startswith("reference")
113113
and reldir != "whatsnew"
114114
):
115-
exclude_patterns.append(fname)
116-
elif single_doc and fname != pattern:
117-
exclude_patterns.append(fname)
115+
exclude_patterns.append(rel_fname)
116+
elif single_doc and rel_fname != pattern:
117+
exclude_patterns.append(rel_fname)
118118

119119
with open(os.path.join(source_path, "index.rst.template")) as f:
120120
t = jinja2.Template(f.read())

doc/source/development/community.rst

+8-6
@@ -111,9 +111,11 @@ contributing to pandas. The slack is a private space, specifically meant for
111111
people who are hesitant to bring up their questions or ideas on a large public
112112
mailing list or GitHub.
113113

114-
If this sounds like the right place for you, you are welcome to join! Email us
115-
at `[email protected] <mailto://[email protected]>`_ and let us
116-
know that you read and agree to our `Code of Conduct <https://pandas.pydata.org/community/coc.html>`_
117-
😉 to get an invite. And please remember that slack is not meant to replace the
118-
mailing list or issue tracker - all important announcements and conversations
119-
should still happen there.
114+
If this sounds like the right place for you, you are welcome to join using
115+
`this link <https://join.slack.com/t/pandas-dev-community/shared_invite/zt-1e2qgy1r6-PLCN8UOLEUAYoLdAsaJilw>`_!
116+
Please remember to follow our `Code of Conduct <https://pandas.pydata.org/community/coc.html>`_,
117+
and be aware that our admins are monitoring for irrelevant messages and will remove folks who use
118+
our
119+
slack for spam, advertisements and messages not related to the pandas contributing community. And
120+
please remember that slack is not meant to replace the mailing list or issue tracker - all important
121+
announcements and conversations should still happen there.

doc/source/development/extending.rst

+46
@@ -488,3 +488,49 @@ registers the default "matplotlib" backend as follows.
488488
489489
More information on how to implement a third-party plotting backend can be found at
490490
https://github.com/pandas-dev/pandas/blob/main/pandas/plotting/__init__.py#L1.
491+
492+
.. _extending.pandas_priority:
493+
494+
Arithmetic with 3rd party types
495+
-------------------------------
496+
497+
In order to control how arithmetic works between a custom type and a pandas type,
498+
implement ``__pandas_priority__``. Similar to numpy's ``__array_priority__``
499+
semantics, arithmetic methods on :class:`DataFrame`, :class:`Series`, and :class:`Index`
500+
objects will delegate to ``other``, if it has an attribute ``__pandas_priority__`` with a higher value.
501+
502+
By default, pandas objects try to operate with other objects, even if they are not types known to pandas:
503+
504+
.. code-block:: python
505+
506+
>>> pd.Series([1, 2]) + [10, 20]
507+
0 11
508+
1 22
509+
dtype: int64
510+
511+
In the example above, if ``[10, 20]`` was a custom type that can be understood as a list, pandas objects will still operate with it in the same way.
512+
513+
In some cases, it is useful to delegate to the other type the operation. For example, consider I implement a
514+
custom list object, and I want the result of adding my custom list with a pandas :class:`Series` to be an instance of my list
515+
and not a :class:`Series` as seen in the previous example. This is now possible by defining the ``__pandas_priority__`` attribute
516+
of my custom list, and setting it to a higher value, than the priority of the pandas objects I want to operate with.
517+
518+
The ``__pandas_priority__`` of :class:`DataFrame`, :class:`Series`, and :class:`Index` are ``4000``, ``3000``, and ``2000`` respectively. The base ``ExtensionArray.__pandas_priority__`` is ``1000``.
519+
520+
.. code-block:: python
521+
522+
class CustomList(list):
523+
__pandas_priority__ = 5000
524+
525+
def __radd__(self, other):
526+
# return `self` and not the addition for simplicity
527+
return self
528+
529+
custom = CustomList()
530+
series = pd.Series([1, 2, 3])
531+
532+
# Series refuses to add custom, since it's an unknown type with higher priority
533+
assert series.__add__(custom) is NotImplemented
534+
535+
# This will cause the custom class `__radd__` being used instead
536+
assert series + custom is custom
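
The delegation rule described in the added `extending.rst` text can be illustrated without pandas. The following is a minimal sketch, not the pandas implementation: `MiniSeries` is a hypothetical stand-in for a pandas object, and its priority of `3000` simply mirrors the `Series` value quoted in the documentation above. It returns `NotImplemented` from `__add__` when the other operand advertises a higher `__pandas_priority__`, so Python falls back to that operand's `__radd__`.

```python
class MiniSeries:
    """Hypothetical stand-in for a pandas object (not pandas code)."""

    __pandas_priority__ = 3000  # same value the docs quote for Series

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Defer when the other operand declares a higher priority.
        other_priority = getattr(other, "__pandas_priority__", None)
        if other_priority is not None and other_priority > self.__pandas_priority__:
            return NotImplemented
        # Otherwise treat `other` as list-like and add element-wise.
        return MiniSeries(a + b for a, b in zip(self.values, other))


class CustomList(list):
    __pandas_priority__ = 5000

    def __radd__(self, other):
        # Return `self` for simplicity, as in the documentation example.
        return self


series = MiniSeries([1, 2])
custom = CustomList([10, 20])

plain_result = series + [10, 20]   # plain list: MiniSeries handles the addition
deferred_result = series + custom  # higher priority: CustomList.__radd__ runs
```

Because a plain `list` carries no `__pandas_priority__`, the first addition stays with `MiniSeries`; only the subclass that opts in takes over the operation.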

doc/source/getting_started/install.rst

-19
@@ -308,25 +308,6 @@ Dependency Minimum Version pip ext
308308
`numba <https://github.com/numba/numba>`__ 0.53.1 performance Alternative execution engine for operations that accept ``engine="numba"`` using a JIT compiler that translates Python functions to optimized machine code using the LLVM compiler.
309309
===================================================== ================== ================== ===================================================================================================================================================================================
310310

311-
Timezones
312-
^^^^^^^^^
313-
314-
Installable with ``pip install "pandas[timezone]"``
315-
316-
========================= ========================= =============== =============================================================
317-
Dependency Minimum Version pip extra Notes
318-
========================= ========================= =============== =============================================================
319-
tzdata 2022.1(pypi)/ timezone Allows the use of ``zoneinfo`` timezones with pandas.
320-
2022a(for system tzdata) **Note**: You only need to install the pypi package if your
321-
system does not already provide the IANA tz database.
322-
However, the minimum tzdata version still applies, even if it
323-
is not enforced through an error.
324-
325-
If you would like to keep your system tzdata version updated,
326-
it is recommended to use the ``tzdata`` package from
327-
conda-forge.
328-
========================= ========================= =============== =============================================================
329-
330311
Visualization
331312
^^^^^^^^^^^^^
332313

doc/source/user_guide/10min.rst

+2-2
@@ -702,11 +702,11 @@ Sorting is per order in the categories, not lexical order:
702702
703703
df.sort_values(by="grade")
704704
705-
Grouping by a categorical column also shows empty categories:
705+
Grouping by a categorical column with ``observed=False`` also shows empty categories:
706706

707707
.. ipython:: python
708708
709-
df.groupby("grade").size()
709+
df.groupby("grade", observed=False).size()
710710
711711
712712
Plotting

doc/source/user_guide/advanced.rst

+2-2
@@ -800,8 +800,8 @@ Groupby operations on the index will preserve the index nature as well.
800800

801801
.. ipython:: python
802802
803-
df2.groupby(level=0).sum()
804-
df2.groupby(level=0).sum().index
803+
df2.groupby(level=0, observed=True).sum()
804+
df2.groupby(level=0, observed=True).sum().index
805805
806806
Reindexing operations will return a resulting index based on the type of the passed
807807
indexer. Passing a list will return a plain-old ``Index``; indexing with

doc/source/user_guide/categorical.rst

+5-5
@@ -607,7 +607,7 @@ even if some categories are not present in the data:
607607
s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))
608608
s.value_counts()
609609
610-
``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories.
610+
``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories when ``observed=False``.
611611

612612
.. ipython:: python
613613
@@ -618,17 +618,17 @@ even if some categories are not present in the data:
618618
data=[[1, 2, 3], [4, 5, 6]],
619619
columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
620620
).T
621-
df.groupby(level=1).sum()
621+
df.groupby(level=1, observed=False).sum()
622622
623-
Groupby will also show "unused" categories:
623+
Groupby will also show "unused" categories when ``observed=False``:
624624

625625
.. ipython:: python
626626
627627
cats = pd.Categorical(
628628
["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
629629
)
630630
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})
631-
df.groupby("cats").mean()
631+
df.groupby("cats", observed=False).mean()
632632
633633
cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])
634634
df2 = pd.DataFrame(
@@ -638,7 +638,7 @@ Groupby will also show "unused" categories:
638638
"values": [1, 2, 3, 4],
639639
}
640640
)
641-
df2.groupby(["cats", "B"]).mean()
641+
df2.groupby(["cats", "B"], observed=False).mean()
642642
643643
644644
Pivot tables:
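
The user-guide hunks above pass `observed` explicitly wherever a categorical grouper is used. As a rough illustration of what the keyword changes, here is a minimal sketch that assumes pandas is installed; the frame and its column names are illustrative only and are not taken from the commit.

```python
import pandas as pd

# "d" is a declared category that never appears in the data.
cats = pd.Categorical(["a", "b", "b", "c"], categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 3, 4]})

# observed=False keeps every declared category, including the unused "d".
all_categories = df.groupby("cats", observed=False).size()

# observed=True keeps only the categories that appear in the data.
seen_only = df.groupby("cats", observed=True).size()
```

Passing the keyword explicitly, as the updated documentation does, keeps the examples independent of whatever default a given pandas version applies.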
