
Commit be2044c

Merge branch 'master' into categorical_map_na_Action

2 parents: 64b7297 + 1f7a7f2

189 files changed: +2656 -2420 lines


.circleci/setup_env.sh (+2 -1)

@@ -55,7 +55,8 @@ if pip list | grep -q ^pandas; then
 fi

 echo "Build extensions"
-python setup.py build_ext -q -j4
+# GH 47305: Parallel build can causes flaky ImportError from pandas/_libs/tslibs
+python setup.py build_ext -q -j1

 echo "Install pandas"
 python -m pip install --no-build-isolation --no-use-pep517 -e .

.github/actions/build_pandas/action.yml (+4 -2)

@@ -16,5 +16,7 @@ runs:
     python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index
   shell: bash -el {0}
   env:
-    # https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
-    N_JOBS: ${{ runner.os == 'macOS' && 3 || 2 }}
+    # Cannot use parallel compilation on Windows, see https://github.com/pandas-dev/pandas/issues/30873
+    # GH 47305: Parallel build causes flaky ImportError: /home/runner/work/pandas/pandas/pandas/_libs/tslibs/timestamps.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct
+    N_JOBS: 1
+    #N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}

.github/actions/setup-conda/action.yml (+1 -1)

@@ -30,7 +30,7 @@ runs:
 environment-name: ${{ inputs.environment-name }}
 extra-specs: ${{ inputs.extra-specs }}
 channels: conda-forge
-channel-priority: 'strict'
+channel-priority: ${{ runner.os == 'macOS' && 'flexible' || 'strict' }}
 condarc-file: ci/condarc.yml
 cache-env: true
 cache-downloads: true

.github/workflows/32-bit-linux.yml (+1 -1)

@@ -40,7 +40,7 @@ jobs:
 python -m pip install --no-deps -U pip wheel 'setuptools<60.0.0' && \
 python -m pip install versioneer[toml] && \
 python -m pip install cython numpy python-dateutil pytz pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17 hypothesis>=6.34.2 && \
-python setup.py build_ext -q -j$(nproc) && \
+python setup.py build_ext -q -j1 && \
 python -m pip install --no-build-isolation --no-use-pep517 -e . && \
 python -m pip list && \
 export PANDAS_CI=1 && \

.github/workflows/package-checks.yml (+1 -1)

@@ -20,7 +20,7 @@ jobs:
 runs-on: ubuntu-22.04
 strategy:
   matrix:
-    extra: ["test", "performance", "timezone", "computation", "fss", "aws", "gcp", "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql", "sql-other", "html", "xml", "plot", "output_formatting", "clipboard", "compression", "all"]
+    extra: ["test", "performance", "computation", "fss", "aws", "gcp", "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql", "sql-other", "html", "xml", "plot", "output_formatting", "clipboard", "compression", "all"]
   fail-fast: false
 name: Install Extras - ${{ matrix.extra }}
 concurrency:

.github/workflows/python-dev.yml (+2 -1)

@@ -82,9 +82,10 @@ jobs:
     python -m pip install python-dateutil pytz cython hypothesis>=6.34.2 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-cov pytest-asyncio>=0.17
     python -m pip list

+# GH 47305: Parallel build can cause flaky ImportError from pandas/_libs/tslibs
 - name: Build Pandas
   run: |
-    python setup.py build_ext -q -j4
+    python setup.py build_ext -q -j1
     python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index

 - name: Build Version

.pre-commit-config.yaml (+1 -1)

@@ -28,7 +28,7 @@ repos:
     types_or: [python, pyi]
     additional_dependencies: [black==23.1.0]
 - repo: https://github.com/charliermarsh/ruff-pre-commit
-  rev: v0.0.253
+  rev: v0.0.255
   hooks:
     - id: ruff
       args: [--exit-non-zero-on-fix]

MANIFEST.in (-2)

@@ -58,5 +58,3 @@ prune pandas/tests/io/parser/data
 # Selectively re-add *.cxx files that were excluded above
 graft pandas/_libs/src
 graft pandas/_libs/tslibs/src
-include pandas/_libs/pd_parser.h
-include pandas/_libs/pd_parser.c

ci/code_checks.sh (+4 -2)

@@ -97,6 +97,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
     pandas.Series.is_monotonic_increasing \
     pandas.Series.is_monotonic_decreasing \
     pandas.Series.backfill \
+    pandas.Series.bfill \
+    pandas.Series.ffill \
     pandas.Series.pad \
     pandas.Series.argsort \
     pandas.Series.reorder_levels \
@@ -541,14 +543,14 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
     pandas.DataFrame.iterrows \
     pandas.DataFrame.pipe \
     pandas.DataFrame.backfill \
+    pandas.DataFrame.bfill \
+    pandas.DataFrame.ffill \
     pandas.DataFrame.pad \
     pandas.DataFrame.swapaxes \
     pandas.DataFrame.first_valid_index \
     pandas.DataFrame.last_valid_index \
     pandas.DataFrame.attrs \
     pandas.DataFrame.plot \
-    pandas.DataFrame.sparse.density \
-    pandas.DataFrame.sparse.to_coo \
     pandas.DataFrame.to_gbq \
     pandas.DataFrame.style \
     pandas.DataFrame.__dataframe__

ci/deps/actions-310-numpydev.yaml (+2)

@@ -18,9 +18,11 @@ dependencies:
 - python-dateutil
 - pytz
 - pip
+
 - pip:
   - "cython"
   - "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
   - "--pre"
   - "numpy"
   - "scipy"
+  - "tzdata>=2022.1"

ci/deps/actions-310.yaml (+3 -1)

@@ -49,8 +49,10 @@ dependencies:
 - scipy>=1.7.1
 - sqlalchemy>=1.4.16
 - tabulate>=0.8.9
-- tzdata>=2022a
 - xarray>=0.21.0
 - xlrd>=2.0.1
 - xlsxwriter>=1.4.3
 - zstandard>=0.15.2
+
+- pip:
+  - tzdata>=2022.1

ci/deps/actions-311.yaml (+3 -1)

@@ -49,8 +49,10 @@ dependencies:
 - scipy>=1.7.1
 - sqlalchemy>=1.4.16
 - tabulate>=0.8.9
-- tzdata>=2022a
 - xarray>=0.21.0
 - xlrd>=2.0.1
 - xlsxwriter>=1.4.3
 - zstandard>=0.15.2
+
+- pip:
+  - tzdata>=2022.1

ci/deps/actions-38-downstream_compat.yaml (+3)

@@ -68,3 +68,6 @@ dependencies:
 - pandas-gbq>=0.15.0
 - pyyaml
 - py
+
+- pip:
+  - tzdata>=2022.1

ci/deps/actions-38-minimum_versions.yaml (+1 -1)

@@ -52,11 +52,11 @@ dependencies:
 - scipy=1.7.1
 - sqlalchemy=1.4.16
 - tabulate=0.8.9
-- tzdata=2022a
 - xarray=0.21.0
 - xlrd=2.0.1
 - xlsxwriter=1.4.3
 - zstandard=0.15.2

 - pip:
   - pyqt5==5.15.1
+  - tzdata==2022.1

ci/deps/actions-38.yaml (+3)

@@ -53,3 +53,6 @@ dependencies:
 - xlrd>=2.0.1
 - xlsxwriter>=1.4.3
 - zstandard>=0.15.2
+
+- pip:
+  - tzdata>=2022.1

ci/deps/actions-39.yaml (+3 -1)

@@ -49,8 +49,10 @@ dependencies:
 - scipy>=1.7.1
 - sqlalchemy>=1.4.16
 - tabulate>=0.8.9
-- tzdata>=2022a
 - xarray>=0.21.0
 - xlrd>=2.0.1
 - xlsxwriter>=1.4.3
 - zstandard>=0.15.2
+
+- pip:
+  - tzdata>=2022.1

ci/deps/actions-pypy-38.yaml (+3)

@@ -22,3 +22,6 @@ dependencies:
 - numpy
 - python-dateutil
 - pytz
+
+- pip:
+  - tzdata>=2022.1

ci/test_wheels_windows.bat (+1 -1)

@@ -3,7 +3,7 @@ pd.test(extra_args=['-m not clipboard and not single_cpu and not slow and not ne
 pd.test(extra_args=['-m not clipboard and single_cpu and not slow and not network and not db'])

 python --version
-pip install pytz six numpy python-dateutil
+pip install pytz six numpy python-dateutil tzdata>=2022.1
 pip install hypothesis>=6.34.2 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17
 pip install --find-links=pandas/dist --no-index pandas
 python -c "%test_command%"

doc/source/conf.py (+6 -6)

@@ -101,20 +101,20 @@
     reldir = os.path.relpath(dirname, source_path)
     for fname in fnames:
         if os.path.splitext(fname)[-1] in (".rst", ".ipynb"):
-            fname = os.path.relpath(os.path.join(dirname, fname), source_path)
+            rel_fname = os.path.relpath(os.path.join(dirname, fname), source_path)

-            if fname == "index.rst" and os.path.abspath(dirname) == source_path:
+            if rel_fname == "index.rst" and os.path.abspath(dirname) == source_path:
                 continue
             if pattern == "-api" and reldir.startswith("reference"):
-                exclude_patterns.append(fname)
+                exclude_patterns.append(rel_fname)
             elif (
                 pattern == "whatsnew"
                 and not reldir.startswith("reference")
                 and reldir != "whatsnew"
             ):
-                exclude_patterns.append(fname)
-            elif single_doc and fname != pattern:
-                exclude_patterns.append(fname)
+                exclude_patterns.append(rel_fname)
+            elif single_doc and rel_fname != pattern:
+                exclude_patterns.append(rel_fname)

     with open(os.path.join(source_path, "index.rst.template")) as f:
         t = jinja2.Template(f.read())
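
The fix above stops rebinding the loop variable ``fname`` and introduces ``rel_fname``, so the bare file name and the source-relative path stay distinct for the rest of the loop body. A minimal sketch of the corrected pattern, using made-up values for ``source_path`` and ``single_doc`` rather than the real pandas doc build settings:

    import os

    source_path = "doc/source"           # assumed root, for illustration only
    single_doc = "user_guide/10min.rst"  # hypothetical single-document target
    exclude_patterns = []

    for dirname, _, fnames in os.walk(source_path):
        for fname in fnames:
            if os.path.splitext(fname)[-1] in (".rst", ".ipynb"):
                # Binding the result to rel_fname (instead of reassigning fname)
                # keeps the original name available for later comparisons.
                rel_fname = os.path.relpath(os.path.join(dirname, fname), source_path)
                if single_doc and rel_fname != single_doc:
                    exclude_patterns.append(rel_fname)

    print(exclude_patterns)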

doc/source/development/community.rst (+8 -6)

@@ -111,9 +111,11 @@ contributing to pandas. The slack is a private space, specifically meant for
 people who are hesitant to bring up their questions or ideas on a large public
 mailing list or GitHub.

-If this sounds like the right place for you, you are welcome to join! Email us
-at `[email protected] <mailto://[email protected]>`_ and let us
-know that you read and agree to our `Code of Conduct <https://pandas.pydata.org/community/coc.html>`_
-😉 to get an invite. And please remember that slack is not meant to replace the
-mailing list or issue tracker - all important announcements and conversations
-should still happen there.
+If this sounds like the right place for you, you are welcome to join using
+`this link <https://join.slack.com/t/pandas-dev-community/shared_invite/zt-1e2qgy1r6-PLCN8UOLEUAYoLdAsaJilw>`_!
+Please remember to follow our `Code of Conduct <https://pandas.pydata.org/community/coc.html>`_,
+and be aware that our admins are monitoring for irrelevant messages and will remove folks who use
+our
+slack for spam, advertisements and messages not related to the pandas contributing community. And
+please remember that slack is not meant to replace the mailing list or issue tracker - all important
+announcements and conversations should still happen there.

doc/source/getting_started/install.rst (-19)

@@ -308,25 +308,6 @@ Dependency Minimum Version pip ext
 `numba <https://github.com/numba/numba>`__ 0.53.1 performance Alternative execution engine for operations that accept ``engine="numba"`` using a JIT compiler that translates Python functions to optimized machine code using the LLVM compiler.
 ===================================================== ================== ================== ===================================================================================================================================================================================

-Timezones
-^^^^^^^^^
-
-Installable with ``pip install "pandas[timezone]"``
-
-========================= ========================= =============== =============================================================
-Dependency                Minimum Version           pip extra       Notes
-========================= ========================= =============== =============================================================
-tzdata                    2022.1(pypi)/             timezone        Allows the use of ``zoneinfo`` timezones with pandas.
-                          2022a(for system tzdata)                  **Note**: You only need to install the pypi package if your
-                                                                    system does not already provide the IANA tz database.
-                                                                    However, the minimum tzdata version still applies, even if it
-                                                                    is not enforced through an error.
-
-                                                                    If you would like to keep your system tzdata version updated,
-                                                                    it is recommended to use the ``tzdata`` package from
-                                                                    conda-forge.
-========================= ========================= =============== =============================================================
-
 Visualization
 ^^^^^^^^^^^^^
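
The removed Timezones table and the ``pip`` entries added to the ci/deps files above concern the same dependency: ``tzdata`` ships the IANA database that the standard-library ``zoneinfo`` module falls back to when the system does not provide one. A small sketch of the usage that dependency supports (ordinary pandas/zoneinfo calls, not code from this commit):

    from zoneinfo import ZoneInfo  # backed by the tzdata package when no system database exists

    import pandas as pd

    # Localize a timestamp with a zoneinfo timezone instead of a pytz one
    ts = pd.Timestamp("2023-03-15 12:00").tz_localize(ZoneInfo("Europe/Brussels"))
    print(ts)         # expected: 2023-03-15 12:00:00+01:00
    print(ts.tzinfo)  # expected: zoneinfo.ZoneInfo(key='Europe/Brussels')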

doc/source/user_guide/10min.rst (+2 -2)

@@ -702,11 +702,11 @@ Sorting is per order in the categories, not lexical order:
    df.sort_values(by="grade")

-Grouping by a categorical column also shows empty categories:
+Grouping by a categorical column with ``observed=False`` also shows empty categories:

 .. ipython:: python

-   df.groupby("grade").size()
+   df.groupby("grade", observed=False).size()


 Plotting

doc/source/user_guide/advanced.rst (+2 -2)

@@ -800,8 +800,8 @@ Groupby operations on the index will preserve the index nature as well.

 .. ipython:: python

-   df2.groupby(level=0).sum()
-   df2.groupby(level=0).sum().index
+   df2.groupby(level=0, observed=True).sum()
+   df2.groupby(level=0, observed=True).sum().index

 Reindexing operations will return a resulting index based on the type of the passed
 indexer. Passing a list will return a plain-old ``Index``; indexing with

doc/source/user_guide/categorical.rst (+5 -5)

@@ -607,7 +607,7 @@ even if some categories are not present in the data:
    s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))
    s.value_counts()

-``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories.
+``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories when ``observed=False``.

 .. ipython:: python

@@ -618,17 +618,17 @@ even if some categories are not present in the data:
        data=[[1, 2, 3], [4, 5, 6]],
        columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
    ).T
-   df.groupby(level=1).sum()
+   df.groupby(level=1, observed=False).sum()

-Groupby will also show "unused" categories:
+Groupby will also show "unused" categories when ``observed=False``:

 .. ipython:: python

    cats = pd.Categorical(
        ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
    )
    df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})
-   df.groupby("cats").mean()
+   df.groupby("cats", observed=False).mean()

    cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])
    df2 = pd.DataFrame(
@@ -638,7 +638,7 @@ Groupby will also show "unused" categories:
            "values": [1, 2, 3, 4],
        }
    )
-   df2.groupby(["cats", "B"]).mean()
+   df2.groupby(["cats", "B"], observed=False).mean()


 Pivot tables:
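
The documentation edits above spell out ``observed=False`` because grouping by a categorical key behaves differently depending on that flag: ``observed=False`` keeps every declared category in the result, even empty ones, while ``observed=True`` drops them. A short sketch with made-up data:

    import pandas as pd

    grades = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])
    df = pd.DataFrame({"grade": grades, "score": [1, 2, 3]})

    # The unused category "c" is kept (sum 0) with observed=False ...
    print(df.groupby("grade", observed=False)["score"].sum())
    # ... and dropped with observed=True
    print(df.groupby("grade", observed=True)["score"].sum())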

doc/source/user_guide/copy_on_write.rst (+33 -7)

@@ -6,11 +6,6 @@
 Copy-on-Write (CoW)
 *******************

-.. ipython:: python
-   :suppress:
-
-   pd.options.mode.copy_on_write = True
-
 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
 optimizations that become possible through CoW are implemented and supported. A complete list
 can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
@@ -21,6 +16,36 @@ CoW will lead to more predictable behavior since it is not possible to update mo
 one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
 delaying copies as long as possible, the average performance and memory usage will improve.

+Previous behavior
+-----------------
+
+pandas indexing behavior is tricky to understand. Some operations return views while
+other return copies. Depending on the result of the operation, mutation one object
+might accidentally mutate another:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    subset = df["foo"]
+    subset.iloc[0] = 100
+    df
+
+Mutating ``subset``, e.g. updating its values, also updates ``df``. The exact behavior is
+hard to predict. Copy-on-Write solves accidentally modifying more than one object,
+it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
+
+.. ipython:: python
+
+    pd.options.mode.copy_on_write = True
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    subset = df["foo"]
+    subset.iloc[0] = 100
+    df
+
+The following sections will explain what this means and how it impacts existing
+applications.
+
 Description
 -----------

@@ -114,10 +139,11 @@ two subsequent indexing operations, e.g.
 The column ``foo`` is updated where the column ``bar`` is greater than 5.
 This violates the CoW principles though, because it would have to modify the
 view ``df["foo"]`` and ``df`` in one step. Hence, chained assignment will
-consistently never work and raise a ``ChainedAssignmentError`` with CoW enabled:
+consistently never work and raise a ``ChainedAssignmentError`` warning
+with CoW enabled:

 .. ipython:: python
-   :okexcept:
+   :okwarning:

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    df["foo"][df["bar"] > 5] = 100
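
The new "Previous behavior" section added above contrasts mutation through a view with Copy-on-Write. A self-contained sketch of the same contrast (pandas 2.0 or later; the commented output is the expected result, not captured from this build):

    import pandas as pd

    # Default behavior: writing through the selected column may also write into df
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    subset = df["foo"]
    subset.iloc[0] = 100
    print(df["foo"].tolist())  # typically [100, 2, 3] without Copy-on-Write

    # With Copy-on-Write enabled, df is left untouched
    pd.options.mode.copy_on_write = True
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    subset = df["foo"]
    subset.iloc[0] = 100
    print(df["foo"].tolist())  # [1, 2, 3]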

doc/source/user_guide/groupby.rst (+1 -1)

@@ -1401,7 +1401,7 @@ can be used as group keys. If so, the order of the levels will be preserved:

    factor = pd.qcut(data, [0, 0.25, 0.5, 0.75, 1.0])

-   data.groupby(factor).mean()
+   data.groupby(factor, observed=False).mean()

 .. _groupby.specify:
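
``pd.qcut`` returns a ``Categorical``, so the grouping in the hunk above is a categorical grouping as well and now passes ``observed`` explicitly. A small sketch with synthetic data, using fixed ``pd.cut`` bins so that one bin can end up empty:

    import numpy as np
    import pandas as pd

    data = pd.Series(np.random.default_rng(0).normal(size=100))

    # Fixed bins: the outermost bin is unlikely to catch any standard-normal draws
    factor = pd.cut(data, [-5, -4, -1, 0, 1, 5])

    # observed=False keeps the empty (-5, -4] bin in the result (mean NaN);
    # observed=True would drop it
    print(data.groupby(factor, observed=False).mean())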
