
Commit 21c1730

Merge remote-tracking branch 'upstream/main' into same_val_counts_for_roll_sum_mean

# Conflicts:
#	doc/source/whatsnew/v1.5.0.rst

2 parents: fd68eb4 + b3b5e2a

File tree: 137 files changed (+2494, -1235 lines)


.github/workflows/code-checks.yml (+1, -15)

@@ -140,22 +140,8 @@ jobs:
       - name: Run ASV benchmarks
         run: |
           cd asv_bench
-          asv check -E existing
-          git remote add upstream https://github.com/pandas-dev/pandas.git
-          git fetch upstream
           asv machine --yes
-          asv dev | sed "/failed$/ s/^/##[error]/" | tee benchmarks.log
-          if grep "failed" benchmarks.log > /dev/null ; then
-              exit 1
-          fi
-        if: ${{ steps.build.outcome == 'success' }}
-
-      - name: Publish benchmarks artifact
-        uses: actions/upload-artifact@v3
-        with:
-          name: Benchmarks log
-          path: asv_bench/benchmarks.log
-        if: failure()
+          asv run --quick --dry-run --strict --durations=30 --python=same

   build_docker_dev_environment:
     name: Build Docker Dev Environment

.pre-commit-config.yaml (+4)

@@ -70,6 +70,10 @@ repos:
   - id: rst-inline-touching-normal
     types: [text]  # overwrite types: [rst]
     types_or: [python, rst]
+- repo: https://github.com/sphinx-contrib/sphinx-lint
+  rev: v0.2
+  hooks:
+    - id: sphinx-lint
 - repo: https://github.com/asottile/yesqa
   rev: v1.3.0
   hooks:

LICENSES/OTHER (+1, -6)

@@ -1,8 +1,3 @@
-numpydoc license
-----------------
-
-The numpydoc license is in pandas/doc/sphinxext/LICENSE.txt
-
 Bottleneck license
 ------------------

@@ -77,4 +72,4 @@ DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

ci/deps/actions-38-downstream_compat.yaml (+1, -1)

@@ -60,7 +60,7 @@ dependencies:
   - cftime
   - dask
   - ipython
-  - geopandas
+  - geopandas-base
   - seaborn
   - scikit-learn
   - statsmodels

doc/source/development/contributing_codebase.rst (+1, -1)

@@ -223,7 +223,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you
         ...
     else:  # Reasonably only str objects would reach this but...
         obj = cast(str, obj)  # Mypy complains without this!
-        return obj.upper()
+    return obj.upper()

 The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types, mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_). While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable, a refactor of the code to appease static analysis is preferable.
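The pattern this hunk corrects can be sketched as a small self-contained example (the ``is_number`` helper and function names here are illustrative stand-ins, not the guide's actual code); the point of the fix is that the final ``return`` sits at function level, outside the ``else`` block:

```python
from typing import Union, cast


def is_number(obj: object) -> bool:
    # illustrative stand-in for the guide's ``is_number`` check
    return isinstance(obj, (int, float))


def upper_if_str(obj: Union[int, float, str]) -> Union[int, float, str]:
    if is_number(obj):
        return obj
    else:  # Reasonably only str objects would reach this but...
        obj = cast(str, obj)  # Mypy complains without this! (no runtime effect)
    return obj.upper()  # dedented: only reached on the str branch
```

The ``cast`` silences mypy without changing behavior, which is exactly why the guide discourages it in favor of refactoring.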

doc/source/development/contributing_environment.rst (+4, -4)

@@ -85,10 +85,10 @@ You will need `Build Tools for Visual Studio 2019
 <https://visualstudio.microsoft.com/downloads/>`_.

 .. warning::
-        You DO NOT need to install Visual Studio 2019.
-        You only need "Build Tools for Visual Studio 2019" found by
-        scrolling down to "All downloads" -> "Tools for Visual Studio 2019".
-        In the installer, select the "C++ build tools" workload.
+    You DO NOT need to install Visual Studio 2019.
+    You only need "Build Tools for Visual Studio 2019" found by
+    scrolling down to "All downloads" -> "Tools for Visual Studio 2019".
+    In the installer, select the "C++ build tools" workload.

 You can install the necessary components on the commandline using
 `vs_buildtools.exe <https://download.visualstudio.microsoft.com/download/pr/9a26f37e-6001-429b-a5db-c5455b93953c/460d80ab276046de2455a4115cc4e2f1e6529c9e6cb99501844ecafd16c619c4/vs_BuildTools.exe>`_:

doc/source/ecosystem.rst (+2, -2)

@@ -540,15 +540,15 @@ Pandas-Genomics provides extension types, extension arrays, and extension access
 `Pint-Pandas`_
 ~~~~~~~~~~~~~~

-``Pint-Pandas <https://github.com/hgrecco/pint-pandas>`` provides an extension type for
+`Pint-Pandas <https://github.com/hgrecco/pint-pandas>`_ provides an extension type for
 storing numeric arrays with units. These arrays can be stored inside pandas'
 Series and DataFrame. Operations between Series and DataFrame columns which
 use pint's extension array are then units aware.

 `Text Extensions for Pandas`_
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-``Text Extensions for Pandas <https://ibm.biz/text-extensions-for-pandas>``
+`Text Extensions for Pandas <https://ibm.biz/text-extensions-for-pandas>`_
 provides extension types to cover common data structures for representing natural language
 data, plus library integrations that convert the outputs of popular natural language
 processing libraries into Pandas DataFrames.

doc/source/reference/groupby.rst (-2)

@@ -132,9 +132,7 @@ The following methods are available only for ``SeriesGroupBy`` objects.
    SeriesGroupBy.hist
    SeriesGroupBy.nlargest
    SeriesGroupBy.nsmallest
-   SeriesGroupBy.nunique
    SeriesGroupBy.unique
-   SeriesGroupBy.value_counts
    SeriesGroupBy.is_monotonic_increasing
    SeriesGroupBy.is_monotonic_decreasing

doc/source/reference/series.rst (+1)

@@ -342,6 +342,7 @@ Datetime methods
    :toctree: api/
    :template: autosummary/accessor_method.rst

+   Series.dt.isocalendar
    Series.dt.to_period
    Series.dt.to_pydatetime
    Series.dt.tz_localize
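``Series.dt.isocalendar``, the entry added to the accessor list above, returns the ISO 8601 year, week, and weekday as a ``DataFrame``; a quick sketch with an arbitrary date:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2022-01-01"]))
iso = s.dt.isocalendar()  # DataFrame with columns: year, week, day
# 2022-01-01 is a Saturday, so in the ISO calendar it still belongs to
# week 52 of ISO year 2021, with weekday 6 (Monday = 1)
```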

doc/source/user_guide/advanced.rst (+4, -4)

@@ -1082,14 +1082,14 @@ of :ref:`frequency aliases <timeseries.offset_aliases>` with datetime-like inter

    pd.interval_range(start=pd.Timedelta("0 days"), periods=3, freq="9H")

-Additionally, the ``closed`` parameter can be used to specify which side(s) the intervals
-are closed on. Intervals are closed on the right side by default.
+Additionally, the ``inclusive`` parameter can be used to specify which side(s) the intervals
+are closed on. Intervals are closed on the both side by default.

 .. ipython:: python

-   pd.interval_range(start=0, end=4, closed="both")
+   pd.interval_range(start=0, end=4, inclusive="both")

-   pd.interval_range(start=0, end=4, closed="neither")
+   pd.interval_range(start=0, end=4, inclusive="neither")

 Specifying ``start``, ``end``, and ``periods`` will generate a range of evenly spaced
 intervals from ``start`` to ``end`` inclusively, with ``periods`` number of elements

doc/source/user_guide/cookbook.rst (+1, -1)

@@ -423,7 +423,7 @@ Fill forward a reversed timeseries
    )
    df.loc[df.index[3], "A"] = np.nan
    df
-   df.reindex(df.index[::-1]).ffill()
+   df.bfill()

 `cumsum reset at NaN values
 <https://stackoverflow.com/questions/18196811/cumsum-reset-at-nan>`__
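The simplified recipe works because ``bfill`` propagates later values backwards, which for this purpose matches the old reverse-then-ffill dance; a minimal sketch with toy data (this ``df`` is made up here, not the cookbook's):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, np.nan, 5.0]})

# new recipe: backward fill, so the NaN takes the next valid value (5.0)
filled = df.bfill()

# old recipe: reverse the index, forward fill, then restore the order
old_way = df.reindex(df.index[::-1]).ffill().reindex(df.index)
```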

doc/source/user_guide/dsintro.rst (+1, -1)

@@ -678,7 +678,7 @@ Boolean operators operate element-wise as well:
 Transposing
 ~~~~~~~~~~~

-To transpose, access the ``T`` attribute or :meth:`DataFrame.transpose``,
+To transpose, access the ``T`` attribute or :meth:`DataFrame.transpose`,
 similar to an ndarray:

 .. ipython:: python
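Both spellings in the fixed sentence refer to the same operation; a tiny sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# the ``T`` attribute and ``transpose()`` produce identical results
transposed = df.T
assert transposed.equals(df.transpose())
```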

doc/source/user_guide/groupby.rst (+52, -25)

@@ -539,19 +539,19 @@ Some common aggregating functions are tabulated below:
    :widths: 20, 80
    :delim: ;

-     :meth:`~pd.core.groupby.DataFrameGroupBy.mean`;Compute mean of groups
-     :meth:`~pd.core.groupby.DataFrameGroupBy.sum`;Compute sum of group values
-     :meth:`~pd.core.groupby.DataFrameGroupBy.size`;Compute group sizes
-     :meth:`~pd.core.groupby.DataFrameGroupBy.count`;Compute count of group
-     :meth:`~pd.core.groupby.DataFrameGroupBy.std`;Standard deviation of groups
-     :meth:`~pd.core.groupby.DataFrameGroupBy.var`;Compute variance of groups
-     :meth:`~pd.core.groupby.DataFrameGroupBy.sem`;Standard error of the mean of groups
-     :meth:`~pd.core.groupby.DataFrameGroupBy.describe`;Generates descriptive statistics
-     :meth:`~pd.core.groupby.DataFrameGroupBy.first`;Compute first of group values
-     :meth:`~pd.core.groupby.DataFrameGroupBy.last`;Compute last of group values
-     :meth:`~pd.core.groupby.DataFrameGroupBy.nth`;Take nth value, or a subset if n is a list
-     :meth:`~pd.core.groupby.DataFrameGroupBy.min`;Compute min of group values
-     :meth:`~pd.core.groupby.DataFrameGroupBy.max`;Compute max of group values
+    :meth:`~pd.core.groupby.DataFrameGroupBy.mean`;Compute mean of groups
+    :meth:`~pd.core.groupby.DataFrameGroupBy.sum`;Compute sum of group values
+    :meth:`~pd.core.groupby.DataFrameGroupBy.size`;Compute group sizes
+    :meth:`~pd.core.groupby.DataFrameGroupBy.count`;Compute count of group
+    :meth:`~pd.core.groupby.DataFrameGroupBy.std`;Standard deviation of groups
+    :meth:`~pd.core.groupby.DataFrameGroupBy.var`;Compute variance of groups
+    :meth:`~pd.core.groupby.DataFrameGroupBy.sem`;Standard error of the mean of groups
+    :meth:`~pd.core.groupby.DataFrameGroupBy.describe`;Generates descriptive statistics
+    :meth:`~pd.core.groupby.DataFrameGroupBy.first`;Compute first of group values
+    :meth:`~pd.core.groupby.DataFrameGroupBy.last`;Compute last of group values
+    :meth:`~pd.core.groupby.DataFrameGroupBy.nth`;Take nth value, or a subset if n is a list
+    :meth:`~pd.core.groupby.DataFrameGroupBy.min`;Compute min of group values
+    :meth:`~pd.core.groupby.DataFrameGroupBy.max`;Compute max of group values


 The aggregating functions above will exclude NA values. Any function which

@@ -1052,7 +1052,14 @@ Some operations on the grouped data might not fit into either the aggregate or
 transform categories. Or, you may simply want GroupBy to infer how to combine
 the results. For these, use the ``apply`` function, which can be substituted
 for both ``aggregate`` and ``transform`` in many standard use cases. However,
-``apply`` can handle some exceptional use cases, for example:
+``apply`` can handle some exceptional use cases.
+
+.. note::
+
+   ``apply`` can act as a reducer, transformer, *or* filter function, depending
+   on exactly what is passed to it. It can depend on the passed function and
+   exactly what you are grouping. Thus the grouped column(s) may be included in
+   the output as well as set the indices.

 .. ipython:: python

@@ -1064,16 +1071,14 @@ for both ``aggregate`` and ``transform`` in many standard use cases. However,

 The dimension of the returned result can also change:

-.. ipython::
-
-   In [8]: grouped = df.groupby('A')['C']
+.. ipython:: python

-   In [10]: def f(group):
-      ....:     return pd.DataFrame({'original': group,
-      ....:                          'demeaned': group - group.mean()})
-      ....:
+   grouped = df.groupby('A')['C']

-   In [11]: grouped.apply(f)
+   def f(group):
+       return pd.DataFrame({'original': group,
+                            'demeaned': group - group.mean()})
+   grouped.apply(f)

 ``apply`` on a Series can operate on a returned value from the applied function,
 that is itself a series, and possibly upcast the result to a DataFrame:

@@ -1088,11 +1093,33 @@ that is itself a series, and possibly upcast the result to a DataFrame:
    s
    s.apply(f)

+Control grouped column(s) placement with ``group_keys``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. note::

-   ``apply`` can act as a reducer, transformer, *or* filter function, depending on exactly what is passed to it.
-   So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in
-   the output as well as set the indices.
+   If ``group_keys=True`` is specified when calling :meth:`~DataFrame.groupby`,
+   functions passed to ``apply`` that return like-indexed outputs will have the
+   group keys added to the result index. Previous versions of pandas would add
+   the group keys only when the result from the applied function had a different
+   index than the input. If ``group_keys`` is not specified, the group keys will
+   not be added for like-indexed outputs. In the future this behavior
+   will change to always respect ``group_keys``, which defaults to ``True``.
+
+   .. versionchanged:: 1.5.0
+
+To control whether the grouped column(s) are included in the indices, you can use
+the argument ``group_keys``. Compare

+.. ipython:: python
+
+   df.groupby("A", group_keys=True).apply(lambda x: x)
+
+with
+
+.. ipython:: python
+
+   df.groupby("A", group_keys=False).apply(lambda x: x)

 Similar to :ref:`groupby.aggregate.udfs`, the resulting dtype will reflect that of the
 apply function. If the results from different groups have different dtypes, then
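The dimension-changing ``apply`` pattern from the hunks above can be exercised as plain Python; a self-contained sketch with toy data (this ``df`` is made up here, not the guide's example frame):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b", "b"], "C": [1.0, 3.0, 2.0, 4.0]})
grouped = df.groupby("A")["C"]


def f(group):
    # returning a DataFrame expands the result from a Series to a DataFrame
    return pd.DataFrame({"original": group,
                         "demeaned": group - group.mean()})


result = grouped.apply(f)
# group "a" has mean 2.0, group "b" has mean 3.0, so each group's
# demeaned column is [-1.0, 1.0]
```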

doc/source/user_guide/io.rst (+3, -3)

@@ -5695,9 +5695,9 @@ for an explanation of how the database connection is handled.

 .. warning::

-  When you open a connection to a database you are also responsible for closing it.
-  Side effects of leaving a connection open may include locking the database or
-  other breaking behaviour.
+    When you open a connection to a database you are also responsible for closing it.
+    Side effects of leaving a connection open may include locking the database or
+    other breaking behaviour.

 Writing DataFrames
 ''''''''''''''''''

doc/source/user_guide/timeseries.rst (+3, -3)

@@ -2405,9 +2405,9 @@ you can use the ``tz_convert`` method.

 .. warning::

-  Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different
-  definitions of the zone. This is more of a problem for unusual time zones than for
-  'standard' zones like ``US/Eastern``.
+    Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different
+    definitions of the zone. This is more of a problem for unusual time zones than for
+    'standard' zones like ``US/Eastern``.

 .. warning::

doc/source/user_guide/window.rst (+2, -2)

@@ -624,13 +624,13 @@ average of ``3, NaN, 5`` would be calculated as

 .. math::

-	\frac{(1-\alpha)^2 \cdot 3 + 1 \cdot 5}{(1-\alpha)^2 + 1}.
+    \frac{(1-\alpha)^2 \cdot 3 + 1 \cdot 5}{(1-\alpha)^2 + 1}.

 Whereas if ``ignore_na=True``, the weighted average would be calculated as

 .. math::

-	\frac{(1-\alpha) \cdot 3 + 1 \cdot 5}{(1-\alpha) + 1}.
+    \frac{(1-\alpha) \cdot 3 + 1 \cdot 5}{(1-\alpha) + 1}.

 The :meth:`~Ewm.var`, :meth:`~Ewm.std`, and :meth:`~Ewm.cov` functions have a ``bias`` argument,
 specifying whether the result should contain biased or unbiased statistics.
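The two weighted-average formulas in the hunk above can be checked directly against ``Series.ewm`` with the default ``adjust=True``; a minimal sketch (``alpha`` chosen arbitrarily):

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 5.0])
alpha = 0.5

# ignore_na=False (the default): the NaN still advances the weighting clock,
# so the weights on 3 and 5 are (1-alpha)**2 and 1
expected_default = ((1 - alpha) ** 2 * 3 + 1 * 5) / ((1 - alpha) ** 2 + 1)

# ignore_na=True: the NaN is skipped entirely, weights are (1-alpha) and 1
expected_ignore = ((1 - alpha) * 3 + 1 * 5) / ((1 - alpha) + 1)

got_default = s.ewm(alpha=alpha, ignore_na=False).mean().iloc[-1]
got_ignore = s.ewm(alpha=alpha, ignore_na=True).mean().iloc[-1]
```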

doc/source/whatsnew/index.rst (+1)

@@ -24,6 +24,7 @@ Version 1.4
 .. toctree::
    :maxdepth: 2

+   v1.4.3
    v1.4.2
    v1.4.1
    v1.4.0

doc/source/whatsnew/v0.15.0.rst (+24, -24)

@@ -462,15 +462,15 @@ Rolling/expanding moments improvements

 .. code-block:: ipython

-   In [51]: ewma(s, com=3., min_periods=2)
-   Out[51]:
-   0         NaN
-   1         NaN
-   2    1.000000
-   3    1.000000
-   4    1.571429
-   5    2.189189
-   dtype: float64
+   In [51]: pd.ewma(s, com=3., min_periods=2)
+   Out[51]:
+   0         NaN
+   1         NaN
+   2    1.000000
+   3    1.000000
+   4    1.571429
+   5    2.189189
+   dtype: float64

 New behavior (note values start at index ``4``, the location of the 2nd (since ``min_periods=2``) non-empty value):

@@ -557,21 +557,21 @@ Rolling/expanding moments improvements

 .. code-block:: ipython

-   In [89]: ewmvar(s, com=2., bias=False)
-   Out[89]:
-   0   -2.775558e-16
-   1    3.000000e-01
-   2    9.556787e-01
-   3    3.585799e+00
-   dtype: float64
-
-   In [90]: ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)
-   Out[90]:
-   0    1.25
-   1    1.25
-   2    1.25
-   3    1.25
-   dtype: float64
+   In [89]: pd.ewmvar(s, com=2., bias=False)
+   Out[89]:
+   0   -2.775558e-16
+   1    3.000000e-01
+   2    9.556787e-01
+   3    3.585799e+00
+   dtype: float64
+
+   In [90]: pd.ewmvar(s, com=2., bias=False) / pd.ewmvar(s, com=2., bias=True)
+   Out[90]:
+   0    1.25
+   1    1.25
+   2    1.25
+   3    1.25
+   dtype: float64

 Note that entry ``0`` is approximately 0, and the debiasing factors are a constant 1.25.
 By comparison, the following 0.15.0 results have a ``NaN`` for entry ``0``,

doc/source/whatsnew/v0.18.1.rst (+2, -2)

@@ -149,8 +149,8 @@ can return a valid boolean indexer or anything which is valid for these indexer'
    # callable returns list of labels
    df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]

-Indexing with``[]``
-"""""""""""""""""""
+Indexing with ``[]``
+""""""""""""""""""""

 Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
 The callable must return a valid input for ``[]`` indexing depending on its

doc/source/whatsnew/v0.19.0.rst (+1, -1)

@@ -1553,7 +1553,7 @@ Bug fixes
 - Bug in invalid datetime parsing in ``to_datetime`` and ``DatetimeIndex`` may raise ``TypeError`` rather than ``ValueError`` (:issue:`11169`, :issue:`11287`)
 - Bug in ``Index`` created with tz-aware ``Timestamp`` and mismatched ``tz`` option incorrectly coerces timezone (:issue:`13692`)
 - Bug in ``DatetimeIndex`` with nanosecond frequency does not include timestamp specified with ``end`` (:issue:`13672`)
-- Bug in ```Series`` when setting a slice with a ``np.timedelta64`` (:issue:`14155`)
+- Bug in ``Series`` when setting a slice with a ``np.timedelta64`` (:issue:`14155`)
 - Bug in ``Index`` raises ``OutOfBoundsDatetime`` if ``datetime`` exceeds ``datetime64[ns]`` bounds, rather than coercing to ``object`` dtype (:issue:`13663`)
 - Bug in ``Index`` may ignore specified ``datetime64`` or ``timedelta64`` passed as ``dtype`` (:issue:`13981`)
 - Bug in ``RangeIndex`` can be created without no arguments rather than raises ``TypeError`` (:issue:`13793`)
