Commit b1521ea

Merge remote-tracking branch 'upstream/main'
2 parents e6fc01a + 8980af7 commit b1521ea

98 files changed: +1583 -1276 lines changed

asv_bench/benchmarks/tslibs/tz_convert.py

+7 -3
@@ -11,10 +11,14 @@

 try:
     old_sig = False
-    from pandas._libs.tslibs.tzconversion import tz_convert_from_utc
+    from pandas._libs.tslibs import tz_convert_from_utc
 except ImportError:
-    old_sig = True
-    from pandas._libs.tslibs.tzconversion import tz_convert as tz_convert_from_utc
+    try:
+        old_sig = False
+        from pandas._libs.tslibs.tzconversion import tz_convert_from_utc
+    except ImportError:
+        old_sig = True
+        from pandas._libs.tslibs.tzconversion import tz_convert as tz_convert_from_utc


 class TimeTZConvert:
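
The nested try/except in this hunk probes three import locations in order: the `pandas._libs.tslibs` namespace, then the `tzconversion` module under the new name, then the old `tz_convert` name, setting `old_sig` only for the last one. A minimal, library-agnostic sketch of the same fallback idea follows. The helper `import_first_available` and the standard-library names used to exercise it are illustrative assumptions, not part of the benchmark.

```python
import importlib


def import_first_available(candidates):
    """Return (object, index) for the first importable (module, attribute) pair.

    Hypothetical helper for illustration only; the benchmark itself uses
    plain nested try/except blocks.
    """
    for index, (module_name, attr_name) in enumerate(candidates):
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name), index
        except (ImportError, AttributeError):
            # This location is not available in the installed version; try the next.
            continue
    raise ImportError(f"none of the candidates could be imported: {candidates}")


# Stand-in candidates from the standard library so the sketch runs anywhere.
candidates = [
    ("no_such_module_for_demo", "sqrt"),  # module is missing
    ("math", "no_such_function"),         # module exists, name does not
    ("math", "sqrt"),                     # succeeds
]
func, index = import_first_available(candidates)

# Mirrors the benchmark's flag: only the last fallback implies the old signature.
old_sig = index == len(candidates) - 1
```

Ordering the candidates from newest to oldest keeps the common case fast and leaves the legacy path as the last resort.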

azure-pipelines.yml

+1
@@ -48,6 +48,7 @@ jobs:
 pip install cython numpy python-dateutil pytz pytest pytest-xdist pytest-asyncio>=0.17 hypothesis && \
 python setup.py build_ext -q -j2 && \
 python -m pip install --no-build-isolation -e . && \
+export PANDAS_CI=1 && \
 pytest -m 'not slow and not network and not clipboard and not single_cpu' pandas --junitxml=test-data.xml"
 displayName: 'Run 32-bit manylinux2014 Docker Build / Tests'


doc/source/getting_started/intro_tutorials/04_plotting.rst

+22 -7
@@ -2,6 +2,12 @@

 {{ header }}

+How do I create plots in pandas?
+----------------------------------
+
+.. image:: ../../_static/schemas/04_plot_overview.svg
+   :align: center
+
 .. ipython:: python

     import pandas as pd
@@ -35,12 +41,6 @@
         </ul>
     </div>

-How to create plots in pandas?
-------------------------------
-
-.. image:: ../../_static/schemas/04_plot_overview.svg
-   :align: center
-
 .. raw:: html

     <ul class="task-bullet">
@@ -52,6 +52,7 @@ I want a quick visual check of the data.

     @savefig 04_airqual_quick.png
     air_quality.plot()
+    plt.show()

 With a ``DataFrame``, pandas creates by default one line plot for each of
 the columns with numeric data.
@@ -68,10 +69,19 @@ the columns with numeric data.

 I want to plot only the columns of the data table with the data from Paris.

+.. ipython:: python
+    :suppress:
+
+    # We need to clear the figure here as, within doc generation, the plot
+    # accumulates data on each plot(). This is not needed when running
+    # in a notebook, so is suppressed from output.
+    plt.clf()
+
 .. ipython:: python

     @savefig 04_airqual_paris.png
     air_quality["station_paris"].plot()
+    plt.show()

 To plot a specific column, use the selection method of the
 :ref:`subset data tutorial <10min_tut_03_subset>` in combination with the :meth:`~DataFrame.plot`
@@ -94,6 +104,7 @@ I want to visually compare the :math:`NO_2` values measured in London versus Par

     @savefig 04_airqual_scatter.png
     air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
+    plt.show()

 .. raw:: html

@@ -125,6 +136,7 @@ method is applicable on the air quality example data:

     @savefig 04_airqual_boxplot.png
     air_quality.plot.box()
+    plt.show()

 .. raw:: html

@@ -148,6 +160,7 @@ I want each of the columns in a separate subplot.

     @savefig 04_airqual_area_subplot.png
     axs = air_quality.plot.area(figsize=(12, 4), subplots=True)
+    plt.show()

 Separate subplots for each of the data columns are supported by the ``subplots`` argument
 of the ``plot`` functions. The builtin options available in each of the pandas plot
@@ -180,9 +193,10 @@ I want to further customize, extend or save the resulting plot.

     fig, axs = plt.subplots(figsize=(12, 4))
     air_quality.plot.area(ax=axs)
-    @savefig 04_airqual_customized.png
     axs.set_ylabel("NO$_2$ concentration")
+    @savefig 04_airqual_customized.png
     fig.savefig("no2_concentrations.png")
+    plt.show()

 .. ipython:: python
     :suppress:
@@ -208,6 +222,7 @@ This strategy is applied in the previous example:
     air_quality.plot.area(ax=axs) # Use pandas to put the area plot on the prepared Figure/Axes
     axs.set_ylabel("NO$_2$ concentration") # Do any Matplotlib customization you like
     fig.savefig("no2_concentrations.png") # Save the Figure/Axes using the existing Matplotlib method.
+    plt.show() # Display the plot

 .. raw:: html

doc/source/whatsnew/v1.5.0.rst

+12 -2
@@ -95,6 +95,8 @@ Other enhancements
 - :meth:`pd.concat` now raises when ``levels`` is given but ``keys`` is None (:issue:`46653`)
 - :meth:`pd.concat` now raises when ``levels`` contains duplicate values (:issue:`46653`)
 - Added ``numeric_only`` argument to :meth:`DataFrame.corr`, :meth:`DataFrame.corrwith`, and :meth:`DataFrame.cov` (:issue:`46560`)
+- A :class:`errors.PerformanceWarning` is now thrown when using ``string[pyarrow]`` dtype with methods that don't dispatch to ``pyarrow.compute`` methods (:issue:`42613`)
+- Added ``numeric_only`` argument to :meth:`Resampler.sum`, :meth:`Resampler.prod`, :meth:`Resampler.min`, :meth:`Resampler.max`, :meth:`Resampler.first`, and :meth:`Resampler.last` (:issue:`46442`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.notable_bug_fixes:
@@ -478,10 +480,12 @@ Datetimelike
 Timedelta
 ^^^^^^^^^
 - Bug in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`)
+- Bug in constructing a :class:`Timedelta` with a ``np.timedelta64`` object and a ``unit`` sometimes silently overflowing and returning incorrect results instead of raising ``OutOfBoundsTimedelta`` (:issue:`46827`)
+-

 Time Zones
 ^^^^^^^^^^
--
+- Bug in :class:`Timestamp` constructor raising when passed a ``ZoneInfo`` tzinfo object (:issue:`46425`)
 -

 Numeric
@@ -500,6 +504,7 @@ Conversion
 - Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`)
 - Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`)
 - Bug in metaclass of generic abstract dtypes causing :meth:`DataFrame.apply` and :meth:`Series.apply` to raise for the built-in function ``type`` (:issue:`46684`)
+- Bug in :meth:`DataFrame.to_dict` for ``orient="list"`` or ``orient="index"`` was not returning native types (:issue:`46751`)

 Strings
 ^^^^^^^
@@ -569,6 +574,8 @@ I/O
 - Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`)
 - Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`)
 - Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`)
+- :meth:`to_html` now excludes the ``border`` attribute from ``<table>`` elements when ``border`` keyword is set to ``False``.
+-

 Period
 ^^^^^^
@@ -599,7 +606,10 @@ Groupby/resample/rolling
 - Bug in :meth:`GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`)
 - Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`)
 - Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`)
--
+- Bug in :meth:`SeriesGroupBy.apply` would incorrectly name its result when there was a unique group (:issue:`46369`)
+- Bug in :meth:`Rolling.var` and :meth:`Rolling.std` would give non-zero result with window of same values (:issue:`42064`)
+- Bug in :meth:`.Rolling.var` would segfault calculating weighted variance when window size was larger than data size (:issue:`46760`)
+- Bug in :meth:`Grouper.__repr__` where ``dropna`` was not included. Now it is (:issue:`46754`)

 Reshaping
 ^^^^^^^^^
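
The Timedelta entry in this file (issue 46827) concerns a unit conversion that silently overflowed a signed 64-bit integer instead of raising. The pure-Python sketch below contrasts a wrapping conversion to nanoseconds with a checked one. It is an illustration of the failure mode only, not pandas' implementation, and the names `to_ns_wrapping`, `to_ns_checked`, and `OutOfBoundsError` are hypothetical.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Nanoseconds per unit for a few fixed-length units.
NS_PER_UNIT = {"ns": 1, "us": 10**3, "ms": 10**6, "s": 10**9}


class OutOfBoundsError(OverflowError):
    """Hypothetical stand-in for an out-of-bounds timedelta error."""


def to_ns_wrapping(value, unit):
    """Emulate unchecked int64 multiplication, which wraps around on overflow."""
    exact = value * NS_PER_UNIT[unit]
    return (exact - INT64_MIN) % 2**64 + INT64_MIN


def to_ns_checked(value, unit):
    """Convert to nanoseconds, raising if the result does not fit in int64."""
    exact = value * NS_PER_UNIT[unit]
    if not INT64_MIN <= exact <= INT64_MAX:
        raise OutOfBoundsError(f"{value} {unit} does not fit in 64-bit nanoseconds")
    return exact


small = to_ns_checked(1, "s")            # fits comfortably
wrapped = to_ns_wrapping(10**10, "s")    # 10**19 ns overflows and wraps to a negative value
```

The checked variant does the multiplication in arbitrary precision first, so the range test sees the true result rather than the wrapped one.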

pandas/_libs/algos.pyx

+2 -3
@@ -1,6 +1,5 @@
-import cython
-from cython import Py_ssize_t
-
+cimport cython
+from cython cimport Py_ssize_t
 from libc.math cimport (
     fabs,
     sqrt,

pandas/_libs/algos_common_helper.pxi.in

+2 -1
@@ -65,7 +65,8 @@ def ensure_{{name}}(object arr, copy=True):
         if (<ndarray>arr).descr.type_num == NPY_{{c_type}}:
             return arr
         else:
-            return arr.astype(np.{{dtype}}, copy=copy)
+            # equiv: arr.astype(np.{{dtype}}, copy=copy)
+            return cnp.PyArray_Cast(<ndarray>arr, cnp.NPY_{{c_type}})
     else:
         return np.array(arr, dtype=np.{{dtype}})


pandas/_libs/groupby.pyx

+12 -23
@@ -1,7 +1,8 @@
-import cython
-from cython import Py_ssize_t
-
-from cython cimport floating
+cimport cython
+from cython cimport (
+    Py_ssize_t,
+    floating,
+)
 from libc.stdlib cimport (
     free,
     malloc,
@@ -268,7 +269,6 @@ def group_cumsum(
                         out[i, j] = na_val
                         continue

-
                 if isna_entry:
                     out[i, j] = na_val
                     if not skipna:
@@ -1220,7 +1220,7 @@ def group_nth(
                         if nobs[lab, j] == rank:
                             resx[lab, j] = val

-            # TODO: de-dup this whoel block with group_last?
+            # TODO: de-dup this whole block with group_last?
             for i in range(ncounts):
                 for j in range(K):
                     if nobs[i, j] < min_count:
@@ -1232,6 +1232,9 @@ def group_nth(
                         # set a placeholder value in out[i, j].
                         if uses_mask:
                             result_mask[i, j] = True
+                            # set out[i, j] to 0 to be deterministic, as
+                            # it was initialized with np.empty. Also ensures
+                            # we can downcast out if appropriate.
                             out[i, j] = 0
                         elif numeric_object_t is float32_t or numeric_object_t is float64_t:
                             out[i, j] = NAN
@@ -1369,7 +1372,7 @@ cdef group_min_max(
     """
     cdef:
         Py_ssize_t i, j, N, K, lab, ngroups = len(counts)
-        numeric_t val, nan_val
+        numeric_t val
         ndarray[numeric_t, ndim=2] group_min_or_max
         int64_t[:, ::1] nobs
         bint uses_mask = mask is not None
@@ -1386,20 +1389,6 @@ cdef group_min_max(
     group_min_or_max = np.empty_like(out)
     group_min_or_max[:] = _get_min_or_max(<numeric_t>0, compute_max, is_datetimelike)

-    # NB: We do not define nan_val because there is no such thing
-    # for uint64_t. We carefully avoid having to reference it in this
-    # case.
-    if numeric_t is int64_t:
-        nan_val = NPY_NAT
-    elif numeric_t is int32_t:
-        nan_val = util.INT32_MIN
-    elif numeric_t is int16_t:
-        nan_val = util.INT16_MIN
-    elif numeric_t is int8_t:
-        nan_val = util.INT8_MIN
-    elif numeric_t is float64_t or numeric_t is float32_t:
-        nan_val = NAN
-
     N, K = (<object>values).shape

     with nogil:
@@ -1442,11 +1431,11 @@ cdef group_min_max(
                         # we can downcast out if appropriate.
                         out[i, j] = 0
                     elif numeric_t is float32_t or numeric_t is float64_t:
-                        out[i, j] = nan_val
+                        out[i, j] = NAN
                     elif numeric_t is int64_t:
                         # Per above, this is a placeholder in
                         # non-is_datetimelike cases.
-                        out[i, j] = nan_val
+                        out[i, j] = NPY_NAT
                     else:
                         # placeholder, see above
                         out[i, j] = 0
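
The comments added in `group_nth` and the cleanup in `group_min_max` share one idea: when a group has fewer than `min_count` observations and a mask is in use, the kernel flags the cell in `result_mask` and writes a fixed placeholder to `out`, rather than leaving whatever the uninitialized `np.empty` buffer held. A simplified, one-dimensional pure-Python sketch of that idea follows. The function name and the list-based inputs are hypothetical stand-ins for the typed Cython buffers, and only the masked branch is modeled.

```python
def group_min_masked(values, labels, ngroups, min_count=1):
    """Per-group minimum with a result mask (illustrative sketch only).

    values: list of numbers, with None marking a missing entry
    labels: group index per value, negative meaning "no group"
    """
    nobs = [0] * ngroups      # valid observations seen per group
    best = [None] * ngroups   # running minimum per group

    for val, lab in zip(values, labels):
        if lab < 0 or val is None:
            continue
        nobs[lab] += 1
        if best[lab] is None or val < best[lab]:
            best[lab] = val

    out = [None] * ngroups    # stands in for an uninitialized output buffer
    result_mask = [False] * ngroups
    for i in range(ngroups):
        if nobs[i] < min_count:
            result_mask[i] = True
            # Write a fixed placeholder so the output is deterministic.
            out[i] = 0
        else:
            out[i] = best[i]
    return out, result_mask


out, result_mask = group_min_masked([3, 1, None, 5], [0, 0, 1, 2], ngroups=3)
```

Consumers are expected to read `out[i]` only where `result_mask[i]` is false, so the placeholder's value carries no meaning; it only has to be stable.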

pandas/_libs/hashing.pyx

+1 -2
@@ -1,8 +1,7 @@
 # Translated from the reference implementation
 # at https://github.com/veorq/SipHash

-import cython
-
+cimport cython
 from libc.stdlib cimport (
     free,
     malloc,

pandas/_libs/internals.pyx

+2 -3
@@ -1,9 +1,8 @@
 from collections import defaultdict

-import cython
-from cython import Py_ssize_t
-
+cimport cython
 from cpython.slice cimport PySlice_GetIndicesEx
+from cython cimport Py_ssize_t


 cdef extern from "Python.h":

pandas/_libs/interval.pyx

+2 -2
@@ -11,6 +11,7 @@ from cpython.datetime cimport (

 import_datetime()

+cimport cython
 from cpython.object cimport (
     Py_EQ,
     Py_GE,
@@ -20,9 +21,8 @@ from cpython.object cimport (
     Py_NE,
     PyObject_RichCompare,
 )
+from cython cimport Py_ssize_t

-import cython
-from cython import Py_ssize_t
 import numpy as np

 cimport numpy as cnp

pandas/_libs/join.pyx

+4 -2
@@ -1,5 +1,5 @@
-import cython
-from cython import Py_ssize_t
+cimport cython
+from cython cimport Py_ssize_t
 import numpy as np

 cimport numpy as cnp
@@ -233,6 +233,8 @@ cdef void _get_result_indexer(intp_t[::1] sorter, intp_t[::1] indexer) nogil:
         indexer[:] = -1


+@cython.wraparound(False)
+@cython.boundscheck(False)
 def ffill_indexer(const intp_t[:] indexer) -> np.ndarray:
     cdef:
         Py_ssize_t i, n = len(indexer)
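
The body of `ffill_indexer` is not shown in this hunk. Assuming, as the name suggests, that it forward-fills `-1` entries with the last valid position, a pure-Python sketch (with the hypothetical name `ffill_indexer_sketch`) could look like the following. It also suggests why disabling `wraparound` and `boundscheck` would be safe for such a loop: the only index used is the counter `i`, which stays within `0 <= i < n`.

```python
def ffill_indexer_sketch(indexer):
    """Forward-fill -1 entries with the most recent non-negative entry.

    Assumed behavior for illustration; the real function operates on a
    typed memoryview and returns a NumPy array.
    """
    n = len(indexer)
    result = [0] * n
    last_obs = -1  # nothing observed yet, so leading -1 entries stay -1

    for i in range(n):  # i is never negative and never reaches n
        val = indexer[i]
        if val == -1:
            result[i] = last_obs
        else:
            result[i] = val
            last_obs = val
    return result


filled = ffill_indexer_sketch([-1, 2, -1, -1, 5, -1])
```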

pandas/_libs/lib.pyx

+5 -4
@@ -3,9 +3,7 @@ from decimal import Decimal
 from enum import Enum
 import warnings

-import cython
-from cython import Py_ssize_t
-
+cimport cython
 from cpython.datetime cimport (
     PyDate_Check,
     PyDateTime_Check,
@@ -25,7 +23,10 @@ from cpython.tuple cimport (
     PyTuple_New,
     PyTuple_SET_ITEM,
 )
-from cython cimport floating
+from cython cimport (
+    Py_ssize_t,
+    floating,
+)

 import_datetime()


pandas/_libs/missing.pyx

+2 -2
@@ -2,8 +2,8 @@ from decimal import Decimal
 import numbers
 from sys import maxsize

-import cython
-from cython import Py_ssize_t
+cimport cython
+from cython cimport Py_ssize_t
 import numpy as np

 cimport numpy as cnp

pandas/_libs/ops.pyx

+2 -2
@@ -1,5 +1,6 @@
 import operator

+cimport cython
 from cpython.object cimport (
     Py_EQ,
     Py_GE,
@@ -9,9 +10,8 @@ from cpython.object cimport (
     Py_NE,
     PyObject_RichCompareBool,
 )
+from cython cimport Py_ssize_t

-import cython
-from cython import Py_ssize_t
 import numpy as np

 from numpy cimport (
