Commit f983dce

Merge branch 'main' into bug-tznaive-utc
2 parents fe90bb3 + 41942e1 commit f983dce

75 files changed

+3630
-497
lines changed

doc/source/development/policies.rst

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -51,7 +51,7 @@ pandas may change the behavior of experimental features at any time.
5151
Python support
5252
~~~~~~~~~~~~~~
5353

54-
pandas will only drop support for specific Python versions (e.g. 3.6.x, 3.7.x) in
55-
pandas **major** or **minor** releases.
54+
pandas mirrors the `NumPy guidelines for Python support <https://numpy.org/neps/nep-0029-deprecation_policy.html#implementation>`__.
55+
5656

5757
.. _SemVer: https://semver.org

doc/source/reference/frame.rst

Lines changed: 1 addition & 0 deletions
@@ -391,3 +391,4 @@ Serialization / IO / conversion
391391
DataFrame.to_clipboard
392392
DataFrame.to_markdown
393393
DataFrame.style
394+
DataFrame.__dataframe__

doc/source/reference/general_functions.rst

Lines changed: 7 additions & 0 deletions
@@ -78,3 +78,10 @@ Hashing
7878

7979
util.hash_array
8080
util.hash_pandas_object
81+
82+
Importing from other DataFrame libraries
83+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
84+
.. autosummary::
85+
:toctree: api/
86+
87+
api.exchange.from_dataframe

doc/source/user_guide/gotchas.rst

Lines changed: 1 addition & 1 deletion
@@ -367,7 +367,7 @@ integer arrays to floating when NAs must be introduced.
367367
Differences with NumPy
368368
----------------------
369369
For :class:`Series` and :class:`DataFrame` objects, :meth:`~DataFrame.var` normalizes by
370-
``N-1`` to produce unbiased estimates of the sample variance, while NumPy's
370+
``N-1`` to produce `unbiased estimates of the population variance <https://en.wikipedia.org/wiki/Bias_of_an_estimator>`__, while NumPy's
371371
:meth:`numpy.var` normalizes by N, which measures the variance of the sample. Note that
372372
:meth:`~DataFrame.cov` normalizes by ``N-1`` in both pandas and NumPy.
373373
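A small sketch of the normalization difference described in this hunk. The sample values are arbitrary and chosen only for illustration; `ddof` is the existing keyword that both libraries use to select the divisor.

```python
import numpy as np
import pandas as pd

# Arbitrary illustrative sample: mean is 5, sum of squared deviations is 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = pd.Series(data)

pandas_var = s.var()      # pandas default: ddof=1, divides by N-1
numpy_var = np.var(data)  # NumPy default: ddof=0, divides by N

# Either library can reproduce the other by setting ddof explicitly.
pandas_like_numpy = s.var(ddof=0)
numpy_like_pandas = np.var(data, ddof=1)

print(pandas_var, numpy_var)
```

Passing `ddof` explicitly on both sides is the simplest way to keep results comparable when mixing the two libraries.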

doc/source/whatsnew/v1.5.0.rst

Lines changed: 39 additions & 21 deletions
@@ -14,15 +14,40 @@ including other versions of pandas.
1414
Enhancements
1515
~~~~~~~~~~~~
1616

17+
.. _whatsnew_150.enhancements.dataframe_exchange:
18+
19+
DataFrame exchange protocol implementation
20+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21+
22+
pandas now implements the DataFrame exchange API spec.
23+
See the full details on the API at https://data-apis.org/dataframe-protocol/latest/index.html
24+
25+
The protocol consists of two parts:
26+
27+
- New method :meth:`DataFrame.__dataframe__` which produces the exchange object.
28+
It effectively "exports" the Pandas dataframe as an exchange object so
29+
any other library which has the protocol implemented can "import" that dataframe
30+
without knowing anything about the producer except that it makes an exchange object.
31+
- New function :func:`pandas.api.exchange.from_dataframe` which can take
32+
an arbitrary exchange object from any conformant library and construct a
33+
Pandas DataFrame out of it.
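The two parts of the protocol listed above can be sketched as a round trip. This is a minimal illustration, not part of the commit: `ForeignFrame` is a hypothetical stand-in for a dataframe from another library, and the import fallback is an assumption, since released pandas versions appear to expose the function under `pandas.api.interchange` rather than the `pandas.api.exchange` name used in this diff.

```python
import pandas as pd

try:
    # Assumption: released pandas (1.5 and later) uses this module name.
    from pandas.api.interchange import from_dataframe
except ImportError:
    # Module name as written in this commit.
    from pandas.api.exchange import from_dataframe


class ForeignFrame:
    """Hypothetical producer: exposes only the protocol entry point."""

    def __init__(self, df):
        self._df = df

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # Delegate to pandas' own producer to obtain the exchange object.
        return self._df.__dataframe__(allow_copy=allow_copy)


source = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Part 1: "export" a pandas DataFrame as an exchange object.
exchange_obj = source.__dataframe__()
column_names = list(exchange_obj.column_names())

# Part 2: "import" from any object that implements __dataframe__.
result = from_dataframe(ForeignFrame(source))
print(column_names)
print(result)
```

The wrapper class is there only so that the consumer cannot take a shortcut for native pandas objects; a real producer would be a dataframe from another conformant library.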
34+
1735
.. _whatsnew_150.enhancements.styler:
1836

1937
Styler
2038
^^^^^^
2139

22-
- New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`)
23-
- Added the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`)
24-
- Added a new method :meth:`.Styler.concat` which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`)
25-
- :meth:`.Styler.highlight_null` now accepts ``color`` consistently with other builtin methods and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`)
40+
The most notable development is the new method :meth:`.Styler.concat` which
41+
allows adding customised footer rows to visualise additional calculations on the data,
42+
e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`)
43+
44+
Additionally there is an alternative output method :meth:`.Styler.to_string`,
45+
which allows using the Styler's formatting methods to create, for example, CSVs (:issue:`44502`).
46+
47+
Minor feature improvements are:
48+
49+
- Adding the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`)
50+
- Making keyword arguments consistent: :meth:`.Styler.highlight_null` now accepts ``color`` and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`)
2651

2752
.. _whatsnew_150.enhancements.resample_group_keys:
2853

@@ -79,6 +104,7 @@ as seen in the following example.
79104

80105
Other enhancements
81106
^^^^^^^^^^^^^^^^^^
107+
- :meth:`Series.map` now raises when ``arg`` is dict but ``na_action`` is not either ``None`` or ``'ignore'`` (:issue:`46588`)
82108
- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`)
83109
- :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`)
84110
- Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`)
@@ -94,7 +120,9 @@ Other enhancements
94120
- :meth:`DataFrame.reset_index` now accepts a ``names`` argument which renames the index names (:issue:`6878`)
95121
- :meth:`pd.concat` now raises when ``levels`` is given but ``keys`` is None (:issue:`46653`)
96122
- :meth:`pd.concat` now raises when ``levels`` contains duplicate values (:issue:`46653`)
97-
- Added ``numeric_only`` argument to :meth:`DataFrame.corr`, :meth:`DataFrame.corrwith`, and :meth:`DataFrame.cov` (:issue:`46560`)
123+
- Added ``numeric_only`` argument to :meth:`DataFrame.corr`, :meth:`DataFrame.corrwith`, :meth:`DataFrame.cov`, :meth:`DataFrame.idxmin`, :meth:`DataFrame.idxmax`, :meth:`.GroupBy.idxmin`, :meth:`.GroupBy.idxmax`, :meth:`.GroupBy.var`, :meth:`.GroupBy.std`, :meth:`.GroupBy.sem`, and :meth:`.GroupBy.quantile` (:issue:`46560`)
124+
- A :class:`errors.PerformanceWarning` is now thrown when using ``string[pyarrow]`` dtype with methods that don't dispatch to ``pyarrow.compute`` methods (:issue:`42613`, :issue:`46725`)
125+
- Added ``validate`` argument to :meth:`DataFrame.join` (:issue:`46622`)
98126
- A :class:`errors.PerformanceWarning` is now thrown when using ``string[pyarrow]`` dtype with methods that don't dispatch to ``pyarrow.compute`` methods (:issue:`42613`)
99127
- Added ``numeric_only`` argument to :meth:`Resampler.sum`, :meth:`Resampler.prod`, :meth:`Resampler.min`, :meth:`Resampler.max`, :meth:`Resampler.first`, and :meth:`Resampler.last` (:issue:`46442`)
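As an illustration of the ``validate`` argument added to :meth:`DataFrame.join` in the list above, here is a hedged sketch. The frames and key labels are made up for the example, and it assumes ``validate`` accepts the same strings as :func:`pandas.merge` (such as ``"one_to_one"``) and raises ``pandas.errors.MergeError`` on violation.

```python
import pandas as pd
from pandas.errors import MergeError

# Illustrative data: the right frame repeats the index label "k1".
left = pd.DataFrame({"x": [1, 2]}, index=["k1", "k2"])
right = pd.DataFrame({"y": [10, 20, 30]}, index=["k1", "k1", "k2"])

try:
    # Duplicate keys on the right side violate a one-to-one relationship.
    left.join(right, validate="one_to_one")
    join_rejected = False
except MergeError:
    join_rejected = True

# A one-to-many check passes because the left keys are unique.
joined = left.join(right, validate="one_to_many")
print(join_rejected, len(joined))
```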
100128

@@ -106,13 +134,6 @@ Notable bug fixes
106134

107135
These are bug fixes that might have notable behavior changes.
108136

109-
.. _whatsnew_150.notable_bug_fixes.notable_bug_fix1:
110-
111-
Styler
112-
^^^^^^
113-
114-
- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`)
115-
116137
.. _whatsnew_150.notable_bug_fixes.groupby_transform_dropna:
117138

118139
Using ``dropna=True`` with ``groupby`` transforms
@@ -173,13 +194,6 @@ did not have the same index as the input.
173194
df.groupby('a', dropna=True).transform('ffill')
174195
df.groupby('a', dropna=True).transform(lambda x: x)
175196
176-
.. _whatsnew_150.notable_bug_fixes.visualization:
177-
178-
Styler
179-
^^^^^^
180-
181-
- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`)
182-
183197
.. _whatsnew_150.notable_bug_fixes.to_json_incorrectly_localizing_naive_timestamps:
184198

185199
Serializing tz-naive Timestamps with to_json() with ``iso_dates=True``
@@ -587,7 +601,7 @@ Missing
587601
- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`)
588602
- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`)
589603
- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`)
590-
-
604+
- Bug in :meth:`DataFrame.dropna` allowing the incompatible arguments ``how`` and ``thresh`` to be set together (:issue:`46575`)
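A short sketch of the ``dropna`` behaviour after this fix, assuming a pandas version that includes it (1.5 or later) and that the rejection surfaces as a ``TypeError``. The frame is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None], "b": [None, None]})

try:
    # Passing both arguments is ambiguous and is expected to be rejected.
    df.dropna(how="all", thresh=1)
    raised = False
except TypeError:
    raised = True

# Each argument still works on its own.
by_thresh = df.dropna(thresh=1)  # keep rows with at least one non-NA value
by_how = df.dropna(how="all")    # drop rows where every value is NA
print(raised, len(by_thresh), len(by_how))
```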
591605

592606
MultiIndex
593607
^^^^^^^^^^
@@ -619,6 +633,8 @@ Period
619633
^^^^^^
620634
- Bug in subtraction of :class:`Period` from :class:`PeriodArray` returning wrong results (:issue:`45999`)
621635
- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, directives ``%l`` and ``%u`` were giving wrong results (:issue:`46252`)
636+
- Bug in inferring an incorrect ``freq`` when passing a string to :class:`Period` with microseconds that are a multiple of 1000 (:issue:`46811`)
637+
- Bug in constructing a :class:`Period` from a :class:`Timestamp` or ``np.datetime64`` object with non-zero nanoseconds and ``freq="ns"`` incorrectly truncating the nanoseconds (:issue:`46811`)
622638
-
623639

624640
Plotting
@@ -629,6 +645,7 @@ Plotting
629645
- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`)
630646
- Bug in :meth:`DataFrame.plot.scatter` that prevented specifying ``norm`` (:issue:`45809`)
631647
- The function :meth:`DataFrame.plot.scatter` now accepts ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` for consistency to other plotting functions (:issue:`44670`)
648+
- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`)
632649

633650
Groupby/resample/rolling
634651
^^^^^^^^^^^^^^^^^^^^^^^^
@@ -645,6 +662,7 @@ Groupby/resample/rolling
645662
- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`)
646663
- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`)
647664
- Bug in :meth:`SeriesGroupBy.apply` would incorrectly name its result when there was a unique group (:issue:`46369`)
665+
- Bug in :meth:`Rolling.sum` and :meth:`Rolling.mean` would give incorrect result with window of same values (:issue:`42064`, :issue:`46431`)
648666
- Bug in :meth:`Rolling.var` and :meth:`Rolling.std` would give non-zero result with window of same values (:issue:`42064`)
649667
- Bug in :meth:`.Rolling.var` would segfault calculating weighted variance when window size was larger than data size (:issue:`46760`)
650668
- Bug in :meth:`Grouper.__repr__` where ``dropna`` was not included. Now it is (:issue:`46754`)
@@ -672,7 +690,7 @@ ExtensionArray
672690
Styler
673691
^^^^^^
674692
- Bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`)
675-
-
693+
- Bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`)
676694

677695
Metadata
678696
^^^^^^^^

pandas/_libs/hashtable.pyi

Lines changed: 4 additions & 1 deletion
@@ -197,10 +197,13 @@ def duplicated(
197197
values: np.ndarray,
198198
keep: Literal["last", "first", False] = ...,
199199
) -> npt.NDArray[np.bool_]: ...
200-
def mode(values: np.ndarray, dropna: bool) -> np.ndarray: ...
200+
def mode(
201+
values: np.ndarray, dropna: bool, mask: npt.NDArray[np.bool_] | None = None
202+
) -> np.ndarray: ...
201203
def value_count(
202204
values: np.ndarray,
203205
dropna: bool,
206+
mask: npt.NDArray[np.bool_] | None = None,
204207
) -> tuple[np.ndarray, npt.NDArray[np.int64],]: ... # np.ndarray[same-as-values]
205208

206209
# arr and values should have same dtype

pandas/_libs/hashtable_func_helper.pxi.in

Lines changed: 33 additions & 19 deletions
@@ -31,9 +31,9 @@ dtypes = [('Complex128', 'complex128', 'complex128',
3131
@cython.wraparound(False)
3232
@cython.boundscheck(False)
3333
{{if dtype == 'object'}}
34-
cdef value_count_{{dtype}}(ndarray[{{dtype}}] values, bint dropna):
34+
cdef value_count_{{dtype}}(ndarray[{{dtype}}] values, bint dropna, const uint8_t[:] mask=None):
3535
{{else}}
36-
cdef value_count_{{dtype}}(const {{dtype}}_t[:] values, bint dropna):
36+
cdef value_count_{{dtype}}(const {{dtype}}_t[:] values, bint dropna, const uint8_t[:] mask=None):
3737
{{endif}}
3838
cdef:
3939
Py_ssize_t i = 0
@@ -46,6 +46,11 @@ cdef value_count_{{dtype}}(const {{dtype}}_t[:] values, bint dropna):
4646
{{c_type}} val
4747

4848
int ret = 0
49+
bint uses_mask = mask is not None
50+
bint isna_entry = False
51+
52+
if uses_mask and not dropna:
53+
raise NotImplementedError("uses_mask not implemented with dropna=False")
4954

5055
# we track the order in which keys are first seen (GH39009),
5156
# khash-map isn't insertion-ordered, thus:
@@ -56,6 +61,9 @@ cdef value_count_{{dtype}}(const {{dtype}}_t[:] values, bint dropna):
5661
table = kh_init_{{ttype}}()
5762

5863
{{if dtype == 'object'}}
64+
if uses_mask:
65+
raise NotImplementedError("uses_mask not implemented with object dtype")
66+
5967
kh_resize_{{ttype}}(table, n // 10)
6068

6169
for i in range(n):
@@ -74,7 +82,13 @@ cdef value_count_{{dtype}}(const {{dtype}}_t[:] values, bint dropna):
7482
for i in range(n):
7583
val = {{to_c_type}}(values[i])
7684

77-
if not is_nan_{{c_type}}(val) or not dropna:
85+
if dropna:
86+
if uses_mask:
87+
isna_entry = mask[i]
88+
else:
89+
isna_entry = is_nan_{{c_type}}(val)
90+
91+
if not dropna or not isna_entry:
7892
k = kh_get_{{ttype}}(table, val)
7993
if k != table.n_buckets:
8094
table.vals[k] += 1
@@ -251,37 +265,37 @@ ctypedef fused htfunc_t:
251265
complex64_t
252266

253267

254-
cpdef value_count(ndarray[htfunc_t] values, bint dropna):
268+
cpdef value_count(ndarray[htfunc_t] values, bint dropna, const uint8_t[:] mask=None):
255269
if htfunc_t is object:
256-
return value_count_object(values, dropna)
270+
return value_count_object(values, dropna, mask=mask)
257271

258272
elif htfunc_t is int8_t:
259-
return value_count_int8(values, dropna)
273+
return value_count_int8(values, dropna, mask=mask)
260274
elif htfunc_t is int16_t:
261-
return value_count_int16(values, dropna)
275+
return value_count_int16(values, dropna, mask=mask)
262276
elif htfunc_t is int32_t:
263-
return value_count_int32(values, dropna)
277+
return value_count_int32(values, dropna, mask=mask)
264278
elif htfunc_t is int64_t:
265-
return value_count_int64(values, dropna)
279+
return value_count_int64(values, dropna, mask=mask)
266280

267281
elif htfunc_t is uint8_t:
268-
return value_count_uint8(values, dropna)
282+
return value_count_uint8(values, dropna, mask=mask)
269283
elif htfunc_t is uint16_t:
270-
return value_count_uint16(values, dropna)
284+
return value_count_uint16(values, dropna, mask=mask)
271285
elif htfunc_t is uint32_t:
272-
return value_count_uint32(values, dropna)
286+
return value_count_uint32(values, dropna, mask=mask)
273287
elif htfunc_t is uint64_t:
274-
return value_count_uint64(values, dropna)
288+
return value_count_uint64(values, dropna, mask=mask)
275289

276290
elif htfunc_t is float64_t:
277-
return value_count_float64(values, dropna)
291+
return value_count_float64(values, dropna, mask=mask)
278292
elif htfunc_t is float32_t:
279-
return value_count_float32(values, dropna)
293+
return value_count_float32(values, dropna, mask=mask)
280294

281295
elif htfunc_t is complex128_t:
282-
return value_count_complex128(values, dropna)
296+
return value_count_complex128(values, dropna, mask=mask)
283297
elif htfunc_t is complex64_t:
284-
return value_count_complex64(values, dropna)
298+
return value_count_complex64(values, dropna, mask=mask)
285299

286300
else:
287301
raise TypeError(values.dtype)
@@ -361,7 +375,7 @@ cpdef ismember(ndarray[htfunc_t] arr, ndarray[htfunc_t] values):
361375

362376
@cython.wraparound(False)
363377
@cython.boundscheck(False)
364-
def mode(ndarray[htfunc_t] values, bint dropna):
378+
def mode(ndarray[htfunc_t] values, bint dropna, const uint8_t[:] mask=None):
365379
# TODO(cython3): use const htfunct_t[:]
366380

367381
cdef:
@@ -372,7 +386,7 @@ def mode(ndarray[htfunc_t] values, bint dropna):
372386
int64_t count, max_count = -1
373387
Py_ssize_t nkeys, k, j = 0
374388

375-
keys, counts = value_count(values, dropna)
389+
keys, counts = value_count(values, dropna, mask=mask)
376390
nkeys = len(keys)
377391

378392
modes = np.empty(nkeys, dtype=values.dtype)
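The masked branch added to ``value_count_{{dtype}}`` in this file can be paraphrased in plain Python. This is only a reading aid under stated assumptions: ``value_count_sketch`` is a hypothetical name, a ``dict`` stands in for the khash table, and NaN detection is reduced to ``math.isnan`` on floats.

```python
import math


def value_count_sketch(values, dropna, mask=None):
    """Count values in first-seen order, optionally skipping masked entries."""
    uses_mask = mask is not None
    if uses_mask and not dropna:
        # Mirrors the guard added in the diff.
        raise NotImplementedError("uses_mask not implemented with dropna=False")

    counts = {}  # dicts preserve insertion order, like the tracked key order
    for i, val in enumerate(values):
        isna_entry = False
        if dropna:
            if uses_mask:
                # The mask, not the stored value, decides what is missing.
                isna_entry = bool(mask[i])
            else:
                isna_entry = isinstance(val, float) and math.isnan(val)
        if not dropna or not isna_entry:
            counts[val] = counts.get(val, 0) + 1
    return list(counts.keys()), list(counts.values())


masked_keys, masked_counts = value_count_sketch(
    [1, 2, 1, 0], dropna=True, mask=[False, False, False, True]
)
nan_keys, nan_counts = value_count_sketch([1.0, float("nan"), 1.0], dropna=True)
print(masked_keys, masked_counts)
```

The masked call shows why the mask is needed: the placeholder ``0`` in the last slot is a valid integer, so only the mask can mark it as missing.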

pandas/_libs/lib.pyx

Lines changed: 14 additions & 14 deletions
@@ -873,7 +873,7 @@ def get_level_sorter(
873873
"""
874874
cdef:
875875
Py_ssize_t i, l, r
876-
ndarray[intp_t, ndim=1] out = np.empty(len(codes), dtype=np.intp)
876+
ndarray[intp_t, ndim=1] out = cnp.PyArray_EMPTY(1, codes.shape, cnp.NPY_INTP, 0)
877877

878878
for i in range(len(starts) - 1):
879879
l, r = starts[i], starts[i + 1]
@@ -2255,11 +2255,11 @@ def maybe_convert_numeric(
22552255
int status, maybe_int
22562256
Py_ssize_t i, n = values.size
22572257
Seen seen = Seen(coerce_numeric)
2258-
ndarray[float64_t, ndim=1] floats = np.empty(n, dtype='f8')
2259-
ndarray[complex128_t, ndim=1] complexes = np.empty(n, dtype='c16')
2260-
ndarray[int64_t, ndim=1] ints = np.empty(n, dtype='i8')
2261-
ndarray[uint64_t, ndim=1] uints = np.empty(n, dtype='u8')
2262-
ndarray[uint8_t, ndim=1] bools = np.empty(n, dtype='u1')
2258+
ndarray[float64_t, ndim=1] floats = cnp.PyArray_EMPTY(1, values.shape, cnp.NPY_FLOAT64, 0)
2259+
ndarray[complex128_t, ndim=1] complexes = cnp.PyArray_EMPTY(1, values.shape, cnp.NPY_COMPLEX128, 0)
2260+
ndarray[int64_t, ndim=1] ints = cnp.PyArray_EMPTY(1, values.shape, cnp.NPY_INT64, 0)
2261+
ndarray[uint64_t, ndim=1] uints = cnp.PyArray_EMPTY(1, values.shape, cnp.NPY_UINT64, 0)
2262+
ndarray[uint8_t, ndim=1] bools = cnp.PyArray_EMPTY(1, values.shape, cnp.NPY_UINT8, 0)
22632263
ndarray[uint8_t, ndim=1] mask = np.zeros(n, dtype="u1")
22642264
float64_t fval
22652265
bint allow_null_in_int = convert_to_masked_nullable
@@ -2479,11 +2479,11 @@ def maybe_convert_objects(ndarray[object] objects,
24792479

24802480
n = len(objects)
24812481

2482-
floats = np.empty(n, dtype='f8')
2483-
complexes = np.empty(n, dtype='c16')
2484-
ints = np.empty(n, dtype='i8')
2485-
uints = np.empty(n, dtype='u8')
2486-
bools = np.empty(n, dtype=np.uint8)
2482+
floats = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_FLOAT64, 0)
2483+
complexes = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_COMPLEX128, 0)
2484+
ints = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_INT64, 0)
2485+
uints = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_UINT64, 0)
2486+
bools = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_UINT8, 0)
24872487
mask = np.full(n, False)
24882488

24892489
if convert_datetime:
@@ -2785,7 +2785,7 @@ cdef _infer_all_nats(dtype, ndarray datetimes, ndarray timedeltas):
27852785
else:
27862786
# ExtensionDtype
27872787
cls = dtype.construct_array_type()
2788-
i8vals = np.empty(len(datetimes), dtype="i8")
2788+
i8vals = cnp.PyArray_EMPTY(1, datetimes.shape, cnp.NPY_INT64, 0)
27892789
i8vals.fill(NPY_NAT)
27902790
result = cls(i8vals, dtype=dtype)
27912791
return result
@@ -2888,7 +2888,7 @@ def map_infer(
28882888
object val
28892889

28902890
n = len(arr)
2891-
result = np.empty(n, dtype=object)
2891+
result = cnp.PyArray_EMPTY(1, arr.shape, cnp.NPY_OBJECT, 0)
28922892
for i in range(n):
28932893
if ignore_na and checknull(arr[i]):
28942894
result[i] = arr[i]
@@ -3083,7 +3083,7 @@ cpdef ndarray eq_NA_compat(ndarray[object] arr, object key):
30833083
key is assumed to have `not isna(key)`
30843084
"""
30853085
cdef:
3086-
ndarray[uint8_t, cast=True] result = np.empty(len(arr), dtype=bool)
3086+
ndarray[uint8_t, cast=True] result = cnp.PyArray_EMPTY(arr.ndim, arr.shape, cnp.NPY_BOOL, 0)
30873087
Py_ssize_t i
30883088
object item
30893089
