
Commit 5bad13f

Merge pull request #157 from pandas-dev/master

Sync Fork from Upstream Repo

2 parents 91209d8 + d7b04d1, commit 5bad13f

27 files changed, +1099 -499 lines

doc/cheatsheet/Pandas_Cheat_Sheet.pdf

9.56 KB (binary file not shown)
9.14 KB (binary file not shown)

doc/source/ecosystem.rst

+2 -1

@@ -98,7 +98,8 @@ which can be used for a wide variety of time series data mining tasks.
 Visualization
 -------------
 
-While :ref:`pandas has built-in support for data visualization with matplotlib <visualization>`,
+`Pandas has its own Styler class for table visualization <user_guide/style.ipynb>`_, and while
+:ref:`pandas also has built-in support for data visualization through charts with matplotlib <visualization>`,
 there are a number of other pandas-compatible libraries.
 
 `Altair <https://altair-viz.github.io/>`__

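As an aside on the new sentence above: the ``Styler`` class it links to is the table-visualization entry point. A minimal sketch, assuming a Jupyter-style environment for rich display (the data and names here are illustrative):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(4, 3), columns=["a", "b", "c"])

    # Table visualization: Styler attaches CSS to the DataFrame's HTML
    # representation instead of drawing a chart.
    styler = df.style.format("{:.2f}").highlight_max(axis=0)
    html = styler.render()  # HTML string; rendered richly in a notebook
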
doc/source/user_guide/index.rst

+1 -1

@@ -38,12 +38,12 @@ Further information on any specific method can be obtained in the
     integer_na
     boolean
     visualization
+    style
     computation
     groupby
     window
     timeseries
     timedeltas
-    style
     options
     enhancingperf
     scale

doc/source/user_guide/style.ipynb

+785 -407

Large diffs are not rendered by default.

doc/source/user_guide/visualization.rst

+6 -3

@@ -2,9 +2,12 @@
 
 {{ header }}
 
-*************
-Visualization
-*************
+*******************
+Chart Visualization
+*******************
+
+This section demonstrates visualization through charting. For information on
+visualization of tabular data please see the section on `Table Visualization <style.ipynb>`_.
 
 We use the standard convention for referencing the matplotlib API:

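For contrast with the table-styling note above, a minimal charting sketch in the matplotlib convention this renamed section uses (matplotlib must be installed; the data is illustrative):

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    ts = pd.Series(
        np.random.randn(100).cumsum(),
        index=pd.date_range("2021-01-01", periods=100),
    )

    # Chart visualization: Series.plot draws with matplotlib under the hood.
    ts.plot(title="Random walk")
    plt.show()
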
doc/source/user_guide/window.rst

+1 -1

@@ -101,7 +101,7 @@ be calculated with :meth:`~Rolling.apply` by specifying a separate column of wei
 
 All windowing operations support a ``min_periods`` argument that dictates the minimum amount of
 non-``np.nan`` values a window must have; otherwise, the resulting value is ``np.nan``.
-``min_peridos`` defaults to 1 for time-based windows and ``window`` for fixed windows
+``min_periods`` defaults to 1 for time-based windows and ``window`` for fixed windows
 
 .. ipython:: python

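A small sketch of the behaviour the corrected sentence describes (values are illustrative):

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

    # Fixed window: min_periods defaults to the window size (3 here), so any
    # window with fewer than 3 non-NaN values yields NaN.
    s.rolling(window=3).sum()

    # Relaxing min_periods lets partially filled windows produce a value.
    s.rolling(window=3, min_periods=1).sum()
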
doc/source/whatsnew/v1.3.0.rst

+33

@@ -302,6 +302,38 @@ cast to ``dtype=object`` (:issue:`38709`)
     ser2
 
 
+.. _whatsnew_130.notable_bug_fixes.rolling_groupby_column:
+
+GroupBy.rolling no longer returns grouped-by column in values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The group-by column will now be dropped from the result of a
+``groupby.rolling`` operation (:issue:`32262`)
+
+.. ipython:: python
+
+    df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
+    df
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+    In [1]: df.groupby("A").rolling(2).sum()
+    Out[1]:
+           A    B
+    A
+    1 0  NaN  NaN
+      1  2.0  1.0
+    2 2  NaN  NaN
+    3 3  NaN  NaN
+
+*New behavior*:
+
+.. ipython:: python
+
+    df.groupby("A").rolling(2).sum()
+
 .. _whatsnew_130.notable_bug_fixes.rolling_var_precision:
 
 Removed artificial truncation in rolling variance and standard deviation
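Regarding the ``GroupBy.rolling`` change in the hunk above: code that relied on the grouped-by column appearing in the result can approximate the old output by selecting the key column explicitly. This workaround is a sketch, not part of the release note itself:

    import pandas as pd

    df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

    # Including the group key in the column selection keeps an "A" column in
    # the rolled result, similar to the pre-1.3 behaviour.
    df.groupby("A")[["A", "B"]].rolling(2).sum()
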
@@ -501,6 +533,7 @@ Numeric
 - Bug in :meth:`DataFrame.mode` and :meth:`Series.mode` not keeping consistent integer :class:`Index` for empty input (:issue:`33321`)
 - Bug in :meth:`DataFrame.rank` with ``np.inf`` and mixture of ``np.nan`` and ``np.inf`` (:issue:`32593`)
 - Bug in :meth:`DataFrame.rank` with ``axis=0`` and columns holding incomparable types raising ``IndexError`` (:issue:`38932`)
+- Bug in ``rank`` method for :class:`Series`, :class:`DataFrame`, :class:`DataFrameGroupBy`, and :class:`SeriesGroupBy` treating the most negative ``int64`` value as missing (:issue:`32859`)
 - Bug in :func:`select_dtypes` different behavior between Windows and Linux with ``include="int"`` (:issue:`36569`)
 - Bug in :meth:`DataFrame.apply` and :meth:`DataFrame.agg` when passed argument ``func="size"`` would operate on the entire ``DataFrame`` instead of rows or columns (:issue:`39934`)
 - Bug in :meth:`DataFrame.transform` would raise ``SpecificationError`` when passed a dictionary and columns were missing; will now raise a ``KeyError`` instead (:issue:`40004`)

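To illustrate the ``rank`` bug-fix entry (:issue:`32859`) above, a short sketch; the expected ranks reflect the fixed behaviour:

    import numpy as np
    import pandas as pd

    s = pd.Series([3, np.iinfo(np.int64).min, 7], dtype="int64")

    # Previously the most negative int64 value was masked as if missing; with
    # the fix it ranks as the smallest value: [2.0, 1.0, 3.0].
    s.rank()
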
pandas/_libs/algos.pyx

+19 -7

@@ -962,6 +962,7 @@ ctypedef fused rank_t:
 def rank_1d(
     ndarray[rank_t, ndim=1] values,
     const intp_t[:] labels,
+    bint is_datetimelike=False,
     ties_method="average",
     bint ascending=True,
     bint pct=False,
@@ -977,6 +978,8 @@ def rank_1d(
         Array containing unique label for each group, with its ordering
         matching up to the corresponding record in `values`. If not called
         from a groupby operation, will be an array of 0's
+    is_datetimelike : bool, default False
+        True if `values` contains datetime-like entries.
     ties_method : {'average', 'min', 'max', 'first', 'dense'}, default
         'average'
         * average: average rank of group
@@ -1032,7 +1035,7 @@
 
     if rank_t is object:
         mask = missing.isnaobj(masked_vals)
-    elif rank_t is int64_t:
+    elif rank_t is int64_t and is_datetimelike:
         mask = (masked_vals == NPY_NAT).astype(np.uint8)
     elif rank_t is float64_t:
         mask = np.isnan(masked_vals).astype(np.uint8)
@@ -1059,7 +1062,7 @@
         if rank_t is object:
             nan_fill_val = NegInfinity()
         elif rank_t is int64_t:
-            nan_fill_val = np.iinfo(np.int64).min
+            nan_fill_val = NPY_NAT
         elif rank_t is uint64_t:
             nan_fill_val = 0
         else:
@@ -1275,6 +1278,7 @@ def rank_1d(
 def rank_2d(
     ndarray[rank_t, ndim=2] in_arr,
     int axis=0,
+    bint is_datetimelike=False,
     ties_method="average",
     bint ascending=True,
     na_option="keep",
@@ -1299,7 +1303,9 @@ def rank_2d(
     tiebreak = tiebreakers[ties_method]
 
     keep_na = na_option == 'keep'
-    check_mask = rank_t is not uint64_t
+
+    # For cases where a mask is not possible, we can avoid mask checks
+    check_mask = not (rank_t is uint64_t or (rank_t is int64_t and not is_datetimelike))
 
     if axis == 0:
         values = np.asarray(in_arr).T.copy()
@@ -1310,28 +1316,34 @@ def rank_2d(
     if values.dtype != np.object_:
         values = values.astype('O')
 
-    if rank_t is not uint64_t:
+    if check_mask:
         if ascending ^ (na_option == 'top'):
             if rank_t is object:
                 nan_value = Infinity()
             elif rank_t is float64_t:
                 nan_value = np.inf
-            elif rank_t is int64_t:
+
+            # int64 and datetimelike
+            else:
                 nan_value = np.iinfo(np.int64).max
 
         else:
             if rank_t is object:
                 nan_value = NegInfinity()
             elif rank_t is float64_t:
                 nan_value = -np.inf
-            elif rank_t is int64_t:
+
+            # int64 and datetimelike
+            else:
                 nan_value = NPY_NAT
 
         if rank_t is object:
             mask = missing.isnaobj2d(values)
         elif rank_t is float64_t:
             mask = np.isnan(values)
-        elif rank_t is int64_t:
+
+        # int64 and datetimelike
+        else:
             mask = values == NPY_NAT
 
         np.putmask(values, mask, nan_value)

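At the user level, the ``is_datetimelike`` flag threaded through ``rank_1d``/``rank_2d`` means the ``NPY_NAT`` sentinel is only treated as missing for datetime-like input. A rough sketch of the distinction:

    import pandas as pd

    # Datetime-like input: NaT is still recognised as missing and handled
    # according to na_option (kept as NaN by default).
    dates = pd.Series(pd.to_datetime(["2021-01-02", None, "2021-01-01"]))
    dates.rank()

    # Plain int64 input: no sentinel value is masked; every element gets a rank.
    ints = pd.Series([2, 1, 3], dtype="int64")
    ints.rank()
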
pandas/_libs/groupby.pyx

+9 -14

@@ -681,18 +681,17 @@ group_mean_float64 = _group_mean['double']
 
 @cython.wraparound(False)
 @cython.boundscheck(False)
-def _group_ohlc(floating[:, ::1] out,
-                int64_t[::1] counts,
-                ndarray[floating, ndim=2] values,
-                const intp_t[:] labels,
-                Py_ssize_t min_count=-1):
+def group_ohlc(floating[:, ::1] out,
+               int64_t[::1] counts,
+               ndarray[floating, ndim=2] values,
+               const intp_t[:] labels,
+               Py_ssize_t min_count=-1):
     """
     Only aggregates on axis=0
     """
     cdef:
         Py_ssize_t i, j, N, K, lab
-        floating val, count
-        Py_ssize_t ngroups = len(counts)
+        floating val
 
     assert min_count == -1, "'min_count' only used in add and prod"
 
@@ -727,10 +726,6 @@ def _group_ohlc(floating[:, ::1] out,
         out[lab, 3] = val
 
 
-group_ohlc_float32 = _group_ohlc['float']
-group_ohlc_float64 = _group_ohlc['double']
-
-
 @cython.boundscheck(False)
 @cython.wraparound(False)
 def group_quantile(ndarray[float64_t] out,
@@ -1079,9 +1074,8 @@ def group_rank(float64_t[:, ::1] out,
     ngroups : int
         This parameter is not used, is needed to match signatures of other
         groupby functions.
-    is_datetimelike : bool, default False
-        unused in this method but provided for call compatibility with other
-        Cython transformations
+    is_datetimelike : bool
+        True if `values` contains datetime-like entries.
     ties_method : {'average', 'min', 'max', 'first', 'dense'}, default
         'average'
         * average: average rank of group
@@ -1109,6 +1103,7 @@ def group_rank(float64_t[:, ::1] out,
     result = rank_1d(
         values=values[:, 0],
         labels=labels,
+        is_datetimelike=is_datetimelike,
        ties_method=ties_method,
        ascending=ascending,
        pct=pct,

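The renamed ``group_ohlc`` kernel backs the user-facing ``GroupBy.ohlc()`` aggregation. A small sketch with integer input (which, per the ``groupby/ops.py`` change further down, is cast to float so the output can hold NaN); names and data are illustrative:

    import pandas as pd

    df = pd.DataFrame(
        {"ticker": ["A", "A", "A", "B", "B"], "price": [10, 12, 9, 5, 7]}
    )

    # open/high/low/close per group; columns come back as float64 even though
    # the input column is int64.
    df.groupby("ticker")["price"].ohlc()
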
pandas/_libs/internals.pyi

+58

@@ -0,0 +1,58 @@
+from typing import (
+    Iterator,
+    Sequence,
+    overload,
+)
+
+import numpy as np
+
+from pandas._typing import ArrayLike
+
+def slice_len(slc: slice, objlen: int = ...) -> int: ...
+
+
+def get_blkno_indexers(
+    blknos: np.ndarray,  # int64_t[:]
+    group: bool = ...,
+) -> list[tuple[int, slice | np.ndarray]]: ...
+
+
+def get_blkno_placements(
+    blknos: np.ndarray,
+    group: bool = ...,
+) -> Iterator[tuple[int, BlockPlacement]]: ...
+
+
+class BlockPlacement:
+    def __init__(self, val: int | slice | np.ndarray): ...
+
+    @property
+    def indexer(self) -> np.ndarray | slice: ...
+
+    @property
+    def as_array(self) -> np.ndarray: ...
+
+    @property
+    def is_slice_like(self) -> bool: ...
+
+    @overload
+    def __getitem__(self, loc: slice | Sequence[int]) -> BlockPlacement: ...
+
+    @overload
+    def __getitem__(self, loc: int) -> int: ...
+
+    def __iter__(self) -> Iterator[int]: ...
+
+    def __len__(self) -> int: ...
+
+    def delete(self, loc) -> BlockPlacement: ...
+
+    def append(self, others: list[BlockPlacement]) -> BlockPlacement: ...
+
+
+class Block:
+    _mgr_locs: BlockPlacement
+    ndim: int
+    values: ArrayLike
+
+    def __init__(self, values: ArrayLike, placement: BlockPlacement, ndim: int): ...

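The new stub annotates the existing internal ``BlockPlacement`` class; a rough usage sketch of what those annotations cover (private API, shown only to make the stub concrete):

    from pandas._libs.internals import BlockPlacement

    # A placement can be built from an int, a slice, or an integer ndarray.
    bp = BlockPlacement(slice(0, 4, 2))
    bp.is_slice_like   # True for slice-backed placements
    bp.as_array        # ndarray of covered positions, e.g. array([0, 2])
    len(bp)            # number of positions
    list(bp)           # iteration yields plain ints
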
pandas/_libs/testing.pyi

+8

@@ -0,0 +1,8 @@
+
+
+def assert_dict_equal(a, b, compare_keys: bool = ...): ...
+
+def assert_almost_equal(a, b,
+                        rtol: float = ..., atol: float = ...,
+                        check_dtype: bool = ...,
+                        obj=..., lobj=..., robj=..., index_values=...): ...

pandas/_testing/asserters.py

+7 -1

@@ -154,6 +154,9 @@ def assert_almost_equal(
                 else:
                     obj = "Input"
                 assert_class_equal(left, right, obj=obj)
+
+        # if we have "equiv", this becomes True
+        check_dtype = bool(check_dtype)
         _testing.assert_almost_equal(
             left, right, check_dtype=check_dtype, rtol=rtol, atol=atol, **kwargs
         )
@@ -388,12 +391,15 @@ def _get_ilevel_values(index, level):
             msg = f"{obj} values are different ({np.round(diff, 5)} %)"
             raise_assert_detail(obj, msg, left, right)
         else:
+
+            # if we have "equiv", this becomes True
+            exact_bool = bool(exact)
             _testing.assert_almost_equal(
                 left.values,
                 right.values,
                 rtol=rtol,
                 atol=atol,
-                check_dtype=exact,
+                check_dtype=exact_bool,
                 obj=obj,
                 lobj=left,
                 robj=right,

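The ``bool(...)`` coercions above matter because ``exact``/``check_dtype`` may arrive as the string ``"equiv"`` rather than a bool. A sketch using the testing helpers (``pandas._testing`` is semi-private but is what the test suite itself uses):

    import pandas as pd
    import pandas._testing as tm

    # exact="equiv" (the default) treats RangeIndex and Int64Index as
    # equivalent; internally the string is coerced with bool(...) before it is
    # passed down as check_dtype.
    tm.assert_index_equal(pd.RangeIndex(3), pd.Index([0, 1, 2]), exact="equiv")
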
pandas/core/algorithms.py

+4 -2

@@ -1031,21 +1031,23 @@ def rank(
         Whether or not to the display the returned rankings in integer form
         (e.g. 1, 2, 3) or in percentile form (e.g. 0.333..., 0.666..., 1).
     """
+    is_datetimelike = needs_i8_conversion(values.dtype)
+    values = _get_values_for_rank(values)
     if values.ndim == 1:
-        values = _get_values_for_rank(values)
         ranks = algos.rank_1d(
             values,
             labels=np.zeros(len(values), dtype=np.intp),
+            is_datetimelike=is_datetimelike,
             ties_method=method,
             ascending=ascending,
             na_option=na_option,
             pct=pct,
         )
     elif values.ndim == 2:
-        values = _get_values_for_rank(values)
         ranks = algos.rank_2d(
             values,
             axis=axis,
+            is_datetimelike=is_datetimelike,
             ties_method=method,
             ascending=ascending,
             na_option=na_option,

pandas/core/frame.py

+1 -1

@@ -528,7 +528,7 @@ class DataFrame(NDFrame, OpsMixin):
     >>> from dataclasses import make_dataclass
     >>> Point = make_dataclass("Point", [("x", int), ("y", int)])
     >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
-        x  y
+       x  y
     0  0  0
     1  0  3
     2  2  3

pandas/core/groupby/ops.py

+6

@@ -486,6 +486,12 @@ def _get_cython_func_and_vals(
             func = _get_cython_function(kind, how, values.dtype, is_numeric)
         else:
             raise
+    else:
+        if values.dtype.kind in ["i", "u"]:
+            if how in ["ohlc"]:
+                # The output may still include nans, so we have to cast
+                values = ensure_float64(values)
+
     return func, values
 
 @final
