Commit 083efb7

Merge pull request #243 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 781647a + ccb3365 commit 083efb7

31 files changed, +794 -223 lines changed

ci/code_checks.sh

+1 -2
@@ -121,8 +121,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
       pandas/io/parsers/ \
       pandas/io/sas/ \
       pandas/io/sql.py \
-      pandas/tseries/ \
-      pandas/io/formats/style_render.py
+      pandas/tseries/
     RET=$(($RET + $?)) ; echo $MSG "DONE"

 fi

doc/source/user_guide/visualization.rst

+54
@@ -316,6 +316,34 @@ The ``by`` keyword can be specified to plot grouped histograms:
    @savefig grouped_hist.png
    data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4));

+.. ipython:: python
+   :suppress:
+
+   plt.close("all")
+   np.random.seed(123456)
+
+In addition, the ``by`` keyword can also be specified in :meth:`DataFrame.plot.hist`.
+
+.. versionchanged:: 1.4.0
+
+.. ipython:: python
+
+   data = pd.DataFrame(
+       {
+           "a": np.random.choice(["x", "y", "z"], 1000),
+           "b": np.random.choice(["e", "f", "g"], 1000),
+           "c": np.random.randn(1000),
+           "d": np.random.randn(1000) - 1,
+       },
+   )
+
+   @savefig grouped_hist_by.png
+   data.plot.hist(by=["a", "b"], figsize=(10, 5));
+
+.. ipython:: python
+   :suppress:
+
+   plt.close("all")

 .. _visualization.box:
@@ -448,6 +476,32 @@ columns:

    plt.close("all")

+You could also create groupings with :meth:`DataFrame.plot.box`, for instance:
+
+.. versionchanged:: 1.4.0
+
+.. ipython:: python
+   :suppress:
+
+   plt.close("all")
+   np.random.seed(123456)
+
+.. ipython:: python
+   :okwarning:
+
+   df = pd.DataFrame(np.random.rand(10, 3), columns=["Col1", "Col2", "Col3"])
+   df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
+
+   plt.figure();
+
+   @savefig box_plot_ex4.png
+   bp = df.plot.box(column=["Col1", "Col2"], by="X")
+
+.. ipython:: python
+   :suppress:
+
+   plt.close("all")
+
 .. _visualization.box.return:

 In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both", None}``.

doc/source/whatsnew/v1.3.2.rst

+1
@@ -17,6 +17,7 @@ Fixed regressions
 - Performance regression in :meth:`DataFrame.isin` and :meth:`Series.isin` for nullable data types (:issue:`42714`)
 - Regression in updating values of :class:`pandas.Series` using boolean index, created by using :meth:`pandas.DataFrame.pop` (:issue:`42530`)
 - Regression in :meth:`DataFrame.from_records` with empty records (:issue:`42456`)
+- Fixed regression in :meth:`DataFrame.shift` where TypeError occurred when shifting DataFrame created by concatenation of slices and fills with values (:issue:`42719`)
 -

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.4.0.rst

+5 -2
@@ -35,6 +35,7 @@ Other enhancements
 - Additional options added to :meth:`.Styler.bar` to control alignment and display, with keyword only arguments (:issue:`26070`, :issue:`36419`)
 - :meth:`Styler.bar` now validates the input argument ``width`` and ``height`` (:issue:`42511`)
 - :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
+- Added keyword argument ``environment`` to :meth:`.Styler.to_latex` also allowing a specific "longtable" entry with a separate jinja2 template (:issue:`41866`)
 -

 .. ---------------------------------------------------------------------------
@@ -203,7 +204,7 @@ Numeric
 ^^^^^^^
 - Bug in :meth:`DataFrame.rank` raising ``ValueError`` with ``object`` columns and ``method="first"`` (:issue:`41931`)
 - Bug in :meth:`DataFrame.rank` treating missing values and extreme values as equal (for example ``np.nan`` and ``np.inf``), causing incorrect results when ``na_option="bottom"`` or ``na_option="top`` used (:issue:`41931`)
--
+- Bug in ``numexpr`` engine still being used when the option ``compute.use_numexpr`` is set to ``False`` (:issue:`32556`)

 Conversion
 ^^^^^^^^^^
@@ -260,11 +261,12 @@ Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
 - Fixed bug in :meth:`SeriesGroupBy.apply` where passing an unrecognized string argument failed to raise ``TypeError`` when the underlying ``Series`` is empty (:issue:`42021`)
 - Bug in :meth:`Series.rolling.apply`, :meth:`DataFrame.rolling.apply`, :meth:`Series.expanding.apply` and :meth:`DataFrame.expanding.apply` with ``engine="numba"`` where ``*args`` were being cached with the user passed function (:issue:`42287`)
--
+- Bug in :meth:`DataFrame.groupby.rolling.var` would calculate the rolling variance only on the first group (:issue:`42442`)

 Reshaping
 ^^^^^^^^^
 - :func:`concat` creating :class:`MultiIndex` with duplicate level entries when concatenating a :class:`DataFrame` with duplicates in :class:`Index` and multiple keys (:issue:`42651`)
+- Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42425`)
 -

 Sparse
@@ -284,6 +286,7 @@ Styler

 Other
 ^^^^^
+- Bug in :meth:`CustomBusinessMonthBegin.__add__` (:meth:`CustomBusinessMonthEnd.__add__`) not applying the extra ``offset`` parameter when beginning (end) of the target month is already a business day (:issue:`41356`)

 .. ***DO NOT USE THIS SECTION***

pandas/_libs/algos.pyx

+13 -14
@@ -217,8 +217,8 @@ def groupsort_indexer(const intp_t[:] index, Py_ssize_t ngroups):
     This is a reverse of the label factorization process.
     """
     cdef:
-        Py_ssize_t i, loc, label, n
-        ndarray[intp_t] indexer, where, counts
+        Py_ssize_t i, label, n
+        intp_t[::1] indexer, where, counts

     counts = np.zeros(ngroups + 1, dtype=np.intp)
     n = len(index)
@@ -241,7 +241,7 @@ def groupsort_indexer(const intp_t[:] index, Py_ssize_t ngroups):
             indexer[where[label]] = i
             where[label] += 1

-    return indexer, counts
+    return indexer.base, counts.base


 cdef inline Py_ssize_t swap(numeric *a, numeric *b) nogil:
@@ -325,11 +325,10 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
     cdef:
         Py_ssize_t i, j, xi, yi, N, K
         bint minpv
-        ndarray[float64_t, ndim=2] result
+        float64_t[:, ::1] result
         ndarray[uint8_t, ndim=2] mask
         int64_t nobs = 0
-        float64_t vx, vy, meanx, meany, divisor, prev_meany, prev_meanx, ssqdmx
-        float64_t ssqdmy, covxy
+        float64_t vx, vy, dx, dy, meanx, meany, divisor, ssqdmx, ssqdmy, covxy

     N, K = (<object>mat).shape

@@ -352,13 +351,13 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
                         vx = mat[i, xi]
                         vy = mat[i, yi]
                         nobs += 1
-                        prev_meanx = meanx
-                        prev_meany = meany
-                        meanx = meanx + 1 / nobs * (vx - meanx)
-                        meany = meany + 1 / nobs * (vy - meany)
-                        ssqdmx = ssqdmx + (vx - meanx) * (vx - prev_meanx)
-                        ssqdmy = ssqdmy + (vy - meany) * (vy - prev_meany)
-                        covxy = covxy + (vx - meanx) * (vy - prev_meany)
+                        dx = vx - meanx
+                        dy = vy - meany
+                        meanx += 1 / nobs * dx
+                        meany += 1 / nobs * dy
+                        ssqdmx += (vx - meanx) * dx
+                        ssqdmy += (vy - meany) * dy
+                        covxy += (vx - meanx) * dy

                 if nobs < minpv:
                     result[xi, yi] = result[yi, xi] = NaN
@@ -370,7 +369,7 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
                     else:
                         result[xi, yi] = result[yi, xi] = NaN

-    return result
+    return result.base

 # ----------------------------------------------------------------------
 # Pairwise Spearman correlation
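
Review note: the rewritten `nancorr` loop keeps the same Welford-style running sums but stores the pre-update deviations `dx` and `dy` instead of the previous means. A minimal pure-Python sketch of that update order follows. The function name `online_cov_corr` and the plain-list inputs are assumptions made for illustration; this is not the pandas implementation and it ignores masking and `minp`.

```python
import math


def online_cov_corr(xs, ys, cov=True):
    """Single-pass covariance (cov=True) or Pearson correlation (cov=False).

    Hypothetical helper: it mirrors the dx/dy update order shown in the diff,
    nothing more.
    """
    nobs = 0
    meanx = meany = ssqdmx = ssqdmy = covxy = 0.0
    for vx, vy in zip(xs, ys):
        nobs += 1
        dx = vx - meanx              # deviation from the mean before the update
        dy = vy - meany
        meanx += dx / nobs
        meany += dy / nobs
        ssqdmx += (vx - meanx) * dx  # post-update deviation times pre-update deviation
        ssqdmy += (vy - meany) * dy
        covxy += (vx - meanx) * dy
    divisor = (nobs - 1.0) if cov else math.sqrt(ssqdmx * ssqdmy)
    return covxy / divisor if divisor != 0 else float("nan")
```

Keeping `dx` and `dy` removes the two `prev_mean*` temporaries while producing the same running sums, since `vx - prev_meanx` is exactly `dx`.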

pandas/_libs/tslibs/offsets.pyx

+7 -1
@@ -3370,7 +3370,10 @@ cdef class _CustomBusinessMonth(BusinessMixin):
         """
         Define default roll function to be called in apply method.
         """
-        cbday = CustomBusinessDay(n=self.n, normalize=False, **self.kwds)
+        cbday_kwds = self.kwds.copy()
+        cbday_kwds['offset'] = timedelta(0)
+
+        cbday = CustomBusinessDay(n=1, normalize=False, **cbday_kwds)

         if self._prefix.endswith("S"):
             # MonthBegin
@@ -3414,6 +3417,9 @@ cdef class _CustomBusinessMonth(BusinessMixin):

         new = cur_month_offset_date + n * self.m_offset
         result = self.cbday_roll(new)
+
+        if self.offset:
+            result = result + self.offset
         return result

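
Review note: the change rolls to a business day with a zero-offset helper and then adds the user's `offset` once at the end, so the offset is applied even when the first of the month is already a business day. The sketch below illustrates that ordering with the standard library only. The names `roll_forward` and `month_begin` are hypothetical, and the sketch assumes that only Saturdays and Sundays are non-business days (no holiday calendar).

```python
from datetime import datetime, timedelta


def roll_forward(day: datetime) -> datetime:
    """Move to the next weekday if `day` falls on a weekend; adds no extra offset."""
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def month_begin(year: int, month: int, offset: timedelta = timedelta(0)) -> datetime:
    """First business day of the month, with `offset` applied exactly once."""
    result = roll_forward(datetime(year, month, 1))
    if offset:  # timedelta(0) is falsy, so the default adds nothing
        result = result + offset
    return result
```

Under these assumptions, `month_begin(2021, 3, timedelta(hours=9))` lands on Monday 1 March at 09:00 without any rolling, and `month_begin(2021, 5, timedelta(hours=9))` rolls from Saturday 1 May to Monday 3 May before adding the nine hours.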

pandas/_libs/window/aggregations.pyx

+4 -1
@@ -310,7 +310,10 @@ cdef inline void add_var(float64_t val, float64_t *nobs, float64_t *mean_x,
     t = y - mean_x[0]
     compensation[0] = t + mean_x[0] - y
     delta = t
-    mean_x[0] = mean_x[0] + delta / nobs[0]
+    if nobs[0]:
+        mean_x[0] = mean_x[0] + delta / nobs[0]
+    else:
+        mean_x[0] = 0
     ssqdm_x[0] = ssqdm_x[0] + (val - prev_mean) * (val - mean_x[0])


pandas/core/computation/eval.py

+2 -1
@@ -43,9 +43,10 @@ def _check_engine(engine: str | None) -> str:
         Engine name.
     """
     from pandas.core.computation.check import NUMEXPR_INSTALLED
+    from pandas.core.computation.expressions import USE_NUMEXPR

     if engine is None:
-        engine = "numexpr" if NUMEXPR_INSTALLED else "python"
+        engine = "numexpr" if USE_NUMEXPR else "python"

     if engine not in ENGINES:
         valid_engines = list(ENGINES.keys())
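
Review note: with this change the default engine follows a runtime switch rather than whether `numexpr` can be imported. A minimal sketch of that selection logic follows. The function `resolve_engine`, its `use_numexpr` parameter, and the two-entry `ENGINES` tuple are assumptions for illustration and do not reproduce the pandas internals.

```python
from __future__ import annotations

ENGINES = ("numexpr", "python")  # assumed set of valid engine names


def resolve_engine(engine: str | None, use_numexpr: bool) -> str:
    """Pick the default engine from a runtime flag, then validate the name."""
    if engine is None:
        # The flag, not mere availability of the library, decides the default.
        engine = "numexpr" if use_numexpr else "python"
    if engine not in ENGINES:
        raise KeyError(f"Invalid engine {engine!r}; valid engines are {list(ENGINES)}")
    return engine
```

An explicitly requested engine still wins over the flag; only the `None` default is affected.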

pandas/core/groupby/groupby.py

+44 -39
@@ -2897,16 +2897,15 @@ def _get_cythonized_result(

         ids, _, ngroups = grouper.group_info
         output: dict[base.OutputKey, np.ndarray] = {}
-        base_func = getattr(libgroupby, how)
-
-        error_msg = ""
-        for idx, obj in enumerate(self._iterate_slices()):
-            name = obj.name
-            values = obj._values

-            if numeric_only and not is_numeric_dtype(values.dtype):
-                continue
+        base_func = getattr(libgroupby, how)
+        base_func = partial(base_func, labels=ids)
+        if needs_ngroups:
+            base_func = partial(base_func, ngroups=ngroups)
+        if min_count is not None:
+            base_func = partial(base_func, min_count=min_count)

+        def blk_func(values: ArrayLike) -> ArrayLike:
             if aggregate:
                 result_sz = ngroups
             else:
@@ -2915,54 +2914,31 @@ def _get_cythonized_result(
             result = np.zeros(result_sz, dtype=cython_dtype)
             if needs_2d:
                 result = result.reshape((-1, 1))
-            func = partial(base_func, result)
+            func = partial(base_func, out=result)

             inferences = None

             if needs_counts:
                 counts = np.zeros(self.ngroups, dtype=np.int64)
-                func = partial(func, counts)
+                func = partial(func, counts=counts)

             if needs_values:
                 vals = values
                 if pre_processing:
-                    try:
-                        vals, inferences = pre_processing(vals)
-                    except TypeError as err:
-                        error_msg = str(err)
-                        howstr = how.replace("group_", "")
-                        warnings.warn(
-                            "Dropping invalid columns in "
-                            f"{type(self).__name__}.{howstr} is deprecated. "
-                            "In a future version, a TypeError will be raised. "
-                            f"Before calling .{howstr}, select only columns which "
-                            "should be valid for the function.",
-                            FutureWarning,
-                            stacklevel=3,
-                        )
-                        continue
+                    vals, inferences = pre_processing(vals)
+
                 vals = vals.astype(cython_dtype, copy=False)
                 if needs_2d:
                     vals = vals.reshape((-1, 1))
-                func = partial(func, vals)
-
-            func = partial(func, ids)
-
-            if min_count is not None:
-                func = partial(func, min_count)
+                func = partial(func, values=vals)

             if needs_mask:
                 mask = isna(values).view(np.uint8)
-                func = partial(func, mask)
-
-            if needs_ngroups:
-                func = partial(func, ngroups)
+                func = partial(func, mask=mask)

             if needs_nullable:
                 is_nullable = isinstance(values, BaseMaskedArray)
                 func = partial(func, nullable=is_nullable)
-                if post_processing:
-                    post_processing = partial(post_processing, nullable=is_nullable)

             func(**kwargs)  # Call func to modify indexer values in place

@@ -2973,9 +2949,38 @@ def _get_cythonized_result(
                 result = algorithms.take_nd(values, result)

             if post_processing:
-                result = post_processing(result, inferences)
+                pp_kwargs = {}
+                if needs_nullable:
+                    pp_kwargs["nullable"] = isinstance(values, BaseMaskedArray)

-            key = base.OutputKey(label=name, position=idx)
+                result = post_processing(result, inferences, **pp_kwargs)
+
+            return result
+
+        error_msg = ""
+        for idx, obj in enumerate(self._iterate_slices()):
+            values = obj._values
+
+            if numeric_only and not is_numeric_dtype(values.dtype):
+                continue
+
+            try:
+                result = blk_func(values)
+            except TypeError as err:
+                error_msg = str(err)
+                howstr = how.replace("group_", "")
+                warnings.warn(
+                    "Dropping invalid columns in "
+                    f"{type(self).__name__}.{howstr} is deprecated. "
+                    "In a future version, a TypeError will be raised. "
+                    f"Before calling .{howstr}, select only columns which "
+                    "should be valid for the function.",
+                    FutureWarning,
+                    stacklevel=3,
+                )
+                continue
+
+            key = base.OutputKey(label=obj.name, position=idx)
             output[key] = result

         # error_msg is "" on an frame/series with no rows or columns
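
Review note: the refactor switches every `functools.partial` call from positional to keyword binding. That lets the arguments shared by all columns (`labels`, `ngroups`, `min_count`) be bound once, outside the per-column function, and it moves the `TypeError` handling out to the loop that calls `blk_func`. The sketch below shows the same shape with a toy kernel. The names `group_sum` and `run_columns` and the list-based data are hypothetical and stand in for the Cython kernels and pandas objects.

```python
import warnings
from functools import partial


def group_sum(out, values, labels, ngroups):
    """Toy kernel: write per-group sums of `values` into `out` in place."""
    for i in range(ngroups):
        out[i] = 0
    for val, lab in zip(values, labels):
        out[lab] += val  # raises TypeError for non-numeric values


def run_columns(columns, labels, ngroups):
    # Arguments shared by every column are bound once, by keyword.
    base_func = partial(group_sum, labels=labels, ngroups=ngroups)

    def blk_func(values):
        result = [0] * ngroups
        # With keyword binding, the order of these partial() calls is irrelevant.
        func = partial(base_func, out=result)
        func = partial(func, values=values)
        func()  # the kernel fills `result` in place
        return result

    output = {}
    for name, values in columns.items():
        try:
            output[name] = blk_func(values)
        except TypeError:
            # Invalid columns are dropped with a warning instead of failing the call.
            warnings.warn(
                f"Dropping invalid column {name!r}", FutureWarning, stacklevel=2
            )
            continue
    return output
```

With positional binding, each `partial` call had to appear in the kernel's exact argument order, which is why the old code interleaved the `ids`, `min_count`, and `ngroups` bindings inside the per-column loop.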
