Commit c2d61c3

Merging in updated master in order to make CI checks pass
2 parents a36d450 + 6ca8757, commit c2d61c3

File tree: 27 files changed, +491 −124 lines

.github/workflows/sdist.yml (+64, new file)

name: sdist

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
      - 1.2.x
      - 1.3.x
    paths-ignore:
      - "doc/**"

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    defaults:
      run:
        shell: bash -l {0}

    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.7", "3.8", "3.9"]

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip setuptools wheel

          # GH 39416
          pip install numpy

      - name: Build pandas sdist
        run: |
          pip list
          python setup.py sdist --formats=gztar

      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: pandas-sdist
          python-version: ${{ matrix.python-version }}

      - name: Install pandas from sdist
        run: |
          conda list
          python -m pip install dist/*.gz

      - name: Import pandas
        run: |
          cd ..
          conda list
          python -c "import pandas; pandas.show_versions();"
asv_bench/benchmarks/algos/isin.py (+10)

@@ -325,3 +325,13 @@ def setup(self, dtype, series_type):

     def time_isin(self, dtypes, series_type):
         self.series.isin(self.values)
+
+
+class IsInWithLongTupples:
+    def setup(self):
+        t = tuple(range(1000))
+        self.series = Series([t] * 1000)
+        self.values = [t]
+
+    def time_isin(self):
+        self.series.isin(self.values)
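The new `IsInWithLongTupples` benchmark measures membership tests where every element of the Series is the same long tuple. A minimal sketch of what it exercises, outside the asv harness:

```python
import pandas as pd

# What the IsInWithLongTupples benchmark exercises: Series.isin where the
# elements are long tuples. Comparing identical tuple objects is the case
# sped up by the identity fast path added in khash_python.h.
t = tuple(range(1000))
s = pd.Series([t] * 5)
mask = s.isin([t])
print(mask.all())  # every element is the same tuple object -> True
```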

doc/source/user_guide/indexing.rst (+4 −5)

@@ -1523,18 +1523,17 @@ Looking up values by index/column labels
 ----------------------------------------

 Sometimes you want to extract a set of values given a sequence of row labels
-and column labels, this can be achieved by ``DataFrame.melt`` combined by filtering the corresponding
-rows with ``DataFrame.loc``. For instance:
+and column labels, this can be achieved by ``pandas.factorize`` and NumPy indexing.
+For instance:

 .. ipython:: python

    df = pd.DataFrame({'col': ["A", "A", "B", "B"],
                       'A': [80, 23, np.nan, 22],
                       'B': [80, 55, 76, 67]})
    df
-   melt = df.melt('col')
-   melt = melt.loc[melt['col'] == melt['variable'], 'value']
-   melt.reset_index(drop=True)
+   idx, cols = pd.factorize(df['col'])
+   df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

 Formerly this could be achieved with the dedicated ``DataFrame.lookup`` method
 which was deprecated in version 1.2.0.
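The recipe the docs now recommend can be run end-to-end; `factorize` turns the per-row column labels into integer codes, and NumPy fancy indexing picks one cell per row:

```python
import numpy as np
import pandas as pd

# The lookup recipe from the updated docs: one value per row, selected by
# the label stored in the 'col' column.
df = pd.DataFrame({'col': ["A", "A", "B", "B"],
                   'A': [80, 23, np.nan, 22],
                   'B': [80, 55, 76, 67]})

# factorize maps each label to an integer code (idx) plus the unique
# labels (cols); reindexing the columns by cols aligns codes to columns.
idx, cols = pd.factorize(df['col'])
result = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(result)  # [80. 23. 76. 67.]
```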

doc/source/whatsnew/v1.2.5.rst (+7 −27)

@@ -1,7 +1,7 @@
 .. _whatsnew_125:

-What's new in 1.2.5 (May ??, 2021)
-----------------------------------
+What's new in 1.2.5 (June 22, 2021)
+-----------------------------------

 These are the changes in pandas 1.2.5. See :ref:`release` for a full changelog
 including other versions of pandas.

@@ -14,32 +14,12 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
-- Regression in :func:`concat` between two :class:`DataFrames` where one has an :class:`Index` that is all-None and the other is :class:`DatetimeIndex` incorrectly raising (:issue:`40841`)
+- Fixed regression in :func:`concat` between two :class:`DataFrame` where one has an :class:`Index` that is all-None and the other is :class:`DatetimeIndex` incorrectly raising (:issue:`40841`)
 - Fixed regression in :meth:`DataFrame.sum` and :meth:`DataFrame.prod` when ``min_count`` and ``numeric_only`` are both given (:issue:`41074`)
-- Regression in :func:`read_csv` when using ``memory_map=True`` with an non-UTF8 encoding (:issue:`40986`)
-- Regression in :meth:`DataFrame.replace` and :meth:`Series.replace` when the values to replace is a NumPy float array (:issue:`40371`)
-- Regression in :func:`ExcelFile` when a corrupt file is opened but not closed (:issue:`41778`)
-
-.. ---------------------------------------------------------------------------
-
-.. _whatsnew_125.bug_fixes:
-
-Bug fixes
-~~~~~~~~~
-
--
-
-.. ---------------------------------------------------------------------------
-
-.. _whatsnew_125.other:
-
-Other
-~~~~~
-
--
-
+- Fixed regression in :func:`read_csv` when using ``memory_map=True`` with an non-UTF8 encoding (:issue:`40986`)
+- Fixed regression in :meth:`DataFrame.replace` and :meth:`Series.replace` when the values to replace is a NumPy float array (:issue:`40371`)
+- Fixed regression in :func:`ExcelFile` when a corrupt file is opened but not closed (:issue:`41778`)
+- Fixed regression in :meth:`DataFrame.astype` with ``dtype=str`` failing to convert ``NaN`` in categorical columns (:issue:`41797`)

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.3.0.rst (+3)

@@ -269,12 +269,14 @@ Other enhancements
 - :meth:`read_csv` and :meth:`read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`)
+- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising with ``object`` data containing ``pd.NA`` even when ``skipna=True`` (:issue:`37501`)
 - :meth:`.GroupBy.rank` now supports object-dtype data (:issue:`38278`)
 - Constructing a :class:`DataFrame` or :class:`Series` with the ``data`` argument being a Python iterable that is *not* a NumPy ``ndarray`` consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when ``data`` is a NumPy ``ndarray`` (:issue:`40908`)
 - Add keyword ``sort`` to :func:`pivot_table` to allow non-sorting of the result (:issue:`39143`)
 - Add keyword ``dropna`` to :meth:`DataFrame.value_counts` to allow counting rows that include ``NA`` values (:issue:`41325`)
 - :meth:`Series.replace` will now cast results to ``PeriodDtype`` where possible instead of ``object`` dtype (:issue:`41526`)
 - Improved error message in ``corr`` and ``cov`` methods on :class:`.Rolling`, :class:`.Expanding`, and :class:`.ExponentialMovingWindow` when ``other`` is not a :class:`DataFrame` or :class:`Series` (:issue:`41741`)
+- :meth:`DataFrame.explode` now supports exploding multiple columns. Its ``column`` argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (:issue:`39240`)

@@ -914,6 +916,7 @@ Datetimelike
 - Bug in constructing a :class:`DataFrame` or :class:`Series` with mismatched ``datetime64`` data and ``timedelta64`` dtype, or vice-versa, failing to raise a ``TypeError`` (:issue:`38575`, :issue:`38764`, :issue:`38792`)
 - Bug in constructing a :class:`Series` or :class:`DataFrame` with a ``datetime`` object out of bounds for ``datetime64[ns]`` dtype or a ``timedelta`` object out of bounds for ``timedelta64[ns]`` dtype (:issue:`38792`, :issue:`38965`)
 - Bug in :meth:`DatetimeIndex.intersection`, :meth:`DatetimeIndex.symmetric_difference`, :meth:`PeriodIndex.intersection`, :meth:`PeriodIndex.symmetric_difference` always returning object-dtype when operating with :class:`CategoricalIndex` (:issue:`38741`)
+- Bug in :meth:`DatetimeIndex.intersection` giving incorrect results with non-Tick frequencies with ``n != 1`` (:issue:`42104`)
 - Bug in :meth:`Series.where` incorrectly casting ``datetime64`` values to ``int64`` (:issue:`37682`)
 - Bug in :class:`Categorical` incorrectly typecasting ``datetime`` object to ``Timestamp`` (:issue:`38878`)
 - Bug in comparisons between :class:`Timestamp` object and ``datetime64`` objects just outside the implementation bounds for nanosecond ``datetime64`` (:issue:`39221`)

doc/source/whatsnew/v1.4.0.rst (+1 −1)

@@ -96,7 +96,7 @@ Other API changes

 Deprecations
 ~~~~~~~~~~~~
--
+- Deprecated :meth:`Index.is_type_compatible` (:issue:`42113`)
 -

 .. ---------------------------------------------------------------------------

pandas/_libs/src/klib/khash_python.h (+3)

@@ -226,6 +226,9 @@ int PANDAS_INLINE tupleobject_cmp(PyTupleObject* a, PyTupleObject* b){


 int PANDAS_INLINE pyobject_cmp(PyObject* a, PyObject* b) {
+    if (a == b) {
+        return 1;
+    }
     if (Py_TYPE(a) == Py_TYPE(b)) {
         // special handling for some built-in types which could have NaNs
         // as we would like to have them equivalent, but the usual
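The added identity check short-circuits before any value comparison runs. In Python terms, a hypothetical sketch of why that matters (the helper name is illustrative, not the C symbol's actual semantics in full):

```python
import math

# Sketch of the comparison logic with the new fast path: two references to
# the *same* object always compare equal, even when value equality fails,
# as it does for NaN (NaN != NaN).
def pyobject_cmp(a, b):
    if a is b:       # new fast path: identical objects always match
        return True
    return a == b    # fall back to value equality

nan = float("nan")
print(pyobject_cmp(nan, nan))           # True: same object
print(pyobject_cmp(nan, float("nan")))  # False: distinct NaN objects
```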

pandas/_libs/tslibs/timestamps.pyx (+8 −1)

@@ -129,6 +129,13 @@ cdef inline object create_timestamp_from_ts(int64_t value,
     return ts_base


+def _unpickle_timestamp(value, freq, tz):
+    # GH#41949 don't warn on unpickle if we have a freq
+    ts = Timestamp(value, tz=tz)
+    ts._set_freq(freq)
+    return ts
+
+
 # ----------------------------------------------------------------------

 def integer_op_not_supported(obj):

@@ -725,7 +732,7 @@ cdef class _Timestamp(ABCTimestamp):

     def __reduce__(self):
         object_state = self.value, self._freq, self.tzinfo
-        return (Timestamp, object_state)
+        return (_unpickle_timestamp, object_state)

     # -----------------------------------------------------------------
     # Rendering Methods
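The change routes unpickling through a module-level helper instead of the `Timestamp` constructor, so extra state (`freq`) can be restored without the constructor-time deprecation warning. A minimal sketch of the same `__reduce__` pattern with hypothetical names (`Stamp`, `_unpickle_stamp` are illustrative, not pandas internals):

```python
import pickle

def _unpickle_stamp(value, freq, tz):
    # Restore freq outside __init__, mirroring how the patch avoids
    # triggering constructor-time warnings on unpickle.
    ts = Stamp(value, tz=tz)
    ts.freq = freq
    return ts

class Stamp:
    def __init__(self, value, tz=None):
        self.value = value
        self.tz = tz
        self.freq = None

    def __reduce__(self):
        # (callable, args): pickle calls _unpickle_stamp(*args) on load
        return (_unpickle_stamp, (self.value, self.freq, self.tz))

s = Stamp(42, tz="UTC")
s.freq = "D"
s2 = pickle.loads(pickle.dumps(s))
print(s2.value, s2.freq, s2.tz)
```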

pandas/core/algorithms.py (+5 −1)

@@ -140,7 +140,11 @@ def _ensure_data(values: ArrayLike) -> tuple[np.ndarray, DtypeObj]:
             return np.asarray(values).view("uint8"), values.dtype
         else:
             # i.e. all-bool Categorical, BooleanArray
-            return np.asarray(values).astype("uint8", copy=False), values.dtype
+            try:
+                return np.asarray(values).astype("uint8", copy=False), values.dtype
+            except TypeError:
+                # GH#42107 we have pd.NAs present
+                return np.asarray(values), values.dtype

     elif is_integer_dtype(values.dtype):
         return np.asarray(values), values.dtype

pandas/core/arrays/categorical.py (+5 −1)

@@ -26,6 +26,7 @@
     NaT,
     algos as libalgos,
     hashtable as htable,
+    lib,
 )
 from pandas._libs.arrays import NDArrayBacked
 from pandas._libs.lib import no_default

@@ -523,14 +524,17 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
             try:
                 new_cats = np.asarray(self.categories)
                 new_cats = new_cats.astype(dtype=dtype, copy=copy)
+                fill_value = lib.item_from_zerodim(np.array(np.nan).astype(dtype))
             except (
                 TypeError,  # downstream error msg for CategoricalIndex is misleading
                 ValueError,
             ):
                 msg = f"Cannot cast {self.categories.dtype} dtype to {dtype}"
                 raise ValueError(msg)

-            result = take_nd(new_cats, ensure_platform_int(self._codes))
+            result = take_nd(
+                new_cats, ensure_platform_int(self._codes), fill_value=fill_value
+            )

             return result
pandas/core/frame.py (+74 −23)

@@ -8144,16 +8144,27 @@ def stack(self, level: Level = -1, dropna: bool = True):

         return result.__finalize__(self, method="stack")

-    def explode(self, column: str | tuple, ignore_index: bool = False) -> DataFrame:
+    def explode(
+        self,
+        column: str | tuple | list[str | tuple],
+        ignore_index: bool = False,
+    ) -> DataFrame:
         """
         Transform each element of a list-like to a row, replicating index values.

         .. versionadded:: 0.25.0

         Parameters
         ----------
-        column : str or tuple
-            Column to explode.
+        column : str or tuple or list thereof
+            Column(s) to explode.
+            For multiple columns, specify a non-empty list where each
+            element is a str or tuple; the list-like values in all
+            specified columns must have matching lengths on each row.
+
+            .. versionadded:: 1.3.0
+                Multi-column explode
+
         ignore_index : bool, default False
             If True, the resulting index will be labeled 0, 1, …, n - 1.

@@ -8168,7 +8179,10 @@ def explode(self, column: str | tuple, ignore_index: bool = False) -> DataFrame:
         Raises
         ------
         ValueError :
-            if columns of the frame are not unique.
+            * If columns of the frame are not unique.
+            * If the specified columns to explode are an empty list.
+            * If the specified columns to explode have non-matching
+              element counts on a row.

         See Also
         --------

@@ -8187,32 +8201,69 @@ def explode(self, column: str | tuple, ignore_index: bool = False) -> DataFrame:

         Examples
         --------
-        >>> df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
+        >>> df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
+        ...                    'B': 1,
+        ...                    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
         >>> df
-                   A  B
-        0  [1, 2, 3]  1
-        1        foo  1
-        2         []  1
-        3     [3, 4]  1
+                   A  B          C
+        0  [0, 1, 2]  1  [a, b, c]
+        1        foo  1        NaN
+        2         []  1         []
+        3     [3, 4]  1     [d, e]
+
+        Single-column explode.

         >>> df.explode('A')
-             A  B
-        0    1  1
-        0    2  1
-        0    3  1
-        1  foo  1
-        2  NaN  1
-        3    3  1
-        3    4  1
-        """
-        if not (is_scalar(column) or isinstance(column, tuple)):
-            raise ValueError("column must be a scalar")
+             A  B          C
+        0    0  1  [a, b, c]
+        0    1  1  [a, b, c]
+        0    2  1  [a, b, c]
+        1  foo  1        NaN
+        2  NaN  1         []
+        3    3  1     [d, e]
+        3    4  1     [d, e]
+
+        Multi-column explode.
+
+        >>> df.explode(list('AC'))
+             A  B    C
+        0    0  1    a
+        0    1  1    b
+        0    2  1    c
+        1  foo  1  NaN
+        2  NaN  1  NaN
+        3    3  1    d
+        3    4  1    e
+        """
         if not self.columns.is_unique:
             raise ValueError("columns must be unique")

+        columns: list[str | tuple]
+        if is_scalar(column) or isinstance(column, tuple):
+            assert isinstance(column, (str, tuple))
+            columns = [column]
+        elif isinstance(column, list) and all(
+            map(lambda c: is_scalar(c) or isinstance(c, tuple), column)
+        ):
+            if not column:
+                raise ValueError("column must be nonempty")
+            if len(column) > len(set(column)):
+                raise ValueError("column must be unique")
+            columns = column
+        else:
+            raise ValueError("column must be a scalar, tuple, or list thereof")
+
         df = self.reset_index(drop=True)
-        result = df[column].explode()
-        result = df.drop([column], axis=1).join(result)
+        if len(columns) == 1:
+            result = df[columns[0]].explode()
+        else:
+            mylen = lambda x: len(x) if is_list_like(x) else -1
+            counts0 = self[columns[0]].apply(mylen)
+            for c in columns[1:]:
+                if not all(counts0 == self[c].apply(mylen)):
+                    raise ValueError("columns must have matching element counts")
+            result = DataFrame({c: df[c].explode() for c in columns})
+        result = df.drop(columns, axis=1).join(result)
         if ignore_index:
             result.index = ibase.default_index(len(result))
         else:
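The new docstring example can be run end-to-end against pandas ≥ 1.3, where `explode` accepts a list of columns; rows must have matching list lengths in the exploded columns, and empty lists become NaN:

```python
import numpy as np
import pandas as pd

# Multi-column explode from the new docstring: 'A' and 'C' are exploded
# together, pairing element i of each row's 'A' list with element i of
# its 'C' list; 'B' is broadcast unchanged.
df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})

out = df.explode(list('AC'))
print(out)
```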

pandas/core/groupby/groupby.py (+5 −1)

@@ -1519,7 +1519,11 @@ def _bool_agg(self, val_test, skipna):

         def objs_to_bool(vals: ArrayLike) -> tuple[np.ndarray, type]:
             if is_object_dtype(vals):
-                vals = np.array([bool(x) for x in vals])
+                # GH#37501: don't raise on pd.NA when skipna=True
+                if skipna:
+                    vals = np.array([bool(x) if not isna(x) else True for x in vals])
+                else:
+                    vals = np.array([bool(x) for x in vals])
             elif isinstance(vals, BaseMaskedArray):
                 vals = vals._data.astype(bool, copy=False)
             else:
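The effect of this change (GH#37501) is visible from the public API: object-dtype data containing `pd.NA` no longer raises in `GroupBy.any`/`GroupBy.all` when `skipna=True` (the default). A small sketch:

```python
import pandas as pd

# Previously, bool(pd.NA) raised TypeError inside GroupBy.any; with the
# fix, NA values are skipped when skipna=True (the default).
df = pd.DataFrame({"key": ["a", "a", "b"],
                   "val": ["x", pd.NA, pd.NA]})   # object dtype with pd.NA

res = df.groupby("key")["val"].any()
print(res)  # group 'a' has a truthy value -> True
```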
