Commit b381e8a

Merge branch 'master' into refactor/csvs
2 parents bebdfcf + bdb6e26 commit b381e8a

103 files changed, +732 -667 lines changed

ci/code_checks.sh

+15-4
@@ -179,6 +179,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include="*.py" -E "super\(\w*, (self|cls)\)" pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Check for use of builtin filter function' ; echo $MSG
+    invgrep -R --include="*.py" -P '(?<!def)[\(\s]filter\(' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     # Check for the following code in testing: `np.testing` and `np.array_equal`
     MSG='Check for invalid testing' ; echo $MSG
     invgrep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/
@@ -226,15 +230,22 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include=*.{py,pyx} '!r}' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    # -------------------------------------------------------------------------
+    # Type annotations
+
     MSG='Check for use of comment-based annotation syntax' ; echo $MSG
     invgrep -R --include="*.py" -P '# type: (?!ignore)' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
-    # https://github.com/python/mypy/issues/7384
-    # MSG='Check for missing error codes with # type: ignore' ; echo $MSG
-    # invgrep -R --include="*.py" -P '# type: ignore(?!\[)' pandas
-    # RET=$(($RET + $?)) ; echo $MSG "DONE"
+    MSG='Check for missing error codes with # type: ignore' ; echo $MSG
+    invgrep -R --include="*.py" -P '# type:\s?ignore(?!\[)' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
+    MSG='Check for use of Union[Series, DataFrame] instead of FrameOrSeriesUnion alias' ; echo $MSG
+    invgrep -R --include="*.py" --exclude=_typing.py -E 'Union\[.*(Series.*DataFrame|DataFrame.*Series).*\]' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    # -------------------------------------------------------------------------
     MSG='Check for use of foo.__class__ instead of type(foo)' ; echo $MSG
     invgrep -R --include=*.{py,pyx} '\.__class__' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
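
The three checks added to `ci/code_checks.sh` lean on regex lookarounds, which are easy to misread. The sketch below replays the same patterns with Python's `re` module against a handful of made-up source lines. The `offending` helper and the sample lines are assumptions for illustration only, and Python's `re` is a close stand-in for `grep -P` and `grep -E` here, not an exact equivalent.

```python
import re

# Patterns copied from the invgrep calls in ci/code_checks.sh.
BUILTIN_FILTER = re.compile(r"(?<!def)[\(\s]filter\(")
IGNORE_WITHOUT_CODE = re.compile(r"# type:\s?ignore(?!\[)")
UNION_FRAME_SERIES = re.compile(r"Union\[.*(Series.*DataFrame|DataFrame.*Series).*\]")


def offending(pattern, lines):
    """Return the lines a check would flag (hypothetical helper)."""
    return [line for line in lines if pattern.search(line)]


# Made-up sample lines, not taken from the pandas code base.
samples = [
    "result = list(filter(None, values))",      # builtin filter: flagged
    "def filter(self, items=None):",            # method definition: allowed
    "x = foo()  # type: ignore",                # no error code: flagged
    "x = foo()  # type: ignore[attr-defined]",  # has error code: allowed
    'def f() -> Union["Series", "DataFrame"]:', # spelled-out union: flagged
    "def f() -> FrameOrSeriesUnion:",           # alias: allowed
]

for name, pattern in [
    ("filter", BUILTIN_FILTER),
    ("type: ignore", IGNORE_WITHOUT_CODE),
    ("Union", UNION_FRAME_SERIES),
]:
    print(name, offending(pattern, samples))
```

The negative lookbehind `(?<!def)` is what lets method definitions named `filter` pass, and the negative lookahead `(?!\[)` is what separates a bare `# type: ignore` from one that carries an error code.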

doc/source/user_guide/missing_data.rst

-26
@@ -689,32 +689,6 @@ You can also operate on the DataFrame in place:
 
    df.replace(1.5, np.nan, inplace=True)
 
-.. warning::
-
-   When replacing multiple ``bool`` or ``datetime64`` objects, the first
-   argument to ``replace`` (``to_replace``) must match the type of the value
-   being replaced. For example,
-
-   .. code-block:: python
-
-      >>> s = pd.Series([True, False, True])
-      >>> s.replace({'a string': 'new value', True: False})  # raises
-      TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
-
-   will raise a ``TypeError`` because one of the ``dict`` keys is not of the
-   correct type for replacement.
-
-   However, when replacing a *single* object such as,
-
-   .. ipython:: python
-
-      s = pd.Series([True, False, True])
-      s.replace('a string', 'another string')
-
-   the original ``NDFrame`` object will be returned untouched. We're working on
-   unifying this API, but for backwards compatibility reasons we cannot break
-   the latter behavior. See :issue:`6354` for more details.
-
 Missing data casting rules and indexing
 ---------------------------------------
 

doc/source/whatsnew/v1.1.2.rst

+2
@@ -17,6 +17,7 @@ Fixed regressions
 - Regression in :meth:`DatetimeIndex.intersection` incorrectly raising ``AssertionError`` when intersecting against a list (:issue:`35876`)
 - Fix regression in updating a column inplace (e.g. using ``df['col'].fillna(.., inplace=True)``) (:issue:`35731`)
 - Performance regression for :meth:`RangeIndex.format` (:issue:`35712`)
+- Fix regression in invalid cache after an indexing operation; this can manifest when setting which does not update the data (:issue:`35521`)
 - Regression in :meth:`DataFrame.replace` where a ``TypeError`` would be raised when attempting to replace elements of type :class:`Interval` (:issue:`35931`)
 - Fix regression in pickle roundtrip of the ``closed`` attribute of :class:`IntervalIndex` (:issue:`35658`)
 - Fixed regression in :meth:`DataFrameGroupBy.agg` where a ``ValueError: buffer source array is read-only`` would be raised when the underlying array is read-only (:issue:`36014`)
@@ -33,6 +34,7 @@ Bug fixes
 - Bug in :meth:`DataFrame.apply` with ``result_type="reduce"`` returning with incorrect index (:issue:`35683`)
 - Bug in :meth:`DateTimeIndex.format` and :meth:`PeriodIndex.format` with ``name=True`` setting the first item to ``"None"`` where it should be ``""`` (:issue:`35712`)
 - Bug in :meth:`Float64Index.__contains__` incorrectly raising ``TypeError`` instead of returning ``False`` (:issue:`35788`)
+- Bug in :meth:`Series.dt.isocalendar` and :meth:`DatetimeIndex.isocalendar` that returned incorrect year for certain dates (:issue:`36032`)
 - Bug in :class:`DataFrame` indexing returning an incorrect :class:`Series` in some cases when the series has been altered and a cache not invalidated (:issue:`33675`)
 
 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.2.0.rst

+6-2
@@ -214,7 +214,8 @@ Performance improvements
 
 Bug fixes
 ~~~~~~~~~
-
+- Bug in :meth:`DataFrameGroupBy.apply` raising error with ``np.nan`` group(s) when ``dropna=False`` (:issue:`35889`)
+-
 
 Categorical
 ^^^^^^^^^^^
@@ -294,11 +295,12 @@ I/O
 - :meth:`to_csv` passes compression arguments for `'gzip'` always to `gzip.GzipFile` (:issue:`28103`)
 - :meth:`to_csv` did not support zip compression for binary file object not having a filename (:issue: `35058`)
 - :meth:`to_csv` and :meth:`read_csv` did not honor `compression` and `encoding` for path-like objects that are internally converted to file-like objects (:issue:`35677`, :issue:`26124`, and :issue:`32392`)
+- :meth:`to_picke` and :meth:`read_pickle` did not support compression for file-objects (:issue:`26237`, :issue:`29054`, and :issue:`29570`)
 
 Plotting
 ^^^^^^^^
 
--
+- Bug in :meth:`DataFrame.plot` where a marker letter in the ``style`` keyword sometimes causes a ``ValueError`` (:issue:`21003`)
 -
 
 Groupby/resample/rolling
@@ -311,6 +313,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.apply` would drop a :class:`CategoricalIndex` when grouped on. (:issue:`35792`)
 - Bug when subsetting columns on a :class:`~pandas.core.groupby.DataFrameGroupBy` (e.g. ``df.groupby('a')[['b']])``) would reset the attributes ``axis``, ``dropna``, ``group_keys``, ``level``, ``mutated``, ``sort``, and ``squeeze`` to their default values. (:issue:`9959`)
 - Bug in :meth:`DataFrameGroupby.tshift` failing to raise ``ValueError`` when a frequency cannot be inferred for the index of a group (:issue:`35937`)
+- Bug in :meth:`DataFrame.groupby` does not always maintain column index name for ``any``, ``all``, ``bfill``, ``ffill``, ``shift`` (:issue:`29764`)
 -
 
 Reshaping
@@ -337,6 +340,7 @@ ExtensionArray
 Other
 ^^^^^
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
+- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
 -
 
 .. ---------------------------------------------------------------------------

pandas/_libs/indexing.pyx

+1-1
@@ -1,4 +1,4 @@
-cdef class _NDFrameIndexerBase:
+cdef class NDFrameIndexerBase:
     """
     A base class for _NDFrameIndexer for fast instantiation and attribute access.
     """

pandas/_libs/tslibs/ccalendar.pyx

+2-2
@@ -201,10 +201,10 @@ cpdef iso_calendar_t get_iso_calendar(int year, int month, int day) nogil:
             iso_week = 1
 
     iso_year = year
-    if iso_week == 1 and doy > 7:
+    if iso_week == 1 and month == 12:
         iso_year += 1
 
-    elif iso_week >= 52 and doy < 7:
+    elif iso_week >= 52 and month == 1:
         iso_year -= 1
 
     return iso_year, iso_week, dow + 1
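
The corrected branch keys the ISO-year adjustment on the calendar month instead of the day of year. A minimal pure-Python sketch of that rule follows. The function name `iso_year` is hypothetical, and the ISO week number is borrowed from the standard library rather than computed the way the Cython function does, so this only illustrates the year correction.

```python
from datetime import date


def iso_year(year: int, month: int, day: int) -> int:
    """Apply the month-based ISO-year rule from the hunk above (sketch)."""
    # Assumption: take the ISO week from the standard library; the Cython
    # code derives it from the day of year and the day of week instead.
    iso_week = date(year, month, day).isocalendar()[1]

    result = year
    if iso_week == 1 and month == 12:
        # Late-December days that already belong to week 1 of the next year.
        result += 1
    elif iso_week >= 52 and month == 1:
        # Early-January days that still belong to the last week of the
        # previous year.
        result -= 1
    return result


# Cross-check a few boundary dates against the standard library.
for d in (date(2019, 12, 30), date(2021, 1, 1), date(2020, 6, 15)):
    print(d, iso_year(d.year, d.month, d.day), d.isocalendar()[0])
```

Week 1 can only fall in December when it belongs to the following ISO year, and weeks 52 and 53 can only fall in January when they belong to the previous one, which is why the month test is sufficient.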

pandas/_libs/tslibs/parsing.pyx

+2-2
@@ -771,7 +771,7 @@ class _timelex:
 _DATEUTIL_LEXER_SPLIT = _timelex.split
 
 
-def _format_is_iso(f) -> bint:
+def format_is_iso(f: str) -> bint:
     """
     Does format match the iso8601 set that can be handled by the C parser?
     Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different
@@ -789,7 +789,7 @@ def _format_is_iso(f) -> bint:
     return False
 
 
-def _guess_datetime_format(
+def guess_datetime_format(
     dt_str,
     bint dayfirst=False,
     dt_str_parse=du_parse,

pandas/_typing.py

+3-3
@@ -63,7 +63,7 @@
 # other
 
 Dtype = Union[
-    "ExtensionDtype", str, np.dtype, Type[Union[str, float, int, complex, bool]]
+    "ExtensionDtype", str, np.dtype, Type[Union[str, float, int, complex, bool, object]]
 ]
 DtypeObj = Union[np.dtype, "ExtensionDtype"]
 FilePathOrBuffer = Union[str, Path, IO[AnyStr], IOBase]
@@ -118,7 +118,7 @@
 
 
 # compression keywords and compression
-CompressionDict = Mapping[str, Optional[Union[str, int, bool]]]
+CompressionDict = Dict[str, Any]
 CompressionOptions = Optional[Union[str, CompressionDict]]
 
 
@@ -140,6 +140,6 @@ class IOargs(Generic[ModeVar, EncodingVar]):
 
     filepath_or_buffer: FileOrBuffer
     encoding: EncodingVar
-    compression: CompressionOptions
+    compression: CompressionDict
     should_close: bool
     mode: Union[ModeVar, str]
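
With this change `IOargs.compression` is always a `CompressionDict`, while user-facing arguments may still be `None`, a string, or a dict (`CompressionOptions`). One way such a narrowing could look is sketched below. The helper `to_compression_dict` and the use of a `"method"` key are assumptions for illustration and are not taken from this diff.

```python
from typing import Any, Dict, Optional, Union

# Aliases as defined in pandas/_typing.py after this commit.
CompressionDict = Dict[str, Any]
CompressionOptions = Optional[Union[str, CompressionDict]]


def to_compression_dict(compression: CompressionOptions) -> CompressionDict:
    """Normalize any accepted compression option to a dict (hypothetical)."""
    if isinstance(compression, dict):
        # Copy so that later mutation does not leak back to the caller.
        return dict(compression)
    # Assumption: the compression name (or None) is stored under "method".
    return {"method": compression}


print(to_compression_dict("gzip"))
print(to_compression_dict(None))
print(to_compression_dict({"method": "zip", "archive_name": "out.csv"}))
```

Normalizing once at the boundary means downstream code can read and extend a plain dict without re-checking for `None` or `str`.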

pandas/core/algorithms.py

+3-4
@@ -6,7 +6,7 @@
 
 import operator
 from textwrap import dedent
-from typing import TYPE_CHECKING, Dict, Optional, Tuple, Union
+from typing import TYPE_CHECKING, Dict, Optional, Tuple, Union, cast
 from warnings import catch_warnings, simplefilter, warn
 
 import numpy as np
@@ -60,7 +60,7 @@
 from pandas.core.indexers import validate_indices
 
 if TYPE_CHECKING:
-    from pandas import DataFrame, Series
+    from pandas import Categorical, DataFrame, Series
 
 _shared_docs: Dict[str, str] = {}
 
@@ -429,8 +429,7 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
     if is_categorical_dtype(comps):
         # TODO(extension)
         # handle categoricals
-        # error: "ExtensionArray" has no attribute "isin"  [attr-defined]
-        return comps.isin(values)  # type: ignore[attr-defined]
+        return cast("Categorical", comps).isin(values)
 
     comps, dtype = _ensure_data(comps)
     values, _ = _ensure_data(values, dtype=dtype)
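
The `isin` hunk swaps a `# type: ignore[attr-defined]` comment for `typing.cast`. The sketch below shows the same pattern in isolation. The classes `Base` and `Special` and the helper `is_special` are hypothetical stand-ins, not pandas types. `cast` does nothing at runtime; it only tells the type checker which type to assume after a runtime check it cannot follow.

```python
from typing import List, Sequence, cast


class Base:
    """Stand-in for a general array type (hypothetical)."""


class Special(Base):
    """Stand-in for a subtype that adds ``isin`` (hypothetical)."""

    def __init__(self, data: Sequence[int]) -> None:
        self.data = list(data)

    def isin(self, values: Sequence[int]) -> List[bool]:
        lookup = set(values)
        return [item in lookup for item in self.data]


def is_special(obj: object) -> bool:
    """Runtime check that a type checker cannot use for narrowing."""
    return type(obj).__name__ == "Special"


def isin(comps: Base, values: Sequence[int]) -> List[bool]:
    if is_special(comps):
        # cast() returns ``comps`` unchanged; the string form avoids needing
        # the class at import time, as with the TYPE_CHECKING import above.
        return cast("Special", comps).isin(values)
    raise TypeError("unsupported type in this sketch")


print(isin(Special([1, 2, 3]), [2, 3]))
```

Compared with a `# type: ignore`, the cast keeps type checking active for the rest of the expression, so a misspelled method name would still be reported.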

pandas/core/apply.py

+7-9
@@ -1,12 +1,12 @@
 import abc
 import inspect
-from typing import TYPE_CHECKING, Any, Dict, Iterator, Optional, Tuple, Type, Union
+from typing import TYPE_CHECKING, Any, Dict, Iterator, Optional, Tuple, Type
 
 import numpy as np
 
 from pandas._config import option_context
 
-from pandas._typing import Axis
+from pandas._typing import Axis, FrameOrSeriesUnion
 from pandas.util._decorators import cache_readonly
 
 from pandas.core.dtypes.common import is_dict_like, is_list_like, is_sequence
@@ -73,7 +73,7 @@ def series_generator(self) -> Iterator["Series"]:
     @abc.abstractmethod
     def wrap_results_for_axis(
         self, results: ResType, res_index: "Index"
-    ) -> Union["Series", "DataFrame"]:
+    ) -> FrameOrSeriesUnion:
         pass
 
     # ---------------------------------------------------------------
@@ -289,9 +289,7 @@ def apply_series_generator(self) -> Tuple[ResType, "Index"]:
 
         return results, res_index
 
-    def wrap_results(
-        self, results: ResType, res_index: "Index"
-    ) -> Union["Series", "DataFrame"]:
+    def wrap_results(self, results: ResType, res_index: "Index") -> FrameOrSeriesUnion:
         from pandas import Series
 
         # see if we can infer the results
@@ -335,7 +333,7 @@ def result_columns(self) -> "Index":
 
     def wrap_results_for_axis(
         self, results: ResType, res_index: "Index"
-    ) -> Union["Series", "DataFrame"]:
+    ) -> FrameOrSeriesUnion:
         """ return the results for the rows """
 
         if self.result_type == "reduce":
@@ -408,9 +406,9 @@ def result_columns(self) -> "Index":
 
     def wrap_results_for_axis(
         self, results: ResType, res_index: "Index"
-    ) -> Union["Series", "DataFrame"]:
+    ) -> FrameOrSeriesUnion:
         """ return the results for the columns """
-        result: Union["Series", "DataFrame"]
+        result: FrameOrSeriesUnion
 
         # we have requested to expand
         if self.result_type == "expand":

pandas/core/array_algos/replace.py

+95
@@ -0,0 +1,95 @@
+"""
+Methods used by Block.replace and related methods.
+"""
+import operator
+import re
+from typing import Optional, Pattern, Union
+
+import numpy as np
+
+from pandas._typing import ArrayLike, Scalar
+
+from pandas.core.dtypes.common import (
+    is_datetimelike_v_numeric,
+    is_numeric_v_string_like,
+    is_scalar,
+)
+from pandas.core.dtypes.missing import isna
+
+
+def compare_or_regex_search(
+    a: ArrayLike,
+    b: Union[Scalar, Pattern],
+    regex: bool = False,
+    mask: Optional[ArrayLike] = None,
+) -> Union[ArrayLike, bool]:
+    """
+    Compare two array_like inputs of the same shape or two scalar values
+
+    Calls operator.eq or re.search, depending on regex argument. If regex is
+    True, perform an element-wise regex matching.
+
+    Parameters
+    ----------
+    a : array_like
+    b : scalar or regex pattern
+    regex : bool, default False
+    mask : array_like or None (default)
+
+    Returns
+    -------
+    mask : array_like of bool
+    """
+
+    def _check_comparison_types(
+        result: Union[ArrayLike, bool], a: ArrayLike, b: Union[Scalar, Pattern]
+    ):
+        """
+        Raises an error if the two arrays (a,b) cannot be compared.
+        Otherwise, returns the comparison result as expected.
+        """
+        if is_scalar(result) and isinstance(a, np.ndarray):
+            type_names = [type(a).__name__, type(b).__name__]
+
+            if isinstance(a, np.ndarray):
+                type_names[0] = f"ndarray(dtype={a.dtype})"
+
+            raise TypeError(
+                f"Cannot compare types {repr(type_names[0])} and {repr(type_names[1])}"
+            )
+
+    if not regex:
+        op = lambda x: operator.eq(x, b)
+    else:
+        op = np.vectorize(
+            lambda x: bool(re.search(b, x))
+            if isinstance(x, str) and isinstance(b, (str, Pattern))
+            else False
+        )
+
+    # GH#32621 use mask to avoid comparing to NAs
+    if mask is None and isinstance(a, np.ndarray) and not isinstance(b, np.ndarray):
+        mask = np.reshape(~(isna(a)), a.shape)
+    if isinstance(a, np.ndarray):
+        a = a[mask]
+
+    if is_numeric_v_string_like(a, b):
+        # GH#29553 avoid deprecation warnings from numpy
+        return np.zeros(a.shape, dtype=bool)
+
+    elif is_datetimelike_v_numeric(a, b):
+        # GH#29553 avoid deprecation warnings from numpy
+        _check_comparison_types(False, a, b)
+        return False
+
+    result = op(a)
+
+    if isinstance(result, np.ndarray) and mask is not None:
+        # The shape of the mask can differ to that of the result
+        # since we may compare only a subset of a's or b's elements
+        tmp = np.zeros(mask.shape, dtype=np.bool_)
+        tmp[mask] = result
+        result = tmp
+
+    _check_comparison_types(result, a, b)
+    return result
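
The core idea of `compare_or_regex_search` is to compare only the non-missing elements and then scatter the results back into a full-length boolean mask. The sketch below restates that idea with plain Python lists instead of NumPy arrays. The function name `compare_or_regex_search_sketch` is hypothetical, `None` stands in for missing values, and the type checks and numeric-versus-string shortcuts of the real function are left out.

```python
import re
from typing import Any, List


def compare_or_regex_search_sketch(
    values: List[Any], b: Any, regex: bool = False
) -> List[bool]:
    """List-based sketch of the masked comparison (not the pandas code)."""
    # Assumption: None marks a missing value; pandas uses isna() instead.
    mask = [v is not None for v in values]
    kept = [v for v, keep in zip(values, mask) if keep]

    if regex:
        # Non-string elements never match, mirroring the vectorized lambda.
        compared = [isinstance(v, str) and bool(re.search(b, v)) for v in kept]
    else:
        compared = [v == b for v in kept]

    # Scatter the partial result back; masked-out positions stay False.
    result = [False] * len(values)
    partial = iter(compared)
    for i, keep in enumerate(mask):
        if keep:
            result[i] = next(partial)
    return result


print(compare_or_regex_search_sketch(["a", None, "b"], "a"))
print(compare_or_regex_search_sketch(["apple", None, 3, "banana"], r"^a", regex=True))
```

Filling a zero-initialized result through the mask is what keeps the output the same length as the input even though only a subset of elements was compared.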

pandas/core/arrays/categorical.py

+14-1
@@ -280,6 +280,19 @@ class Categorical(NDArrayBackedExtensionArray, PandasObject):
     ['a', 'b', 'c', 'a', 'b', 'c']
     Categories (3, object): ['a', 'b', 'c']
 
+    Missing values are not included as a category.
+
+    >>> c = pd.Categorical([1, 2, 3, 1, 2, 3, np.nan])
+    >>> c
+    [1, 2, 3, 1, 2, 3, NaN]
+    Categories (3, int64): [1, 2, 3]
+
+    However, their presence is indicated in the `codes` attribute
+    by code `-1`.
+
+    >>> c.codes
+    array([ 0,  1,  2,  0,  1,  2, -1], dtype=int8)
+
     Ordered `Categoricals` can be sorted according to the custom order
     of the categories and can have a min and max value.
 
@@ -2316,7 +2329,7 @@ def _concat_same_type(self, to_concat):
 
         return union_categoricals(to_concat)
 
-    def isin(self, values):
+    def isin(self, values) -> np.ndarray:
         """
         Check whether `values` are contained in Categorical.
 
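
The docstring addition states that missing values get no category of their own and show up as code `-1`. A small pure-Python sketch of that encoding is given below. The function `encode_sketch` is hypothetical and only mimics the observable result for sortable, hashable values; it is not how `Categorical` builds its codes internally.

```python
import math
from typing import Any, List, Tuple


def is_missing(value: Any) -> bool:
    """Treat float NaN as missing (simplified stand-in for isna)."""
    return isinstance(value, float) and math.isnan(value)


def encode_sketch(values: List[Any]) -> Tuple[List[Any], List[int]]:
    """Return (categories, codes) with -1 marking missing values."""
    # Missing values are excluded when the categories are collected.
    categories = sorted({v for v in values if not is_missing(v)})
    position = {category: i for i, category in enumerate(categories)}
    codes = [-1 if is_missing(v) else position[v] for v in values]
    return categories, codes


categories, codes = encode_sketch([1, 2, 3, 1, 2, 3, float("nan")])
print(categories)
print(codes)
```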
