Commit fbb3f06

Merge remote-tracking branch 'upstream/master' into groupby_agg_methods
2 parents d47c819 + d33b002 commit fbb3f06

50 files changed: +669 -502 lines changed

doc/source/whatsnew/v1.0.2.rst

+1
Original file line number | Diff line number | Diff line change
@@ -74,6 +74,7 @@ Bug fixes
7474
**I/O**
7575

7676
- Using ``pd.NA`` with :meth:`DataFrame.to_json` now correctly outputs a null value instead of an empty object (:issue:`31615`)
77+
- Fixed pickling of ``pandas.NA``. Previously a new object was returned, which broke computations relying on ``NA`` being a singleton (:issue:`31847`)
7778
- Fixed bug in parquet roundtrip with nullable unsigned integer dtypes (:issue:`31896`).
7879

7980
**Experimental dtypes**

doc/source/whatsnew/v1.1.0.rst

+6
@@ -66,6 +66,7 @@ Other enhancements
6666
- :class:`Styler` may now render CSS more efficiently where multiple cells have the same styling (:issue:`30876`)
6767
- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
6868
- `OptionError` is now exposed in `pandas.errors` (:issue:`27553`)
69+
- :func:`timedelta_range` will now infer a frequency when passed ``start``, ``end``, and ``periods`` (:issue:`32377`)
6970
-
7071

7172
.. ---------------------------------------------------------------------------
@@ -198,6 +199,7 @@ Categorical
198199

199200
- Bug where :func:`merge` was unable to join on non-unique categorical indices (:issue:`28189`)
200201
- Bug when passing categorical data to :class:`Index` constructor along with ``dtype=object`` incorrectly returning a :class:`CategoricalIndex` instead of object-dtype :class:`Index` (:issue:`32167`)
202+
- Bug where :class:`Categorical` comparison operator ``__ne__`` would incorrectly evaluate to ``False`` when either element was missing (:issue:`32276`)
201203
-
202204

203205
Datetimelike
@@ -213,6 +215,7 @@ Timedelta
213215
^^^^^^^^^
214216

215217
- Bug in constructing a :class:`Timedelta` with a high precision integer that would round the :class:`Timedelta` components (:issue:`31354`)
218+
- Bug in dividing ``np.nan`` or ``None`` by :class:`Timedelta` incorrectly returning ``NaT`` (:issue:`31869`)
216219
-
217220

218221
Timezones
@@ -295,8 +298,10 @@ I/O
295298
- Bug in :meth:`DataFrame.to_parquet` overwriting pyarrow's default for
296299
``coerce_timestamps``; following pyarrow's default allows writing nanosecond
297300
timestamps with ``version="2.0"`` (:issue:`31652`).
301+
- Bug in :meth:`read_csv` raising ``TypeError`` when ``sep=None`` was used in combination with the ``comment`` keyword (:issue:`31396`)
298302
- Bug in :class:`HDFStore` that caused it to set to ``int64`` the dtype of a ``datetime64`` column when reading a DataFrame in Python 3 from fixed format written in Python 2 (:issue:`31750`)
299303

304+
300305
Plotting
301306
^^^^^^^^
302307

@@ -322,6 +327,7 @@ Reshaping
322327
- Bug in :func:`crosstab` when inputs are two Series and have tuple names, the output will keep dummy MultiIndex as columns. (:issue:`18321`)
323328
- :meth:`DataFrame.pivot` can now take lists for ``index`` and ``columns`` arguments (:issue:`21425`)
324329
- Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)
330+
- :meth:`DataFrame.replace` and :meth:`Series.replace` will raise a ``TypeError`` if ``to_replace`` is not an expected type. Previously the ``replace`` would fail silently (:issue:`18634`)
325331

326332

327333
Sparse

pandas/_libs/lib.pyx

+1-3
@@ -100,11 +100,9 @@ def values_from_object(obj: object):
100100
"""
101101
func: object
102102

103-
if getattr(obj, '_typ', '') == 'dataframe':
104-
return obj.values
105-
106103
func = getattr(obj, '_internal_get_values', None)
107104
if func is not None:
105+
# Includes DataFrame, for which we get frame.values
108106
obj = func()
109107

110108
return obj
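
The change above drops the DataFrame-specific branch and relies on the `_internal_get_values` hook alone. A minimal pure-Python sketch of that duck-typed dispatch follows; `FrameLike` is a hypothetical stand-in, not a pandas class, and the function body only mirrors the lines shown in the diff.

```python
from typing import Any


class FrameLike:
    """Hypothetical container that exposes the `_internal_get_values` hook."""

    def __init__(self, values: list) -> None:
        self.values = values

    def _internal_get_values(self) -> list:
        return self.values


def values_from_object(obj: Any) -> Any:
    """Return the underlying values if the object provides the hook."""
    func = getattr(obj, "_internal_get_values", None)
    if func is not None:
        # Any object with the hook is unwrapped here, so a separate
        # type-specific branch is unnecessary.
        obj = func()
    return obj


unwrapped = values_from_object(FrameLike([[1, 2], [3, 4]]))
passthrough = values_from_object(42)  # objects without the hook pass through
```

Objects that lack the hook are returned unchanged, which is why a single `getattr` check can replace the earlier `_typ == 'dataframe'` test.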

pandas/_libs/missing.pyx

+15-18
@@ -10,10 +10,13 @@ cnp.import_array()
1010

1111
cimport pandas._libs.util as util
1212

13-
from pandas._libs.tslibs.np_datetime cimport (
14-
get_timedelta64_value, get_datetime64_value)
13+
14+
from pandas._libs.tslibs.np_datetime cimport get_datetime64_value, get_timedelta64_value
1515
from pandas._libs.tslibs.nattype cimport (
16-
checknull_with_nat, c_NaT as NaT, is_null_datetimelike)
16+
c_NaT as NaT,
17+
checknull_with_nat,
18+
is_null_datetimelike,
19+
)
1720
from pandas._libs.ops_dispatch import maybe_dispatch_ufunc_to_dunder_op
1821

1922
from pandas.compat import is_platform_32bit
@@ -44,7 +47,7 @@ cpdef bint checknull(object val):
4447
4548
Returns
4649
-------
47-
result : bool
50+
bool
4851
4952
Notes
5053
-----
@@ -223,7 +226,7 @@ def isnaobj2d_old(arr: ndarray) -> ndarray:
223226

224227
Returns
225228
-------
226-
result : ndarray (dtype=np.bool_)
229+
ndarray (dtype=np.bool_)
227230

228231
Notes
229232
-----
@@ -248,17 +251,11 @@ def isnaobj2d_old(arr: ndarray) -> ndarray:
248251

249252

250253
def isposinf_scalar(val: object) -> bool:
251-
if util.is_float_object(val) and val == INF:
252-
return True
253-
else:
254-
return False
254+
return util.is_float_object(val) and val == INF
255255

256256

257257
def isneginf_scalar(val: object) -> bool:
258-
if util.is_float_object(val) and val == NEGINF:
259-
return True
260-
else:
261-
return False
258+
return util.is_float_object(val) and val == NEGINF
262259

263260

264261
cdef inline bint is_null_datetime64(v):
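
The two infinity helpers above collapse an `if/else` that returned `True` or `False` into a single boolean expression. A rough pure-Python equivalent, assuming `isinstance(val, float)` as a stand-in for `util.is_float_object` (which is a Cython helper and not reproduced here):

```python
import math

INF = math.inf
NEGINF = -math.inf


def isposinf_scalar(val: object) -> bool:
    # Short-circuits to False for non-floats, so the comparison
    # is only evaluated for float inputs.
    return isinstance(val, float) and val == INF


def isneginf_scalar(val: object) -> bool:
    return isinstance(val, float) and val == NEGINF
```

Both operands of `and` are already booleans in this sketch, so the result is a plain `bool` in every case.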
@@ -364,6 +361,9 @@ class NAType(C_NAType):
364361
exponent = 31 if is_32bit else 61
365362
return 2 ** exponent - 1
366363

364+
def __reduce__(self):
365+
return "NA"
366+
367367
# Binary arithmetic and comparison ops -> propagate
368368

369369
__add__ = _create_binary_propagating_op("__add__")
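
The `__reduce__` method added above returns a string. In the pickle protocol, a string return value is treated as the name of a module-level global, so unpickling looks the object up instead of building a new one. The sketch below illustrates this with two hypothetical sentinel classes (neither is the pandas `NAType`); it assumes the code runs as an importable module or script so that pickle can resolve the global name.

```python
import pickle


class NaiveNAType:
    """Sentinel without custom pickle support: unpickling creates a new object."""


class SingletonNAType:
    """Sentinel pickled by reference to a module-level global."""

    def __reduce__(self):
        # Returning a string tells pickle to store only the global name
        # and to look it up in this object's module when loading.
        return "NA"


NAIVE_NA = NaiveNAType()
NA = SingletonNAType()

naive_copy = pickle.loads(pickle.dumps(NAIVE_NA))
na_copy = pickle.loads(pickle.dumps(NA))

print(naive_copy is NAIVE_NA)  # identity is lost for the naive sentinel
print(na_copy is NA)           # identity is preserved via the global lookup
```

Identity checks such as `x is NA` only survive a pickle round trip in the second case, which matches the whatsnew note about computations that rely on `NA` being a singleton.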
@@ -423,7 +423,6 @@ class NAType(C_NAType):
423423
return NA
424424
elif isinstance(other, np.ndarray):
425425
return np.where(other == 1, other, NA)
426-
427426
return NotImplemented
428427

429428
# Logical ops using Kleene logic
@@ -433,8 +432,7 @@ class NAType(C_NAType):
433432
return False
434433
elif other is True or other is C_NA:
435434
return NA
436-
else:
437-
return NotImplemented
435+
return NotImplemented
438436

439437
__rand__ = __and__
440438

@@ -443,8 +441,7 @@ class NAType(C_NAType):
443441
return True
444442
elif other is False or other is C_NA:
445443
return NA
446-
else:
447-
return NotImplemented
444+
return NotImplemented
448445

449446
__ror__ = __or__
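
The `__and__` and `__or__` branches above follow Kleene three-valued logic, and the diff only removes a redundant `else` before `return NotImplemented`. A self-contained sketch of the same truth table, using a hypothetical `NAType` that compares against its own singleton rather than the Cython `C_NA`:

```python
class NAType:
    """Hypothetical missing-value sentinel with Kleene logical operators."""

    def __repr__(self) -> str:
        return "<NA>"

    def __and__(self, other):
        if other is False:
            return False  # False AND anything is False
        elif other is True or other is NA:
            return NA  # the result depends on the unknown value
        return NotImplemented

    __rand__ = __and__

    def __or__(self, other):
        if other is True:
            return True  # True OR anything is True
        elif other is False or other is NA:
            return NA
        return NotImplemented

    __ror__ = __or__


NA = NAType()
```

Returning `NotImplemented` for other operand types lets Python try the reflected method on the other operand and raise `TypeError` if none applies.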
450447

pandas/_libs/tslibs/timedeltas.pyx

+9-1
@@ -1407,15 +1407,23 @@ class Timedelta(_Timedelta):
14071407
# convert to Timedelta below
14081408
pass
14091409

1410+
elif util.is_nan(other):
1411+
# i.e. np.nan or np.float64("NaN")
1412+
raise TypeError("Cannot divide float by Timedelta")
1413+
14101414
elif hasattr(other, 'dtype'):
1415+
if other.dtype.kind == "O":
1416+
# GH#31869
1417+
return np.array([x / self for x in other])
14111418
return other / self.to_timedelta64()
14121419

14131420
elif not _validate_ops_compat(other):
14141421
return NotImplemented
14151422

14161423
other = Timedelta(other)
14171424
if other is NaT:
1418-
return NaT
1425+
# In this context we treat NaT as timedelta-like
1426+
return np.nan
14191427
return float(other.value) / self.value
14201428

14211429
def __floordiv__(self, other):

pandas/_testing.py

+2-2
@@ -742,9 +742,9 @@ def repr_class(x):
742742
raise_assert_detail(obj, msg, repr_class(left), repr_class(right))
743743

744744

745-
def assert_attr_equal(attr, left, right, obj="Attributes"):
745+
def assert_attr_equal(attr: str, left, right, obj: str = "Attributes"):
746746
"""
747-
checks attributes are equal. Both objects must have attribute.
747+
Check attributes are equal. Both objects must have attribute.
748748
749749
Parameters
750750
----------

pandas/_typing.py

+1-1
@@ -64,7 +64,7 @@
6464
Label = Optional[Hashable]
6565
Level = Union[Label, int]
6666
Ordered = Optional[bool]
67-
JSONSerializable = Union[PythonScalar, List, Dict]
67+
JSONSerializable = Optional[Union[PythonScalar, List, Dict]]
6868
Axes = Collection
6969

7070
# For functions like rename that convert one label to another
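
Wrapping the alias in `Optional` adds `None` to the set of accepted types. The sketch below checks this with `typing.get_args`; the definition of `PythonScalar` is an assumption for illustration and is not shown in this diff.

```python
from typing import Dict, List, Optional, Union, get_args

# Assumed definition, for illustration only.
PythonScalar = Union[str, int, float, bool]

JSONSerializableOld = Union[PythonScalar, List, Dict]
JSONSerializable = Optional[Union[PythonScalar, List, Dict]]

# Nested unions are flattened, so NoneType appears as a direct member.
accepts_none_old = type(None) in get_args(JSONSerializableOld)
accepts_none_new = type(None) in get_args(JSONSerializable)
```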

pandas/conftest.py

+10
@@ -1047,6 +1047,16 @@ def series_with_simple_index(indices):
10471047
for dtype in _narrow_dtypes
10481048
}
10491049

1050+
1051+
@pytest.fixture(params=_narrow_series.keys())
1052+
def narrow_series(request):
1053+
"""
1054+
Fixture for Series with low precision data types
1055+
"""
1056+
# copy to avoid mutation, e.g. setting .name
1057+
return _narrow_series[request.param].copy()
1058+
1059+
10501060
_index_or_series_objs = {**indices_dict, **_series, **_narrow_series}
10511061

10521062

pandas/core/algorithms.py

+3-3
@@ -313,8 +313,8 @@ def unique(values):
313313
314314
See Also
315315
--------
316-
Index.unique
317-
Series.unique
316+
Index.unique : Return unique values from an Index.
317+
Series.unique : Return unique values of Series object.
318318
319319
Examples
320320
--------
@@ -1515,7 +1515,7 @@ def take(arr, indices, axis: int = 0, allow_fill: bool = False, fill_value=None)
15151515
15161516
See Also
15171517
--------
1518-
numpy.take
1518+
numpy.take : Take elements from an array along an axis.
15191519
15201520
Examples
15211521
--------

pandas/core/arrays/categorical.py

+4-1
@@ -103,7 +103,10 @@ def func(self, other):
103103
mask = (self._codes == -1) | (other_codes == -1)
104104
if mask.any():
105105
# In other series, this leads to False, so do that here too
106-
ret[mask] = False
106+
if opname == "__ne__":
107+
ret[(self._codes == -1) & (other_codes == -1)] = True
108+
else:
109+
ret[mask] = False
107110
return ret
108111

109112
if is_scalar(other):
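
The `__ne__` special case above exists because two missing entries share the sentinel code `-1`, so a raw code comparison reports them as equal. A plain-list sketch of the masking logic, assuming integer codes with `-1` for missing values (the helper names are hypothetical):

```python
from typing import List

MISSING = -1  # sentinel code for a missing category


def codes_equal(left: List[int], right: List[int]) -> List[bool]:
    """Elementwise ==, where any missing operand yields False."""
    result = [a == b for a, b in zip(left, right)]
    for i, (a, b) in enumerate(zip(left, right)):
        if a == MISSING or b == MISSING:
            result[i] = False
    return result


def codes_not_equal(left: List[int], right: List[int]) -> List[bool]:
    """Elementwise !=, where any missing operand yields True."""
    result = [a != b for a, b in zip(left, right)]
    for i, (a, b) in enumerate(zip(left, right)):
        if a == MISSING and b == MISSING:
            # Only the both-missing case needs fixing: when just one side
            # is missing, the codes already differ and the result is True.
            result[i] = True
    return result


left = [0, -1, -1, 1]
right = [0, -1, 2, -1]
eq = codes_equal(left, right)
ne = codes_not_equal(left, right)
```

With this masking, `ne` is the elementwise negation of `eq` even where values are missing.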

pandas/core/arrays/datetimelike.py

+8-13
@@ -42,6 +42,7 @@
4242
from pandas.core.algorithms import checked_add_with_arr, take, unique1d, value_counts
4343
from pandas.core.arrays.base import ExtensionArray, ExtensionOpsMixin
4444
import pandas.core.common as com
45+
from pandas.core.construction import array, extract_array
4546
from pandas.core.indexers import check_array_indexer
4647
from pandas.core.ops.common import unpack_zerodim_and_defer
4748
from pandas.core.ops.invalid import invalid_comparison, make_invalid_op
@@ -623,7 +624,7 @@ def astype(self, dtype, copy=True):
623624
dtype = pandas_dtype(dtype)
624625

625626
if is_object_dtype(dtype):
626-
return self._box_values(self.asi8)
627+
return self._box_values(self.asi8.ravel()).reshape(self.shape)
627628
elif is_string_dtype(dtype) and not is_categorical_dtype(dtype):
628629
return self._format_native_types()
629630
elif is_integer_dtype(dtype):
@@ -1256,19 +1257,13 @@ def _addsub_object_array(self, other: np.ndarray, op):
12561257
PerformanceWarning,
12571258
)
12581259

1259-
# For EA self.astype('O') returns a numpy array, not an Index
1260-
left = self.astype("O")
1260+
# Caller is responsible for broadcasting if necessary
1261+
assert self.shape == other.shape, (self.shape, other.shape)
12611262

1262-
res_values = op(left, np.array(other))
1263-
kwargs = {}
1264-
if not is_period_dtype(self):
1265-
kwargs["freq"] = "infer"
1266-
try:
1267-
res = type(self)._from_sequence(res_values, **kwargs)
1268-
except ValueError:
1269-
# e.g. we've passed a Timestamp to TimedeltaArray
1270-
res = res_values
1271-
return res
1263+
res_values = op(self.astype("O"), np.array(other))
1264+
result = array(res_values.ravel())
1265+
result = extract_array(result, extract_numpy=True).reshape(self.shape)
1266+
return result
12721267

12731268
def _time_shift(self, periods, freq=None):
12741269
"""

pandas/core/arrays/timedeltas.py

+10-2
@@ -39,6 +39,7 @@
3939
from pandas.core.algorithms import checked_add_with_arr
4040
from pandas.core.arrays import datetimelike as dtl
4141
import pandas.core.common as com
42+
from pandas.core.construction import extract_array
4243

4344
from pandas.tseries.frequencies import to_offset
4445
from pandas.tseries.offsets import Tick
@@ -141,8 +142,7 @@ def dtype(self):
141142
# Constructors
142143

143144
def __init__(self, values, dtype=_TD_DTYPE, freq=None, copy=False):
144-
if isinstance(values, (ABCSeries, ABCIndexClass)):
145-
values = values._values
145+
values = extract_array(values)
146146

147147
inferred_freq = getattr(values, "_freq", None)
148148

@@ -258,6 +258,10 @@ def _generate_range(cls, start, end, periods, freq, closed=None):
258258
index = _generate_regular_range(start, end, periods, freq)
259259
else:
260260
index = np.linspace(start.value, end.value, periods).astype("i8")
261+
if len(index) >= 2:
262+
# Infer a frequency
263+
td = Timedelta(index[1] - index[0])
264+
freq = to_offset(td)
261265

262266
if not left_closed:
263267
index = index[1:]
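
The `_generate_range` change above infers a frequency from the first two evenly spaced points. A standard-library sketch of the same idea, using `datetime.timedelta` in place of `Timedelta` and `np.linspace` (the function name `evenly_spaced` is hypothetical):

```python
from datetime import timedelta
from typing import List, Optional, Tuple


def evenly_spaced(
    start: timedelta, end: timedelta, periods: int
) -> Tuple[List[timedelta], Optional[timedelta]]:
    """Return `periods` evenly spaced points and the inferred step, if any."""
    if periods == 1:
        points = [start]
    else:
        span = end - start
        points = [start + span * i / (periods - 1) for i in range(periods)]

    # A step can only be inferred when at least two points exist.
    freq = points[1] - points[0] if len(points) >= 2 else None
    return points, freq


points, freq = evenly_spaced(timedelta(0), timedelta(hours=4), 5)
single, no_freq = evenly_spaced(timedelta(0), timedelta(hours=4), 1)
```

The length guard mirrors the `len(index) >= 2` check in the diff: with fewer than two points there is no step to measure, so the frequency stays unset.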
@@ -614,6 +618,10 @@ def __floordiv__(self, other):
614618
if self.freq is not None:
615619
# Note: freq gets division, not floor-division
616620
freq = self.freq / other
621+
if freq.nanos == 0 and self.freq.nanos != 0:
622+
# e.g. if self.freq is Nano(1) then dividing by 2
623+
# rounds down to zero
624+
freq = None
617625
return type(self)(result.view("m8[ns]"), freq=freq)
618626

619627
if not hasattr(other, "dtype"):

pandas/core/base.py

+8-5
@@ -1035,7 +1035,8 @@ def argmin(self, axis=None, skipna=True, *args, **kwargs):
10351035
10361036
See Also
10371037
--------
1038-
numpy.ndarray.argmin
1038+
numpy.ndarray.argmin : Return indices of the minimum values along
1039+
the given axis.
10391040
"""
10401041
nv.validate_minmax_axis(axis)
10411042
nv.validate_argmax_with_skipna(skipna, args, kwargs)
@@ -1055,7 +1056,8 @@ def tolist(self):
10551056
10561057
See Also
10571058
--------
1058-
numpy.ndarray.tolist
1059+
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
1060+
nested list of Python scalars.
10591061
"""
10601062
if not isinstance(self._values, np.ndarray):
10611063
# check for ndarray instead of dtype to catch DTA/TDA
@@ -1402,7 +1404,8 @@ def memory_usage(self, deep=False):
14021404
14031405
See Also
14041406
--------
1405-
numpy.ndarray.nbytes
1407+
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
1408+
array.
14061409
14071410
Notes
14081411
-----
@@ -1473,8 +1476,8 @@ def factorize(self, sort=False, na_sentinel=-1):
14731476
14741477
See Also
14751478
--------
1476-
sort_values
1477-
numpy.searchsorted
1479+
sort_values : Sort by the values along either axis.
1480+
numpy.searchsorted : Similar method from NumPy.
14781481
14791482
Notes
14801483
-----

pandas/core/common.py

+1-1
@@ -122,7 +122,7 @@ def is_bool_indexer(key: Any) -> bool:
122122
is_array_like(key) and is_extension_array_dtype(key.dtype)
123123
):
124124
if key.dtype == np.object_:
125-
key = np.asarray(values_from_object(key))
125+
key = np.asarray(key)
126126

127127
if not lib.is_bool_array(key):
128128
na_msg = "Cannot mask with non-boolean array containing NA / NaN values"

pandas/core/computation/eval.py

+4-2
@@ -266,8 +266,10 @@ def eval(
266266
267267
See Also
268268
--------
269-
DataFrame.query
270-
DataFrame.eval
269+
DataFrame.query : Evaluates a boolean expression to query the columns
270+
of a frame.
271+
DataFrame.eval : Evaluate a string describing operations on
272+
DataFrame columns.
271273
272274
Notes
273275
-----
