Commit 1824b99

Merge remote-tracking branch 'upstream/master' into Rt05
* upstream/master:
  BUG: Fix exceptions when Series.interpolate's `order` parameter is missing or invalid (pandas-dev#25246)
  API: Ensure DatetimeTZDtype standardizes pytz timezones (pandas-dev#25254)
  Split Excel IO Into Sub-Directory (pandas-dev#25153)
  PR04 errors fix (pandas-dev#25157)
  DEPR: remove assert_panel_equal (pandas-dev#25238)
  BUG: pandas Timestamp tz_localize and tz_convert do not preserve `freq` attribute (pandas-dev#25247)
  Revert "BLD: prevent asv from calling sys.stdin.close() by using different launch method (pandas-dev#25237)" (pandas-dev#25253)
  REF/TST: resample/test_base.py (pandas-dev#25262)
  BUG: Duplicated returns boolean dataframe (pandas-dev#25234)
  CLN: Remove ipython 2.x compat (pandas-dev#25150)
  Refactor groupby group_add from tempita to fused types (pandas-dev#24954)
  CLN: For loops, boolean conditions, misc. (pandas-dev#25206)
  (Closes pandas-dev#25029) Removed extra bracket from cheatsheet code example. (pandas-dev#25032)
  BLD: prevent asv from calling sys.stdin.close() by using different launch method (pandas-dev#25237)
  BUG: Fix read_json orient='table' without index (pandas-dev#25170) (pandas-dev#25171)
  BUG: Fix regression in DataFrame.apply causing RecursionError (pandas-dev#25230)
  BUG-25061 fix printing indices with NaNs (pandas-dev#25202)
  DEPR: Add Deprecated warning for timedelta with passed units M and Y (pandas-dev#23264)
  DEPR: Remove Panel-specific parts of io.pytables (pandas-dev#25233)
  DEPR: remove tm.makePanel and all usages (pandas-dev#25231)
2 parents 25e7503 + ea1d5f5 commit 1824b99

91 files changed, +2536 -4887 lines changed

asv_bench/benchmarks/__init__.py

+1
@@ -0,0 +1 @@
+"""Pandas benchmarks."""

ci/code_checks.sh

+2-2
@@ -241,8 +241,8 @@ fi
 ### DOCSTRINGS ###
 if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

-    MSG='Validate docstrings (GL06, GL07, GL09, SS04, SS05, PR03, PR05, EX04, RT04, RT05, SA05)' ; echo $MSG
-    $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,SS05,PR03,PR05,EX04,RT04,RT05,SA05
+    MSG='Validate docstrings (GL06, GL07, GL09, SS04, SS05, PR03, PR04, PR05, EX04, RT04, RT05, SA05)' ; echo $MSG
+    $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,SS05,PR03,PR04,PR05,EX04,RT04,RT05,SA05
     RET=$(($RET + $?)) ; echo $MSG "DONE"

 fi

doc/cheatsheet/Pandas_Cheat_Sheet.pdf

6.7 KB
Binary file not shown.
-261 Bytes
Binary file not shown.
210 KB
Binary file not shown.
5.73 KB
Binary file not shown.

doc/source/user_guide/timeseries.rst

+24
@@ -321,6 +321,15 @@ which can be specified. These are computed from the starting point specified by
    pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
                    1349720105400, 1349720105500], unit='ms')

+Constructing a :class:`Timestamp` or :class:`DatetimeIndex` with an epoch timestamp
+with the ``tz`` argument specified will localize the epoch timestamps to UTC
+first then convert the result to the specified time zone.
+
+.. ipython:: python
+
+   pd.Timestamp(1262347200000000000, tz='US/Pacific')
+   pd.DatetimeIndex([1262347200000000000], tz='US/Pacific')
+
 .. note::

    Epoch times will be rounded to the nearest nanosecond.
@@ -2205,6 +2214,21 @@ you can use the ``tz_convert`` method.

    rng_pytz.tz_convert('US/Eastern')

+.. note::
+
+    When using ``pytz`` time zones, :class:`DatetimeIndex` will construct a different
+    time zone object than a :class:`Timestamp` for the same time zone input. A :class:`DatetimeIndex`
+    can hold a collection of :class:`Timestamp` objects that may have different UTC offsets and cannot be
+    succinctly represented by one ``pytz`` time zone instance while one :class:`Timestamp`
+    represents one point in time with a specific UTC offset.
+
+    .. ipython:: python
+
+       dti = pd.date_range('2019-01-01', periods=3, freq='D', tz='US/Pacific')
+       dti.tz
+       ts = pd.Timestamp('2019-01-01', tz='US/Pacific')
+       ts.tz
+
 .. warning::

    Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different
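The first note added to timeseries.rst says that an epoch value passed together with ``tz`` is treated as UTC first and then converted to the requested zone. A minimal standard-library sketch of that two-step rule follows. It assumes a fixed UTC-08:00 offset as a stand-in for 'US/Pacific' in January, and the helper name `epoch_ns_to_zone` is hypothetical, not part of pandas.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -08:00 offset approximates US/Pacific in winter.
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")


def epoch_ns_to_zone(epoch_ns, tz):
    """Interpret epoch nanoseconds as UTC, then convert to tz."""
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    # Step 1: localize the epoch value to UTC.
    utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    utc += timedelta(microseconds=nanos // 1000)
    # Step 2: convert the UTC instant to the requested zone.
    return utc.astimezone(tz)


local = epoch_ns_to_zone(1262347200000000000, PACIFIC_STANDARD)
print(local.isoformat())  # 2010-01-01T04:00:00-08:00
```

The epoch value is the one used in the documentation example; it corresponds to 2010-01-01 12:00 UTC, so the wall-clock time in the target zone is eight hours earlier rather than 12:00 local time.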

doc/source/whatsnew/v0.24.2.rst

+5-2
@@ -21,8 +21,10 @@ Fixed Regressions
 ^^^^^^^^^^^^^^^^^

 - Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`)
-
 - Fixed issue in ``DataFrame`` construction with passing a mixed list of mixed types could segfault. (:issue:`25075`)
+- Fixed regression in :meth:`DataFrame.apply` causing ``RecursionError`` when ``dict``-like classes were passed as argument. (:issue:`25196`)
+
+- Fixed regression in :meth:`DataFrame.duplicated()`, where empty dataframe was not returning a boolean dtyped Series. (:issue:`25184`)

 .. _whatsnew_0242.enhancements:

@@ -52,7 +54,8 @@ Bug Fixes
 **I/O**

 - Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`)
--
+- Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`)
+- Bug where float indexes could have misaligned values when printing (:issue:`25061`)
 -

 **Categorical**

doc/source/whatsnew/v0.25.0.rst

+7-8
@@ -33,7 +33,7 @@ Backwards incompatible API changes
 Other API Changes
 ^^^^^^^^^^^^^^^^^

--
+- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
 -
 -

@@ -42,16 +42,13 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~

--
--
--
-
+- Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`)

 .. _whatsnew_0250.prior_deprecations:

 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Removed (parts of) :class:`Panel` (:issue:`25047`)
+- Removed (parts of) :class:`Panel` (:issue:`25047`,:issue:`25191`,:issue:`25231`)
 -
 -
 -
@@ -71,6 +68,8 @@ Performance Improvements
 Bug Fixes
 ~~~~~~~~~

+-
+
 Categorical
 ^^^^^^^^^^^

@@ -96,7 +95,7 @@ Timezones
 ^^^^^^^^^

 - Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
--
+- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` does not propagate ``freq`` (:issue:`25241`)
 -

 Numeric
@@ -142,7 +141,7 @@ Indexing
 Missing
 ^^^^^^^

--
+- Fixed misleading exception message in :meth:`Series.missing` if argument ``order`` is required, but omitted (:issue:`10633`, :issue:`24014`).
 -
 -

pandas/_libs/groupby.pyx

+51
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
import cython
44
from cython import Py_ssize_t
5+
from cython cimport floating
56

67
from libc.stdlib cimport malloc, free
78

@@ -382,5 +383,55 @@ def group_any_all(uint8_t[:] out,
382383
out[lab] = flag_val
383384

384385

386+
@cython.wraparound(False)
387+
@cython.boundscheck(False)
388+
def _group_add(floating[:, :] out,
389+
int64_t[:] counts,
390+
floating[:, :] values,
391+
const int64_t[:] labels,
392+
Py_ssize_t min_count=0):
393+
"""
394+
Only aggregates on axis=0
395+
"""
396+
cdef:
397+
Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
398+
floating val, count
399+
ndarray[floating, ndim=2] sumx, nobs
400+
401+
if not len(values) == len(labels):
402+
raise AssertionError("len(index) != len(labels)")
403+
404+
nobs = np.zeros_like(out)
405+
sumx = np.zeros_like(out)
406+
407+
N, K = (<object>values).shape
408+
409+
with nogil:
410+
411+
for i in range(N):
412+
lab = labels[i]
413+
if lab < 0:
414+
continue
415+
416+
counts[lab] += 1
417+
for j in range(K):
418+
val = values[i, j]
419+
420+
# not nan
421+
if val == val:
422+
nobs[lab, j] += 1
423+
sumx[lab, j] += val
424+
425+
for i in range(ncounts):
426+
for j in range(K):
427+
if nobs[i, j] < min_count:
428+
out[i, j] = NAN
429+
else:
430+
out[i, j] = sumx[i, j]
431+
432+
433+
group_add_float32 = _group_add['float']
434+
group_add_float64 = _group_add['double']
435+
385436
# generated from template
386437
include "groupby_helper.pxi"
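For readers who want to check the semantics of `_group_add` without compiling Cython, the following is a pure-Python model of the same loop: per-group column sums that skip NaN and emit NaN when a cell has fewer than `min_count` valid observations. It is only a sketch. The name `group_add_reference`, the `ngroups` parameter and the list-of-lists inputs are assumptions made for illustration, and unlike the Cython function it returns new lists instead of writing into `out` and `counts`.

```python
import math


def group_add_reference(values, labels, ngroups, min_count=0):
    """Pure-Python model: per-group column sums that skip NaN."""
    if len(values) != len(labels):
        raise AssertionError("len(index) != len(labels)")
    ncols = len(values[0]) if values else 0
    counts = [0] * ngroups
    nobs = [[0] * ncols for _ in range(ngroups)]
    sumx = [[0.0] * ncols for _ in range(ngroups)]
    for row, lab in zip(values, labels):
        if lab < 0:  # negative label: the row belongs to no group
            continue
        counts[lab] += 1
        for j, val in enumerate(row):
            if val == val:  # False only for NaN
                nobs[lab][j] += 1
                sumx[lab][j] += val
    out = [[sumx[i][j] if nobs[i][j] >= min_count else math.nan
            for j in range(ncols)] for i in range(ngroups)]
    return out, counts


values = [[1.0, math.nan], [2.0, 5.0], [4.0, math.nan], [8.0, 1.0]]
labels = [0, 0, 1, -1]
out, counts = group_add_reference(values, labels, ngroups=2, min_count=1)
print(out)     # [[3.0, 5.0], [4.0, nan]]
print(counts)  # [2, 1]
```

Note that `counts` tallies every row with a non-negative label, while `nobs` tallies only non-NaN cells; `min_count` is compared against the latter.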

pandas/_libs/groupby_helper.pxi.in

+1-48
@@ -9,7 +9,7 @@ cdef extern from "numpy/npy_math.h":
 _int64_max = np.iinfo(np.int64).max

 # ----------------------------------------------------------------------
-# group_add, group_prod, group_var, group_mean, group_ohlc
+# group_prod, group_var, group_mean, group_ohlc
 # ----------------------------------------------------------------------

 {{py:
@@ -27,53 +27,6 @@ def get_dispatch(dtypes):
 {{for name, c_type in get_dispatch(dtypes)}}


-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_add_{{name}}({{c_type}}[:, :] out,
-                       int64_t[:] counts,
-                       {{c_type}}[:, :] values,
-                       const int64_t[:] labels,
-                       Py_ssize_t min_count=0):
-    """
-    Only aggregates on axis=0
-    """
-    cdef:
-        Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
-        {{c_type}} val, count
-        ndarray[{{c_type}}, ndim=2] sumx, nobs
-
-    if not len(values) == len(labels):
-        raise AssertionError("len(index) != len(labels)")
-
-    nobs = np.zeros_like(out)
-    sumx = np.zeros_like(out)
-
-    N, K = (<object>values).shape
-
-    with nogil:
-
-        for i in range(N):
-            lab = labels[i]
-            if lab < 0:
-                continue
-
-            counts[lab] += 1
-            for j in range(K):
-                val = values[i, j]
-
-                # not nan
-                if val == val:
-                    nobs[lab, j] += 1
-                    sumx[lab, j] += val
-
-        for i in range(ncounts):
-            for j in range(K):
-                if nobs[i, j] < min_count:
-                    out[i, j] = NAN
-                else:
-                    out[i, j] = sumx[i, j]
-
-
 @cython.wraparound(False)
 @cython.boundscheck(False)
 def group_prod_{{name}}({{c_type}}[:, :] out,

pandas/_libs/tslibs/timedeltas.pyx

+9-3
@@ -1127,10 +1127,11 @@ class Timedelta(_Timedelta):
         'ms', 'milliseconds', 'millisecond', 'milli', 'millis', 'L',
         'us', 'microseconds', 'microsecond', 'micro', 'micros', 'U',
         'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond', 'N'}
-    days, seconds, microseconds,
-    milliseconds, minutes, hours, weeks : numeric, optional
+    **kwargs
+        Available kwargs: {days, seconds, microseconds,
+        milliseconds, minutes, hours, weeks}.
         Values for construction in compat with datetime.timedelta.
-        np ints and floats will be coerced to python ints and floats.
+        Numpy ints and floats will be coerced to python ints and floats.

     Notes
     -----
@@ -1158,6 +1159,11 @@ class Timedelta(_Timedelta):
                                  "[weeks, days, hours, minutes, seconds, "
                                  "milliseconds, microseconds, nanoseconds]")

+        if unit in {'Y', 'y', 'M'}:
+            warnings.warn("M and Y units are deprecated and "
+                          "will be removed in a future version.",
+                          FutureWarning, stacklevel=1)
+
         if isinstance(value, Timedelta):
             value = value.value
         elif is_string_object(value):
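The second hunk above adds a deprecation check on `unit`. A minimal, self-contained sketch of that warn-but-accept pattern, and of one way to observe the warning in a test, might look like this. The helper `check_unit` is hypothetical, and it uses `stacklevel=2` so the warning points at the caller, whereas the diff itself passes `stacklevel=1`.

```python
import warnings

DEPRECATED_UNITS = {'Y', 'y', 'M'}  # the same set the diff checks


def check_unit(unit):
    """Hypothetical helper: warn about, but still accept, deprecated units."""
    if unit in DEPRECATED_UNITS:
        warnings.warn("M and Y units are deprecated and "
                      "will be removed in a future version.",
                      FutureWarning, stacklevel=2)
    return unit


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is filtered out
    check_unit('M')                  # deprecated unit: one FutureWarning
    check_unit('D')                  # supported unit: no warning

print([w.category.__name__ for w in caught])  # ['FutureWarning']
```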

pandas/_libs/tslibs/timestamps.pyx

+3-3
@@ -1187,12 +1187,12 @@ class Timestamp(_Timestamp):
             value = tz_localize_to_utc(np.array([self.value], dtype='i8'), tz,
                                        ambiguous=ambiguous,
                                        nonexistent=nonexistent)[0]
-            return Timestamp(value, tz=tz)
+            return Timestamp(value, tz=tz, freq=self.freq)
         else:
             if tz is None:
                 # reset tz
                 value = tz_convert_single(self.value, UTC, self.tz)
-                return Timestamp(value, tz=None)
+                return Timestamp(value, tz=tz, freq=self.freq)
             else:
                 raise TypeError('Cannot localize tz-aware Timestamp, use '
                                 'tz_convert for conversions')
@@ -1222,7 +1222,7 @@ class Timestamp(_Timestamp):
                             'tz_localize to localize')
         else:
             # Same UTC timestamp, different time zone
-            return Timestamp(self.value, tz=tz)
+            return Timestamp(self.value, tz=tz, freq=self.freq)

     astimezone = tz_convert

pandas/core/algorithms.py

+1-1
@@ -566,7 +566,7 @@ def _factorize_array(values, na_sentinel=-1, size_hint=None,
         coerced to ndarrays before factorization.
     """),
     order=dedent("""\
-    order
+    order : None
         .. deprecated:: 0.23.0

            This parameter has no effect and is deprecated.

pandas/core/arrays/categorical.py

+1-2
@@ -2167,8 +2167,7 @@ def _reverse_indexer(self):
         r, counts = libalgos.groupsort_indexer(self.codes.astype('int64'),
                                                categories.size)
         counts = counts.cumsum()
-        result = [r[counts[indexer]:counts[indexer + 1]]
-                  for indexer in range(len(counts) - 1)]
+        result = (r[start:end] for start, end in zip(counts, counts[1:]))
         result = dict(zip(categories, result))
         return result
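The categorical.py change replaces index arithmetic with pairwise iteration over the cumulative boundaries. The sketch below, on made-up data rather than real `groupsort_indexer` output, shows that `zip(counts, counts[1:])` yields the same slices as the indexed version, and that the generator is consumed once by `dict(zip(...))`.

```python
from itertools import accumulate

# Illustrative data only (not produced by pandas internals).
r = [3, 0, 4, 1, 2]                # row positions, already grouped by category
sizes = [0, 2, 1, 2]               # leading 0, then the size of each group
counts = list(accumulate(sizes))   # cumulative boundaries: [0, 2, 3, 5]
categories = ['a', 'b', 'c']

# Old form: index into counts twice per group.
old = [r[counts[i]:counts[i + 1]] for i in range(len(counts) - 1)]

# New form: pair each boundary with the next one.
new = (r[start:end] for start, end in zip(counts, counts[1:]))

result = dict(zip(categories, new))
print(result)  # {'a': [3, 0], 'b': [4], 'c': [1, 2]}
```

Because `new` is a generator, it can be iterated only once; that is sufficient here since the very next statement turns it into a dict.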

pandas/core/arrays/datetimes.py

+1-1
@@ -128,7 +128,7 @@ def _dt_array_cmp(cls, op):
     Wrap comparison operations to convert datetime-like to datetime64
     """
     opname = '__{name}__'.format(name=op.__name__)
-    nat_result = True if opname == '__ne__' else False
+    nat_result = opname == '__ne__'

     def wrapper(self, other):
         if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):

pandas/core/arrays/integer.py

+1-1
@@ -561,7 +561,7 @@ def cmp_method(self, other):
             else:
                 mask = self._mask | mask

-            result[mask] = True if op_name == 'ne' else False
+            result[mask] = op_name == 'ne'
             return result

         name = '__{name}__'.format(name=op.__name__)

pandas/core/arrays/period.py

+1-1
@@ -46,7 +46,7 @@ def _period_array_cmp(cls, op):
     Wrap comparison operations to convert Period-like to PeriodDtype
     """
     opname = '__{name}__'.format(name=op.__name__)
-    nat_result = True if opname == '__ne__' else False
+    nat_result = opname == '__ne__'

     def wrapper(self, other):
         op = getattr(self.asi8, opname)

pandas/core/arrays/timedeltas.py

+1-1
@@ -62,7 +62,7 @@ def _td_array_cmp(cls, op):
     Wrap comparison operations to convert timedelta-like to timedelta64
     """
     opname = '__{name}__'.format(name=op.__name__)
-    nat_result = True if opname == '__ne__' else False
+    nat_result = opname == '__ne__'

     def wrapper(self, other):
         if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
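The comparison-wrapper edits in datetimes.py, integer.py, period.py and timedeltas.py all drop a redundant conditional expression. A quick check, using the standard `operator` functions as stand-ins for the `op` argument those wrappers receive (an assumption for illustration), confirms the two spellings agree for every comparison and that only `!=` yields True.

```python
import operator

COMPARISONS = [operator.eq, operator.ne, operator.lt,
               operator.le, operator.gt, operator.ge]

for op in COMPARISONS:
    opname = '__{name}__'.format(name=op.__name__)
    old = True if opname == '__ne__' else False   # form removed by the diff
    new = opname == '__ne__'                      # form added by the diff
    assert old is new   # == on strings already returns a bool

ne_only = [op.__name__ for op in COMPARISONS
           if '__{name}__'.format(name=op.__name__) == '__ne__']
print(ne_only)  # ['ne']
```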

pandas/core/computation/pytables.py

+3-3
@@ -252,7 +252,7 @@ def evaluate(self):
                              .format(slf=self))

         rhs = self.conform(self.rhs)
-        values = [TermValue(v, v, self.kind) for v in rhs]
+        values = [TermValue(v, v, self.kind).value for v in rhs]

         if self.is_in_table:

@@ -263,7 +263,7 @@ def evaluate(self):
                 self.filter = (
                     self.lhs,
                     filter_op,
-                    pd.Index([v.value for v in values]))
+                    pd.Index(values))

                 return self
             return None
@@ -275,7 +275,7 @@ def evaluate(self):
             self.filter = (
                 self.lhs,
                 filter_op,
-                pd.Index([v.value for v in values]))
+                pd.Index(values))

         else:
             raise TypeError("passing a filterable condition to a non-table "