Commit 0549e6d

Merge branch 'main' into DEPR-to_datetime-mixed-offsets-with-utc=False
2 parents 1220130 + d2f05c2 commit 0549e6d

14 files changed: +114 -63 lines changed

doc/source/user_guide/window.rst

Lines changed: 21 additions & 28 deletions
@@ -244,7 +244,7 @@ a ``BaseIndexer`` subclass that allows a user to define a custom method for calc
 The ``BaseIndexer`` subclass will need to define a ``get_window_bounds`` method that returns
 a tuple of two arrays, the first being the starting indices of the windows and second being the
 ending indices of the windows. Additionally, ``num_values``, ``min_periods``, ``center``, ``closed``
-and will automatically be passed to ``get_window_bounds`` and the defined method must
+and ``step`` will automatically be passed to ``get_window_bounds`` and the defined method must
 always accept these arguments.

 For example, if we have the following :class:`DataFrame`
@@ -259,33 +259,26 @@ For example, if we have the following :class:`DataFrame`
 and we want to use an expanding window where ``use_expanding`` is ``True`` otherwise a window of size
 1, we can create the following ``BaseIndexer`` subclass:

-.. code-block:: ipython
-
-   In [2]: from pandas.api.indexers import BaseIndexer
-
-   In [3]: class CustomIndexer(BaseIndexer):
-      ...:     def get_window_bounds(self, num_values, min_periods, center, closed):
-      ...:         start = np.empty(num_values, dtype=np.int64)
-      ...:         end = np.empty(num_values, dtype=np.int64)
-      ...:         for i in range(num_values):
-      ...:             if self.use_expanding[i]:
-      ...:                 start[i] = 0
-      ...:                 end[i] = i + 1
-      ...:             else:
-      ...:                 start[i] = i
-      ...:                 end[i] = i + self.window_size
-      ...:         return start, end
-
-   In [4]: indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
-
-   In [5]: df.rolling(indexer).sum()
-   Out[5]:
-       values
-   0     0.0
-   1     1.0
-   2     3.0
-   3     3.0
-   4    10.0
+.. ipython:: python
+
+   from pandas.api.indexers import BaseIndexer
+
+   class CustomIndexer(BaseIndexer):
+       def get_window_bounds(self, num_values, min_periods, center, closed, step):
+           start = np.empty(num_values, dtype=np.int64)
+           end = np.empty(num_values, dtype=np.int64)
+           for i in range(num_values):
+               if self.use_expanding[i]:
+                   start[i] = 0
+                   end[i] = i + 1
+               else:
+                   start[i] = i
+                   end[i] = i + self.window_size
+           return start, end
+
+   indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
+
+   df.rolling(indexer).sum()

 You can view other examples of ``BaseIndexer`` subclasses `here <https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexers/objects.py>`__
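
The hunk above does not show how ``df`` and ``use_expanding`` are defined, so the following is only a sketch of a runnable version. The five-row frame and the boolean list are assumptions, chosen so that the result matches the output in the removed ``Out[5]`` lines. It also assumes a pandas version that passes ``step`` to ``get_window_bounds``.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

# Assumed inputs: not shown in the diff, picked to reproduce the removed output.
use_expanding = [True, False, True, False, True]
df = pd.DataFrame({"values": range(5)})


class CustomIndexer(BaseIndexer):
    def get_window_bounds(self, num_values, min_periods, center, closed, step):
        # One (start, end) pair per row; end is exclusive.
        start = np.empty(num_values, dtype=np.int64)
        end = np.empty(num_values, dtype=np.int64)
        for i in range(num_values):
            if self.use_expanding[i]:
                # Expanding window: everything from the first row up to row i.
                start[i] = 0
                end[i] = i + 1
            else:
                # Fixed window of size window_size starting at row i.
                start[i] = i
                end[i] = i + self.window_size
        return start, end


indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
result = df.rolling(indexer).sum()
print(result)
```

Extra keyword arguments given to the constructor (here ``use_expanding``) are stored as attributes on the indexer, which is why ``self.use_expanding`` is available inside ``get_window_bounds``.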

doc/source/whatsnew/v2.1.0.rst

Lines changed: 2 additions & 0 deletions
@@ -149,6 +149,7 @@ Other enhancements
 - Adding ``engine_kwargs`` parameter to :meth:`DataFrame.read_excel` (:issue:`52214`)
 - Classes that are useful for type-hinting have been added to the public API in the new submodule ``pandas.api.typing`` (:issue:`48577`)
 - Implemented :attr:`Series.dt.is_month_start`, :attr:`Series.dt.is_month_end`, :attr:`Series.dt.is_year_start`, :attr:`Series.dt.is_year_end`, :attr:`Series.dt.is_quarter_start`, :attr:`Series.dt.is_quarter_end`, :attr:`Series.dt.is_days_in_month`, :attr:`Series.dt.unit`, :meth:`Series.dt.is_normalize`, :meth:`Series.dt.day_name`, :meth:`Series.dt.month_name`, :meth:`Series.dt.tz_convert` for :class:`ArrowDtype` with ``pyarrow.timestamp`` (:issue:`52388`, :issue:`51718`)
+- Implemented :func:`api.interchange.from_dataframe` for :class:`DatetimeTZDtype` (:issue:`54239`)
 - Implemented ``__from_arrow__`` on :class:`DatetimeTZDtype`. (:issue:`52201`)
 - Implemented ``__pandas_priority__`` to allow custom types to take precedence over :class:`DataFrame`, :class:`Series`, :class:`Index`, or :class:`ExtensionArray` for arithmetic operations, :ref:`see the developer guide <extending.pandas_priority>` (:issue:`48347`)
 - Improve error message when having incompatible columns using :meth:`DataFrame.merge` (:issue:`51861`)
@@ -676,6 +677,7 @@ Other
 - Bug in :meth:`DataFrame.shift` with ``axis=1`` on a :class:`DataFrame` with a single :class:`ExtensionDtype` column giving incorrect results (:issue:`53832`)
 - Bug in :meth:`Index.sort_values` when a ``key`` is passed (:issue:`52764`)
 - Bug in :meth:`Series.align`, :meth:`DataFrame.align`, :meth:`Series.reindex`, :meth:`DataFrame.reindex`, :meth:`Series.interpolate`, :meth:`DataFrame.interpolate`, incorrectly failing to raise with method="asfreq" (:issue:`53620`)
+- Bug in :meth:`Series.argsort` failing to raise when an invalid ``axis`` is passed (:issue:`54257`)
 - Bug in :meth:`Series.map` when giving a callable to an empty series, the returned series had ``object`` dtype. It now keeps the original dtype (:issue:`52384`)
 - Bug in :meth:`Series.memory_usage` when ``deep=True`` throw an error with Series of objects and the returned value is incorrect, as it does not take into account GC corrections (:issue:`51858`)
 - Bug in :meth:`period_range` the default behavior when freq was not passed as an argument was incorrect(:issue:`53687`)

pandas/core/arrays/base.py

Lines changed: 13 additions & 0 deletions
@@ -1678,6 +1678,19 @@ def _reduce(
     # Non-Optimized Default Methods; in the case of the private methods here,
     # these are not guaranteed to be stable across pandas versions.

+    def _values_for_json(self) -> np.ndarray:
+        """
+        Specify how to render our entries in to_json.
+
+        Notes
+        -----
+        The dtype on the returned ndarray is not restricted, but for non-native
+        types that are not specifically handled in objToJSON.c, to_json is
+        liable to raise. In these cases, it may be safer to return an ndarray
+        of strings.
+        """
+        return np.asarray(self)
+
     def _hash_pandas_object(
         self, *, encoding: str, hash_key: str, categorize: bool
     ) -> npt.NDArray[np.uint64]:

pandas/core/arrays/datetimelike.py

Lines changed: 6 additions & 0 deletions
@@ -2202,6 +2202,12 @@ def _with_freq(self, freq) -> Self:
     # --------------------------------------------------------------
     # ExtensionArray Interface

+    def _values_for_json(self) -> np.ndarray:
+        # Small performance bump vs the base class which calls np.asarray(self)
+        if isinstance(self.dtype, np.dtype):
+            return self._ndarray
+        return super()._values_for_json()
+
     def factorize(
         self,
         use_na_sentinel: bool = True,

pandas/core/interchange/column.py

Lines changed: 13 additions & 2 deletions
@@ -9,7 +9,10 @@
 from pandas.errors import NoBufferPresent
 from pandas.util._decorators import cache_readonly

-from pandas.core.dtypes.dtypes import ArrowDtype
+from pandas.core.dtypes.dtypes import (
+    ArrowDtype,
+    DatetimeTZDtype,
+)

 import pandas as pd
 from pandas.api.types import is_string_dtype
@@ -138,6 +141,8 @@ def _dtype_from_pandasdtype(self, dtype) -> tuple[DtypeKind, int, str, str]:
             raise ValueError(f"Data type {dtype} not supported by interchange protocol")
         if isinstance(dtype, ArrowDtype):
             byteorder = dtype.numpy_dtype.byteorder
+        elif isinstance(dtype, DatetimeTZDtype):
+            byteorder = dtype.base.byteorder  # type: ignore[union-attr]
         else:
             byteorder = dtype.byteorder

@@ -269,7 +274,13 @@ def _get_data_buffer(
             DtypeKind.BOOL,
             DtypeKind.DATETIME,
         ):
-            buffer = PandasBuffer(self._col.to_numpy(), allow_copy=self._allow_copy)
+            # self.dtype[2] is an ArrowCTypes.TIMESTAMP where the tz will make
+            # it longer than 4 characters
+            if self.dtype[0] == DtypeKind.DATETIME and len(self.dtype[2]) > 4:
+                np_arr = self._col.dt.tz_convert(None).to_numpy()
+            else:
+                np_arr = self._col.to_numpy()
+            buffer = PandasBuffer(np_arr, allow_copy=self._allow_copy)
             dtype = self.dtype
         elif self.dtype[0] == DtypeKind.CATEGORICAL:
             codes = self._col.values._codes

pandas/core/interchange/from_dataframe.py

Lines changed: 5 additions & 5 deletions
@@ -325,20 +325,20 @@ def string_column_to_ndarray(col: Column) -> tuple[np.ndarray, Any]:
     return np.asarray(str_list, dtype="object"), buffers


-def parse_datetime_format_str(format_str, data):
+def parse_datetime_format_str(format_str, data) -> pd.Series | np.ndarray:
     """Parse datetime `format_str` to interpret the `data`."""
     # timestamp 'ts{unit}:tz'
     timestamp_meta = re.match(r"ts([smun]):(.*)", format_str)
     if timestamp_meta:
         unit, tz = timestamp_meta.group(1), timestamp_meta.group(2)
-        if tz != "":
-            raise NotImplementedError("Timezones are not supported yet")
         if unit != "s":
             # the format string describes only a first letter of the unit, so
             # add one extra letter to convert the unit to numpy-style:
             # 'm' -> 'ms', 'u' -> 'us', 'n' -> 'ns'
             unit += "s"
         data = data.astype(f"datetime64[{unit}]")
+        if tz != "":
+            data = pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(tz)
         return data

     # date 'td{Days/Ms}'
@@ -358,7 +358,7 @@ def parse_datetime_format_str(format_str, data):
     raise NotImplementedError(f"DateTime kind is not supported: {format_str}")


-def datetime_column_to_ndarray(col: Column) -> tuple[np.ndarray, Any]:
+def datetime_column_to_ndarray(col: Column) -> tuple[np.ndarray | pd.Series, Any]:
     """
     Convert a column holding DateTime data to a NumPy array.

@@ -389,7 +389,7 @@ def datetime_column_to_ndarray(col: Column) -> tuple[np.ndarray, Any]:
         length=col.size(),
     )

-    data = parse_datetime_format_str(format_str, data)
+    data = parse_datetime_format_str(format_str, data)  # type: ignore[assignment]
     data = set_nulls(data, col, buffers["validity"])
     return data, buffers
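
The two sides of the timezone handling in this commit can be read as a round trip. The producer (``column.py``) exports tz-naive UTC instants via ``tz_convert(None)``, and the consumer (``from_dataframe.py``) localizes them to UTC and converts to the target zone. The following is a minimal sketch of that round trip using only public pandas calls. The sample timestamps and the ``"US/Pacific"`` zone are illustrative assumptions, not values taken from the commit.

```python
import numpy as np
import pandas as pd

# Assumed sample data: naive instants, interpreted as UTC by the consumer.
data = np.array(
    ["2018-01-01T08:00:00", "2018-01-02T08:00:00"], dtype="datetime64[ns]"
)
tz = "US/Pacific"  # assumed target zone for illustration

# Consumer side: mirror of the added line in parse_datetime_format_str.
aware = pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(tz)

# Producer side: mirror of the added branch in _get_data_buffer, which
# strips the zone and yields the underlying UTC instants again.
roundtrip = aware.dt.tz_convert(None).to_numpy()

print(aware)
print(roundtrip)
```

Localizing to UTC first matters: calling ``tz_localize(tz)`` directly would treat the naive values as wall-clock times in the target zone and shift every instant by the zone's UTC offset.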

pandas/core/interchange/utils.py

Lines changed: 6 additions & 3 deletions
@@ -4,7 +4,6 @@

 from __future__ import annotations

-import re
 import typing

 import numpy as np
@@ -14,6 +13,7 @@
 from pandas.core.dtypes.dtypes import (
     ArrowDtype,
     CategoricalDtype,
+    DatetimeTZDtype,
 )

 if typing.TYPE_CHECKING:
@@ -134,10 +134,13 @@ def dtype_to_arrow_c_fmt(dtype: DtypeObj) -> str:

     if lib.is_np_dtype(dtype, "M"):
         # Selecting the first char of resolution string:
-        # dtype.str -> '<M8[ns]'
-        resolution = re.findall(r"\[(.*)\]", dtype.str)[0][:1]
+        # dtype.str -> '<M8[ns]' -> 'n'
+        resolution = np.datetime_data(dtype)[0][0]
         return ArrowCTypes.TIMESTAMP.format(resolution=resolution, tz="")

+    elif isinstance(dtype, DatetimeTZDtype):
+        return ArrowCTypes.TIMESTAMP.format(resolution=dtype.unit[0], tz=dtype.tz)
+
     raise NotImplementedError(
         f"Conversion of {dtype} to Arrow C format string is not implemented."
     )
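
To make the format-string handling concrete, here is a small standalone sketch of building and parsing a timestamp format string. The template ``"ts{resolution}:{tz}"`` is an assumption inferred from the ``'ts{unit}:tz'`` comment and the regex in ``from_dataframe.py``. The helper names ``timestamp_format`` and ``parse_timestamp_format`` are hypothetical and are not pandas functions.

```python
import re

import numpy as np

# Assumed to mirror ArrowCTypes.TIMESTAMP; inferred from the diff, not imported.
TIMESTAMP_FMT = "ts{resolution}:{tz}"


def timestamp_format(dtype, tz=""):
    """Hypothetical helper: build the format string for a datetime64 dtype."""
    # np.datetime_data returns (unit, count), e.g. ('ns', 1).
    unit, _count = np.datetime_data(dtype)
    # Only the first letter of the unit is encoded: 'ns' -> 'n', 's' -> 's'.
    return TIMESTAMP_FMT.format(resolution=unit[0], tz=tz)


def parse_timestamp_format(fmt):
    """Hypothetical helper: recover (numpy unit, tz) from the format string."""
    match = re.match(r"ts([smun]):(.*)", fmt)
    if match is None:
        raise ValueError(f"not a timestamp format string: {fmt!r}")
    unit, tz = match.group(1), match.group(2)
    if unit != "s":
        # 'm' -> 'ms', 'u' -> 'us', 'n' -> 'ns'
        unit += "s"
    return unit, tz


naive = timestamp_format(np.dtype("datetime64[ns]"))
aware = timestamp_format(np.dtype("datetime64[ms]"), tz="US/Pacific")
print(naive, aware)
print(parse_timestamp_format(aware))
```

Under this template a tz-naive string is always exactly four characters long (``ts``, one unit letter, ``:``), which is consistent with the ``len(self.dtype[2]) > 4`` check added in ``column.py`` to detect a timezone.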

pandas/core/internals/blocks.py

Lines changed: 0 additions & 16 deletions
@@ -1638,9 +1638,6 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
         """
         raise AbstractMethodError(self)

-    def values_for_json(self) -> np.ndarray:
-        raise AbstractMethodError(self)
-

 class EABackedBlock(Block):
     """
@@ -1885,9 +1882,6 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
         # TODO(EA2D): reshape not needed with 2D EAs
         return np.asarray(values).reshape(self.shape)

-    def values_for_json(self) -> np.ndarray:
-        return np.asarray(self.values)
-
     @final
     def pad_or_backfill(
         self,
@@ -2174,9 +2168,6 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
             return self.values.astype(_dtype_obj)
         return self.values

-    def values_for_json(self) -> np.ndarray:
-        return self.values
-
     @cache_readonly
     def is_numeric(self) -> bool:  # type: ignore[override]
         dtype = self.values.dtype
@@ -2231,9 +2222,6 @@ class DatetimeLikeBlock(NDArrayBackedExtensionBlock):
     is_numeric = False
     values: DatetimeArray | TimedeltaArray

-    def values_for_json(self) -> np.ndarray:
-        return self.values._ndarray
-

 class DatetimeTZBlock(DatetimeLikeBlock):
     """implement a datetime64 block with a tz attribute"""
@@ -2242,10 +2230,6 @@ class DatetimeTZBlock(DatetimeLikeBlock):

     __slots__ = ()

-    # Don't use values_for_json from DatetimeLikeBlock since it is
-    # an invalid optimization here(drop the tz)
-    values_for_json = NDArrayBackedExtensionBlock.values_for_json
-

 # -----------------------------------------------------------------
 # Constructor Helpers

pandas/core/internals/managers.py

Lines changed: 1 addition & 1 deletion
@@ -1008,7 +1008,7 @@ def column_arrays(self) -> list[np.ndarray]:

         for blk in self.blocks:
             mgr_locs = blk._mgr_locs
-            values = blk.values_for_json()
+            values = blk.array_values._values_for_json()
             if values.ndim == 1:
                 # TODO(EA2D): special casing not needed with 2D EAs
                 result[mgr_locs[0]] = values

pandas/core/series.py

Lines changed: 4 additions & 0 deletions
@@ -3958,6 +3958,10 @@ def argsort(
         2    0
         dtype: int64
         """
+        if axis != -1:
+            # GH#54257 We allow -1 here so that np.argsort(series) works
+            self._get_axis_number(axis)
+
         values = self._values
         mask = isna(values)
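
The ``axis != -1`` special case exists because ``np.argsort(obj)`` forwards to ``obj.argsort`` with NumPy's default ``axis=-1``. The sketch below illustrates that interaction with a hypothetical ``TinySeries`` class. It is a stand-in written for this example, not the pandas implementation, and its ``_get_axis_number`` only imitates the error message seen in the test file.

```python
import numpy as np


class TinySeries:
    """Hypothetical 1-D container used only to illustrate the axis check."""

    _AXIS_TO_NUMBER = {0: 0, "index": 0}

    def __init__(self, values):
        self._values = np.asarray(values)

    def _get_axis_number(self, axis):
        # Imitates the validation error for an unknown axis.
        try:
            return self._AXIS_TO_NUMBER[axis]
        except KeyError:
            raise ValueError(
                f"No axis named {axis} for object type {type(self).__name__}"
            ) from None

    def argsort(self, axis=0, kind=None, order=None, stable=None):
        if axis != -1:
            # -1 is let through so that np.argsort(obj) keeps working.
            self._get_axis_number(axis)
        return np.argsort(self._values)


ser = TinySeries([3, 1, 2])

# NumPy dispatches to ser.argsort(axis=-1, ...), which skips validation.
via_numpy = np.argsort(ser)

try:
    ser.argsort(axis=2)
except ValueError as err:
    message = str(err)

print(via_numpy, message)
```

The ``stable`` keyword is accepted only so the method tolerates the extra argument that newer NumPy versions pass along; this sketch ignores it.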

pandas/tests/interchange/test_impl.py

Lines changed: 11 additions & 0 deletions
@@ -284,3 +284,14 @@ def test_empty_pyarrow(data):
     arrow_df = pa_from_dataframe(expected)
     result = from_dataframe(arrow_df)
     tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("tz", ["UTC", "US/Pacific"])
+@pytest.mark.parametrize("unit", ["s", "ms", "us", "ns"])
+def test_datetimetzdtype(tz, unit):
+    # GH 54239
+    tz_data = (
+        pd.date_range("2018-01-01", periods=5, freq="D").tz_localize(tz).as_unit(unit)
+    )
+    df = pd.DataFrame({"ts_tz": tz_data})
+    tm.assert_frame_equal(df, from_dataframe(df.__dataframe__()))

pandas/tests/io/parser/test_parse_dates.py

Lines changed: 16 additions & 0 deletions
@@ -2237,3 +2237,19 @@ def test_parse_dates_arrow_engine(all_parsers):
         }
     )
     tm.assert_frame_equal(result, expected)
+
+
+@xfail_pyarrow
+def test_from_csv_with_mixed_offsets(all_parsers):
+    parser = all_parsers
+    data = "a\n2020-01-01T00:00:00+01:00\n2020-01-01T00:00:00+00:00"
+    result = parser.read_csv(StringIO(data), parse_dates=["a"])["a"]
+    expected = Series(
+        [
+            Timestamp("2020-01-01 00:00:00+01:00"),
+            Timestamp("2020-01-01 00:00:00+00:00"),
+        ],
+        name="a",
+        index=[0, 1],
+    )
+    tm.assert_series_equal(result, expected)

pandas/tests/series/methods/test_argsort.py

Lines changed: 15 additions & 7 deletions
@@ -10,12 +10,20 @@


 class TestSeriesArgsort:
+    def test_argsort_axis(self):
+        # GH#54257
+        ser = Series(range(3))
+
+        msg = "No axis named 2 for object type Series"
+        with pytest.raises(ValueError, match=msg):
+            ser.argsort(axis=2)
+
     def test_argsort_numpy(self, datetime_series):
         ser = datetime_series
-        func = np.argsort
-        tm.assert_numpy_array_equal(
-            func(ser).values, func(np.array(ser)), check_dtype=False
-        )
+
+        res = np.argsort(ser).values
+        expected = np.argsort(np.array(ser))
+        tm.assert_numpy_array_equal(res, expected)

         # with missing values
         ts = ser.copy()
@@ -25,10 +33,10 @@ def test_argsort_numpy(self, datetime_series):
         with tm.assert_produces_warning(
             FutureWarning, match=msg, check_stacklevel=False
         ):
-            result = func(ts)[1::2]
-            expected = func(np.array(ts.dropna()))
+            result = np.argsort(ts)[1::2]
+            expected = np.argsort(np.array(ts.dropna()))

-        tm.assert_numpy_array_equal(result.values, expected, check_dtype=False)
+        tm.assert_numpy_array_equal(result.values, expected)

     def test_argsort(self, datetime_series):
         argsorted = datetime_series.argsort()

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 # See https://github.com/scipy/scipy/pull/12940 for the AIX issue.
 requires = [
     "meson-python==0.13.1",
-    "meson[ninja]==1.0.1",
+    "meson==1.0.1",
     "wheel",
     "Cython>=0.29.33,<3",  # Note: sync with setup.py, environment.yml and asv.conf.json
     "oldest-supported-numpy>=2022.8.16",
