Commit 4e72be1

Merge branch 'main' into issue-50977
2 parents b767d02 + 0105aa2 commit 4e72be1

68 files changed, +1015 -663 lines changed

.pre-commit-config.yaml

+1 -1
@@ -135,7 +135,7 @@ repos:
   types: [python]
   stages: [manual]
   additional_dependencies: &pyright_dependencies
-
+
 - id: pyright_reportGeneralTypeIssues
   # note: assumes python env is setup and activated
   name: pyright reportGeneralTypeIssues

ci/code_checks.sh

-3
@@ -187,7 +187,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 pandas.show_versions \
 pandas.test \
 pandas.NaT \
-pandas.Timestamp.unit \
 pandas.Timestamp.as_unit \
 pandas.Timestamp.ctime \
 pandas.Timestamp.date \
@@ -579,13 +578,11 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

 MSG='Partially validate docstrings (EX02)' ; echo $MSG
 $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX02 --ignore_functions \
-    pandas.DataFrame.copy \
     pandas.DataFrame.plot.line \
     pandas.DataFrame.std \
     pandas.DataFrame.var \
     pandas.Index.factorize \
     pandas.Period.strftime \
-    pandas.Series.copy \
     pandas.Series.factorize \
     pandas.Series.floordiv \
     pandas.Series.plot.line \

doc/source/development/internals.rst

+23 -23
@@ -15,24 +15,24 @@ Indexing
 In pandas there are a few objects implemented which can serve as valid
 containers for the axis labels:

-* ``Index``: the generic "ordered set" object, an ndarray of object dtype
+* :class:`Index`: the generic "ordered set" object, an ndarray of object dtype
   assuming nothing about its contents. The labels must be hashable (and
   likely immutable) and unique. Populates a dict of label to location in
   Cython to do ``O(1)`` lookups.
 * ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
   data, such as time stamps
 * ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data
-* ``MultiIndex``: the standard hierarchical index object
-* ``DatetimeIndex``: An Index object with ``Timestamp`` boxed elements (impl are the int64 values)
-* ``TimedeltaIndex``: An Index object with ``Timedelta`` boxed elements (impl are the in64 values)
-* ``PeriodIndex``: An Index object with Period elements
+* :class:`MultiIndex`: the standard hierarchical index object
+* :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values)
+* :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the in64 values)
+* :class:`PeriodIndex`: An Index object with Period elements

 There are functions that make the creation of a regular index easy:

-* ``date_range``: fixed frequency date range generated from a time rule or
+* :func:`date_range`: fixed frequency date range generated from a time rule or
   DateOffset. An ndarray of Python datetime objects
-* ``period_range``: fixed frequency date range generated from a time rule or
-  DateOffset. An ndarray of ``Period`` objects, representing timespans
+* :func:`period_range`: fixed frequency date range generated from a time rule or
+  DateOffset. An ndarray of :class:`Period` objects, representing timespans

 The motivation for having an ``Index`` class in the first place was to enable
 different implementations of indexing. This means that it's possible for you,
@@ -43,28 +43,28 @@ From an internal implementation point of view, the relevant methods that an
 ``Index`` must define are one or more of the following (depending on how
 incompatible the new object internals are with the ``Index`` functions):

-* ``get_loc``: returns an "indexer" (an integer, or in some cases a
+* :meth:`~Index.get_loc`: returns an "indexer" (an integer, or in some cases a
   slice object) for a label
-* ``slice_locs``: returns the "range" to slice between two labels
-* ``get_indexer``: Computes the indexing vector for reindexing / data
+* :meth:`~Index.slice_locs`: returns the "range" to slice between two labels
+* :meth:`~Index.get_indexer`: Computes the indexing vector for reindexing / data
   alignment purposes. See the source / docstrings for more on this
-* ``get_indexer_non_unique``: Computes the indexing vector for reindexing / data
+* :meth:`~Index.get_indexer_non_unique`: Computes the indexing vector for reindexing / data
   alignment purposes when the index is non-unique. See the source / docstrings
   for more on this
-* ``reindex``: Does any pre-conversion of the input index then calls
+* :meth:`~Index.reindex`: Does any pre-conversion of the input index then calls
   ``get_indexer``
-* ``union``, ``intersection``: computes the union or intersection of two
+* :meth:`~Index.union`, :meth:`~Index.intersection`: computes the union or intersection of two
   Index objects
-* ``insert``: Inserts a new label into an Index, yielding a new object
-* ``delete``: Delete a label, yielding a new object
-* ``drop``: Deletes a set of labels
-* ``take``: Analogous to ndarray.take
+* :meth:`~Index.insert`: Inserts a new label into an Index, yielding a new object
+* :meth:`~Index.delete`: Delete a label, yielding a new object
+* :meth:`~Index.drop`: Deletes a set of labels
+* :meth:`~Index.take`: Analogous to ndarray.take

 MultiIndex
 ~~~~~~~~~~

-Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
-integer **codes** (until version 0.24 named *labels*), and the level **names**:
+Internally, the :class:`MultiIndex` consists of a few things: the **levels**, the
+integer **codes**, and the level **names**:

 .. ipython:: python

@@ -80,13 +80,13 @@ You can probably guess that the codes determine which unique element is
 identified with that location at each layer of the index. It's important to
 note that sortedness is determined **solely** from the integer codes and does
 not check (or care) whether the levels themselves are sorted. Fortunately, the
-constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
-if you compute the levels and codes yourself, please be careful.
+constructors :meth:`~MultiIndex.from_tuples` and :meth:`~MultiIndex.from_arrays` ensure
+that this is true, but if you compute the levels and codes yourself, please be careful.

 Values
 ~~~~~~

-pandas extends NumPy's type system with custom types, like ``Categorical`` or
+pandas extends NumPy's type system with custom types, like :class:`Categorical` or
 datetimes with a timezone, so we have multiple notions of "values". For 1-D
 containers (``Index`` classes and ``Series``) we have the following convention:
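
The lookup methods and the levels/codes layout described in `internals.rst` above can be tried out with a short sketch. This is only an illustration using the public pandas API; the labels and variable names are made up for the example and do not come from the commit.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

# get_loc: label -> integer position
loc_b = idx.get_loc("b")

# slice_locs: (start, stop) positions that cover the labels "b" through "c"
start, stop = idx.slice_locs("b", "c")

# get_indexer: positions of the target labels, with -1 for absent labels
indexer = idx.get_indexer(["c", "a", "z"])

# MultiIndex: levels hold the unique labels, codes point into the levels
mi = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["outer", "inner"]
)
levels = [list(level) for level in mi.levels]
codes = [list(code) for code in mi.codes]

print(loc_b, (start, stop), indexer.tolist())
print(levels, codes, list(mi.names))
```

Because `from_tuples` builds the levels and codes together, the two stay consistent; the caution in the text applies when levels and codes are assembled by hand.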

doc/source/user_guide/indexing.rst

+1 -1
@@ -231,7 +231,7 @@ You can also assign a ``dict`` to a row of a ``DataFrame``:

 You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
 if you try to use attribute access to create a new column, it creates a new attribute rather than a
-new column. In 0.21.0 and later, this will raise a ``UserWarning``:
+new column and will this raise a ``UserWarning``:

 .. code-block:: ipython


doc/source/whatsnew/v2.0.0.rst

+35
@@ -69,6 +69,9 @@ to select the nullable dtypes implementation.
 * :func:`read_html`
 * :func:`read_xml`
 * :func:`read_json`
+* :func:`read_sql`
+* :func:`read_sql_query`
+* :func:`read_sql_table`
 * :func:`read_parquet`
 * :func:`read_orc`
 * :func:`read_feather`
@@ -330,6 +333,36 @@ a supported dtype:

     pd.Series(["2016-01-01"], dtype="datetime64[D]")

+.. _whatsnew_200.api_breaking.value_counts:
+
+Value counts sets the resulting name to ``count``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In past versions, when running :meth:`Series.value_counts`, the result would inherit
+the original object's name, and the result index would be nameless. This would cause
+confusion when resetting the index, and the column names would not correspond with the
+column values.
+Now, the result name will be ``'count'`` (or ``'proportion'`` if ``normalize=True`` was passed),
+and the index will be named after the original object (:issue:`49497`).
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+    In [8]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
+
+    Out[2]:
+    quetzal    2
+    elk        1
+    Name: animal, dtype: int64
+
+*New behavior*:
+
+.. ipython:: python
+
+    pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
+
+Likewise for other ``value_counts`` methods (for example, :meth:`DataFrame.value_counts`).
+
 .. _whatsnew_200.api_breaking.astype_to_unsupported_datetimelike:

 Disallow astype conversion to non-supported datetime64/timedelta64 dtypes
@@ -632,6 +665,7 @@ Other API changes

 Deprecations
 ~~~~~~~~~~~~
+- Deprecated parsing datetime strings with system-local timezone to ``tzlocal``, pass a ``tz`` keyword or explicitly call ``tz_localize`` instead (:issue:`50791`)
 - Deprecated argument ``infer_datetime_format`` in :func:`to_datetime` and :func:`read_csv`, as a strict version of it is now the default (:issue:`48621`)
 - Deprecated behavior of :func:`to_datetime` with ``unit`` when parsing strings, in a future version these will be parsed as datetimes (matching unit-less behavior) instead of cast to floats. To retain the old behavior, cast strings to numeric types before calling :func:`to_datetime` (:issue:`50735`)
 - Deprecated :func:`pandas.io.sql.execute` (:issue:`50185`)
@@ -959,6 +993,7 @@ Categorical
 - Bug in :meth:`Series.replace` with categorical dtype losing nullable dtypes of underlying categories (:issue:`49404`)
 - Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` would reorder categories when used as a grouper (:issue:`48749`)
 - Bug in :class:`Categorical` constructor when constructing from a :class:`Categorical` object and ``dtype="category"`` losing ordered-ness (:issue:`49309`)
+- Bug in :meth:`SeriesGroupBy.min`, :meth:`SeriesGroupBy.max`, :meth:`DataFrameGroupBy.min`, and :meth:`DataFrameGroupBy.max` with unordered :class:`CategoricalDtype` with no groups failing to raise ``TypeError`` (:issue:`51034`)
 -

 Datetimelike
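
The `value_counts` naming change described in the v2.0.0 release notes above is easiest to see after `reset_index()`. The sketch below reuses the Series from the release note; the variable names are illustrative, and the comments describe what the note says to expect rather than output captured from a run.

```python
import pandas as pd

animals = pd.Series(["quetzal", "quetzal", "elk"], name="animal")
counts = animals.value_counts()

# Per the release note, on pandas 2.0 or later counts.name is "count" and
# counts.index.name is "animal"; on earlier versions counts.name is "animal"
# and the index is unnamed.
print(counts.name, counts.index.name)

# With the new naming, reset_index() produces column names that describe
# their contents without a manual rename.
table = counts.reset_index()
print(list(table.columns))
```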

pandas/_libs/groupby.pyx

+24 -29
@@ -289,10 +289,8 @@ def group_cumprod(

 if uses_mask:
     isna_entry = mask[i, j]
-elif int64float_t is float64_t or int64float_t is float32_t:
-    isna_entry = val != val
 else:
-    isna_entry = False
+    isna_entry = _treat_as_na(val, False)

 if not isna_entry:
     isna_prev = accum_mask[lab, j]
@@ -737,23 +735,10 @@ def group_sum(
 for j in range(K):
     val = values[i, j]

-    # not nan
-    # With dt64/td64 values, values have been cast to float64
-    # instead if int64 for group_sum, but the logic
-    # is otherwise the same as in _treat_as_na
     if uses_mask:
         isna_entry = mask[i, j]
-    elif (
-        sum_t is float32_t
-        or sum_t is float64_t
-        or sum_t is complex64_t
-    ):
-        # avoid warnings because of equality comparison
-        isna_entry = not val == val
-    elif sum_t is int64_t and is_datetimelike and val == NPY_NAT:
-        isna_entry = True
     else:
-        isna_entry = False
+        isna_entry = _treat_as_na(val, is_datetimelike)

     if not isna_entry:
         nobs[lab, j] += 1
@@ -831,10 +816,8 @@ def group_prod(

 if uses_mask:
     isna_entry = mask[i, j]
-elif int64float_t is float32_t or int64float_t is float64_t:
-    isna_entry = not val == val
 else:
-    isna_entry = False
+    isna_entry = _treat_as_na(val, False)

 if not isna_entry:
     nobs[lab, j] += 1
@@ -906,7 +889,7 @@ def group_var(
 if uses_mask:
     isna_entry = mask[i, j]
 else:
-    isna_entry = not val == val
+    isna_entry = _treat_as_na(val, False)

 if not isna_entry:
     nobs[lab, j] += 1
@@ -1008,9 +991,12 @@ def group_mean(
 if uses_mask:
     isna_entry = mask[i, j]
 elif is_datetimelike:
+    # With group_mean, we cannot just use _treat_as_na bc
+    # datetimelike dtypes get cast to float64 instead of
+    # to int64.
     isna_entry = val == NPY_NAT
 else:
-    isna_entry = not val == val
+    isna_entry = _treat_as_na(val, is_datetimelike)

 if not isna_entry:
     nobs[lab, j] += 1
@@ -1086,10 +1072,8 @@ def group_ohlc(

 if uses_mask:
     isna_entry = mask[i, 0]
-elif int64float_t is float32_t or int64float_t is float64_t:
-    isna_entry = val != val
 else:
-    isna_entry = False
+    isna_entry = _treat_as_na(val, False)

 if isna_entry:
     continue
@@ -1231,15 +1215,26 @@ def group_quantile(
 # group_nth, group_last, group_rank
 # ----------------------------------------------------------------------

-cdef bint _treat_as_na(numeric_object_t val, bint is_datetimelike) nogil:
-    if numeric_object_t is object:
+ctypedef fused numeric_object_complex_t:
+    numeric_object_t
+    complex64_t
+    complex128_t
+
+
+cdef bint _treat_as_na(numeric_object_complex_t val, bint is_datetimelike) nogil:
+    if numeric_object_complex_t is object:
         # Should never be used, but we need to avoid the `val != val` below
         # or else cython will raise about gil acquisition.
         raise NotImplementedError

-    elif numeric_object_t is int64_t:
+    elif numeric_object_complex_t is int64_t:
         return is_datetimelike and val == NPY_NAT
-    elif numeric_object_t is float32_t or numeric_object_t is float64_t:
+    elif (
+        numeric_object_complex_t is float32_t
+        or numeric_object_complex_t is float64_t
+        or numeric_object_complex_t is complex64_t
+        or numeric_object_complex_t is complex128_t
+    ):
         return val != val
     else:
         # non-datetimelike integer
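
The dispatch that `_treat_as_na` performs at compile time can be mirrored in plain Python to show why one helper can replace the per-function checks removed above. This is a rough analogue only, not the Cython implementation: `treat_as_na` is a hypothetical name, runtime `isinstance` checks stand in for fused-type specialization, and the sketch assumes NaT is stored as the minimum int64 value.

```python
# Assumption: NaT is represented by the smallest int64 value.
NPY_NAT = -(2**63)


def treat_as_na(val, is_datetimelike: bool) -> bool:
    """Illustrative Python analogue of the type dispatch in _treat_as_na."""
    if isinstance(val, int):
        # Integers have no NaN; only datetimelike data carries the NaT sentinel.
        return is_datetimelike and val == NPY_NAT
    if isinstance(val, (float, complex)):
        # NaN is the only value that compares unequal to itself, and a complex
        # number with a NaN component is unequal to itself as well.
        return val != val
    # Mirrors the object branch, which is never expected to be reached.
    raise NotImplementedError(type(val))


print(treat_as_na(float("nan"), False))
print(treat_as_na(complex(float("nan"), 0.0), False))
print(treat_as_na(NPY_NAT, True), treat_as_na(NPY_NAT, False))
```

The `val != val` test covers float and complex inputs alike, which is why extending the fused type with `complex64_t` and `complex128_t` lets `group_sum` share the helper.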

pandas/_libs/internals.pyi

+3 -1
@@ -44,7 +44,9 @@ class BlockPlacement:
 @property
 def is_slice_like(self) -> bool: ...
 @overload
-def __getitem__(self, loc: slice | Sequence[int]) -> BlockPlacement: ...
+def __getitem__(
+    self, loc: slice | Sequence[int] | npt.NDArray[np.intp]
+) -> BlockPlacement: ...
 @overload
 def __getitem__(self, loc: int) -> int: ...
 def __iter__(self) -> Iterator[int]: ...
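
The overload pattern in the stub above (a container-like argument returns a new placement, a single integer returns an element) can be sketched with a small stand-in class. `Placement` is hypothetical and is not how pandas implements `BlockPlacement`; it only illustrates how the two `@overload` signatures map onto one runtime `__getitem__`.

```python
from typing import Iterator, List, Sequence, Union, overload


class Placement:
    """Hypothetical stand-in for BlockPlacement, for illustration only."""

    def __init__(self, positions: Sequence[int]) -> None:
        self._positions: List[int] = list(positions)

    @overload
    def __getitem__(self, loc: Union[slice, Sequence[int]]) -> "Placement": ...
    @overload
    def __getitem__(self, loc: int) -> int: ...

    def __getitem__(self, loc):
        if isinstance(loc, int):
            # A single position yields a plain integer.
            return self._positions[loc]
        if isinstance(loc, slice):
            return Placement(self._positions[loc])
        # Any other sequence of positions yields a new Placement.
        return Placement([self._positions[i] for i in loc])

    def __iter__(self) -> Iterator[int]:
        return iter(self._positions)


p = Placement([10, 20, 30, 40])
print(p[1], list(p[1:3]), list(p[[0, 3]]))
```

Widening the first overload, as the commit does with `npt.NDArray[np.intp]`, changes only what a type checker accepts; the runtime method is not part of the stub file.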

pandas/_libs/tslibs/conversion.pyx

+13 -13
@@ -50,28 +50,26 @@ from pandas._libs.tslibs.np_datetime cimport (

 from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime

-from pandas._libs.tslibs.timezones cimport (
-    get_utcoffset,
-    is_utc,
-)
-from pandas._libs.tslibs.util cimport (
-    is_datetime64_object,
-    is_float_object,
-    is_integer_object,
-)
-
-from pandas._libs.tslibs.parsing import parse_datetime_string
-
 from pandas._libs.tslibs.nattype cimport (
     NPY_NAT,
     c_NaT as NaT,
     c_nat_strings as nat_strings,
 )
+from pandas._libs.tslibs.parsing cimport parse_datetime_string
 from pandas._libs.tslibs.timestamps cimport _Timestamp
+from pandas._libs.tslibs.timezones cimport (
+    get_utcoffset,
+    is_utc,
+)
 from pandas._libs.tslibs.tzconversion cimport (
     Localizer,
     tz_localize_to_utc_single,
 )
+from pandas._libs.tslibs.util cimport (
+    is_datetime64_object,
+    is_float_object,
+    is_integer_object,
+)

 # ----------------------------------------------------------------------
 # Constants
@@ -550,8 +548,10 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz, str unit,
 return obj

 dt = parse_datetime_string(
-    ts, dayfirst=dayfirst, yearfirst=yearfirst
+    ts, dayfirst=dayfirst, yearfirst=yearfirst, out_bestunit=&out_bestunit
 )
+reso = get_supported_reso(out_bestunit)
+return convert_datetime_to_tsobject(dt, tz, nanos=0, reso=reso)

 return convert_datetime_to_tsobject(dt, tz)

pandas/_libs/tslibs/parsing.pxd

+11
@@ -1,3 +1,14 @@
+from cpython.datetime cimport datetime
+
+from pandas._libs.tslibs.np_datetime cimport NPY_DATETIMEUNIT
+

 cpdef str get_rule_month(str source)
 cpdef quarter_to_myear(int year, int quarter, str freq)
+
+cdef datetime parse_datetime_string(
+    str date_string,
+    bint dayfirst,
+    bint yearfirst,
+    NPY_DATETIMEUNIT* out_bestunit
+)

pandas/_libs/tslibs/parsing.pyi

+1 -1
@@ -6,7 +6,7 @@ from pandas._typing import npt

 class DateParseError(ValueError): ...

-def parse_datetime_string(
+def py_parse_datetime_string(
     date_string: str,
     dayfirst: bool = ...,
     yearfirst: bool = ...,
