Commit 467073a

Merge branch 'master' into reduction_dtypes_II
2 parents 1ed3e2d + 49d96b4 commit 467073a

90 files changed, +2674 -1926 lines changed

ci/code_checks.sh

-18
@@ -105,28 +105,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.errors.UnsupportedFunctionCall \
         pandas.test \
         pandas.NaT \
-        pandas.SparseDtype \
-        pandas.DatetimeTZDtype.unit \
-        pandas.DatetimeTZDtype.tz \
-        pandas.PeriodDtype.freq \
-        pandas.IntervalDtype.subtype \
-        pandas_dtype \
-        pandas.api.types.is_bool \
-        pandas.api.types.is_complex \
-        pandas.api.types.is_float \
-        pandas.api.types.is_integer \
-        pandas.api.types.pandas_dtype \
         pandas.read_clipboard \
         pandas.ExcelFile \
         pandas.ExcelFile.parse \
-        pandas.DataFrame.to_html \
         pandas.io.formats.style.Styler.to_html \
-        pandas.HDFStore.put \
-        pandas.HDFStore.append \
-        pandas.HDFStore.get \
-        pandas.HDFStore.select \
-        pandas.HDFStore.info \
-        pandas.HDFStore.keys \
         pandas.HDFStore.groups \
         pandas.HDFStore.walk \
         pandas.read_feather \

ci/deps/actions-310.yaml

+1
@@ -15,6 +15,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - boto3

   # required dependencies

ci/deps/actions-311-downstream_compat.yaml

+1
@@ -16,6 +16,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - boto3

   # required dependencies

ci/deps/actions-311.yaml

+1
@@ -15,6 +15,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - boto3

   # required dependencies

ci/deps/actions-39-minimum_versions.yaml

+1
@@ -17,6 +17,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - boto3

   # required dependencies

ci/deps/actions-39.yaml

+1
@@ -15,6 +15,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - boto3

   # required dependencies

ci/deps/circle-310-arm64.yaml

+1
@@ -15,6 +15,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - boto3

   # required dependencies

doc/source/development/contributing_codebase.rst

+8-14
@@ -612,23 +612,17 @@ deleted when the context block is exited.
 Testing involving network connectivity
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-It is highly discouraged to add a test that connects to the internet due to flakiness of network connections and
-lack of ownership of the server that is being connected to. If network connectivity is absolutely required, use the
-``tm.network`` decorator.
+A unit test should not access a public data set over the internet due to flakiness of network connections and
+lack of ownership of the server that is being connected to. To mock this interaction, use the ``httpserver`` fixture from the
+`pytest-localserver plugin <https://github.com/pytest-dev/pytest-localserver>`_ with synthetic data.

 .. code-block:: python

-    @tm.network  # noqa
-    def test_network():
-        result = package.call_to_internet()
-
-If the test requires data from a specific website, specify ``check_before_test=True`` and the site in the decorator.
-
-.. code-block:: python
-
-    @tm.network("https://www.somespecificsite.com", check_before_test=True)
-    def test_network():
-        result = pd.read_html("https://www.somespecificsite.com")
+    @pytest.mark.network
+    @pytest.mark.single_cpu
+    def test_network(httpserver):
+        httpserver.serve_content(content="content")
+        result = pd.read_html(httpserver.url)

 Example
 ^^^^^^^
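
Note on the hunk above: the new guidance swaps a live internet call for a local server that returns synthetic data. As a rough illustration of that idea only, the sketch below serves a small in-memory payload from a throwaway loopback server using just the Python standard library. It is not the pytest-localserver API, and the names `SYNTHETIC_CSV`, `_Handler`, and `fetch_from_local_server` are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Synthetic payload standing in for a public data set (hypothetical content).
SYNTHETIC_CSV = b"a,b\n1,2\n3,4\n"


class _Handler(BaseHTTPRequestHandler):
    """Answer every GET request with the synthetic payload."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(SYNTHETIC_CSV)))
        self.end_headers()
        self.wfile.write(SYNTHETIC_CSV)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch_from_local_server() -> bytes:
    """Start a loopback server, fetch the payload once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), _Handler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/data.csv"
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Because the server lives on the loopback interface and the test owns its content, the result no longer depends on the availability of a remote host.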

doc/source/whatsnew/v2.0.3.rst

+2
@@ -13,7 +13,9 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Bug in :meth:`Timestamp.weekday` was returning incorrect results before ``'0000-02-29'`` (:issue:`53738`)
 - Fixed performance regression in merging on datetime-like columns (:issue:`53231`)
+- Fixed regression when :meth:`DataFrame.to_string` creates extra space for string dtypes (:issue:`52690`)
 - For external ExtensionArray implementations, restored the default use of ``_values_for_factorize`` for hashing arrays (:issue:`53475`)
 -

doc/source/whatsnew/v2.1.0.rst

+8-1
@@ -333,6 +333,7 @@ Deprecations
 - Deprecated allowing ``downcast`` keyword other than ``None``, ``False``, "infer", or a dict with these as values in :meth:`Series.fillna`, :meth:`DataFrame.fillna` (:issue:`40988`)
 - Deprecated allowing arbitrary ``fill_value`` in :class:`SparseDtype`, in a future version the ``fill_value`` will need to be compatible with the ``dtype.subtype``, either a scalar that can be held by that subtype or ``NaN`` for integer or bool subtypes (:issue:`23124`)
 - Deprecated behavior of :func:`assert_series_equal` and :func:`assert_frame_equal` considering NA-like values (e.g. ``NaN`` vs ``None`` as equivalent) (:issue:`52081`)
+- Deprecated bytes input to :func:`read_excel`. To read a file path, use a string or path-like object. (:issue:`53767`)
 - Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
 - Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
 - Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
@@ -343,6 +344,7 @@ Deprecations
 - Deprecated the "method" and "limit" keywords on :meth:`Series.fillna`, :meth:`DataFrame.fillna`, :meth:`SeriesGroupBy.fillna`, :meth:`DataFrameGroupBy.fillna`, and :meth:`Resampler.fillna`, use ``obj.bfill()`` or ``obj.ffill()`` instead (:issue:`53394`)
 - Deprecated the ``method`` and ``limit`` keywords in :meth:`DataFrame.replace` and :meth:`Series.replace` (:issue:`33302`)
 - Deprecated values "pad", "ffill", "bfill", "backfill" for :meth:`Series.interpolate` and :meth:`DataFrame.interpolate`, use ``obj.ffill()`` or ``obj.bfill()`` instead (:issue:`53581`)
+-

 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.performance:
@@ -402,12 +404,14 @@ Datetimelike
 - :meth:`DatetimeIndex.map` with ``na_action="ignore"`` now works as expected. (:issue:`51644`)
 - Bug in :class:`DateOffset` which had inconsistent behavior when multiplying a :class:`DateOffset` object by a constant (:issue:`47953`)
 - Bug in :func:`date_range` when ``freq`` was a :class:`DateOffset` with ``nanoseconds`` (:issue:`46877`)
+- Bug in :meth:`DataFrame.to_sql` raising ``ValueError`` for pyarrow-backed date like dtypes (:issue:`53854`)
 - Bug in :meth:`Timestamp.date`, :meth:`Timestamp.isocalendar`, :meth:`Timestamp.timetuple`, and :meth:`Timestamp.toordinal` were returning incorrect results for inputs outside those supported by the Python standard library's datetime module (:issue:`53668`)
 - Bug in :meth:`Timestamp.round` with values close to the implementation bounds returning incorrect results instead of raising ``OutOfBoundsDatetime`` (:issue:`51494`)
 - Bug in :meth:`arrays.DatetimeArray.map` and :meth:`DatetimeIndex.map`, where the supplied callable operated array-wise instead of element-wise (:issue:`51977`)
 - Bug in constructing a :class:`Series` or :class:`DataFrame` from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (:issue:`52212`)
 - Bug in parsing datetime strings with weekday but no day e.g. "2023 Sept Thu" incorrectly raising ``AttributeError`` instead of ``ValueError`` (:issue:`52659`)

+
 Timedelta
 ^^^^^^^^^
 - :meth:`TimedeltaIndex.map` with ``na_action="ignore"`` now works as expected (:issue:`51644`)
@@ -531,6 +535,7 @@ Reshaping
 - Bug in :func:`crosstab` when ``dropna=False`` would not keep ``np.nan`` in the result (:issue:`10772`)
 - Bug in :func:`merge_asof` raising ``KeyError`` for extension dtypes (:issue:`52904`)
 - Bug in :func:`merge_asof` raising ``ValueError`` for data backed by read-only ndarrays (:issue:`53513`)
+- Bug in :func:`merge_asof` with ``left_index=True`` or ``right_index=True`` with mismatched index dtypes giving incorrect results in some cases instead of raising ``MergeError`` (:issue:`53870`)
 - Bug in :meth:`DataFrame.agg` and :meth:`Series.agg` on non-unique columns would return incorrect type when dict-like argument passed in (:issue:`51099`)
 - Bug in :meth:`DataFrame.combine_first` ignoring other's columns if ``other`` is empty (:issue:`53792`)
 - Bug in :meth:`DataFrame.idxmin` and :meth:`DataFrame.idxmax`, where the axis dtype would be lost for empty frames (:issue:`53265`)
@@ -539,6 +544,7 @@ Reshaping
 - Bug in :meth:`DataFrame.stack` sorting columns lexicographically (:issue:`53786`)
 - Bug in :meth:`DataFrame.transpose` inferring dtype for object column (:issue:`51546`)
 - Bug in :meth:`Series.combine_first` converting ``int64`` dtype to ``float64`` and losing precision on very large integers (:issue:`51764`)
+-

 Sparse
 ^^^^^^
@@ -573,11 +579,12 @@ Other
 - Bug in :func:`assert_almost_equal` now throwing assertion error for two unequal sets (:issue:`51727`)
 - Bug in :func:`assert_frame_equal` checks category dtypes even when asked not to check index type (:issue:`52126`)
 - Bug in :meth:`DataFrame.reindex` with a ``fill_value`` that should be inferred with a :class:`ExtensionDtype` incorrectly inferring ``object`` dtype (:issue:`52586`)
+- Bug in :meth:`DataFrame.shift` and :meth:`Series.shift` when passing both "freq" and "fill_value" silently ignoring "fill_value" instead of raising ``ValueError`` (:issue:`53832`)
+- Bug in :meth:`DataFrame.shift` with ``axis=1`` on a :class:`DataFrame` with a single :class:`ExtensionDtype` column giving incorrect results (:issue:`53832`)
 - Bug in :meth:`Series.align`, :meth:`DataFrame.align`, :meth:`Series.reindex`, :meth:`DataFrame.reindex`, :meth:`Series.interpolate`, :meth:`DataFrame.interpolate`, incorrectly failing to raise with method="asfreq" (:issue:`53620`)
 - Bug in :meth:`Series.map` when giving a callable to an empty series, the returned series had ``object`` dtype. It now keeps the original dtype (:issue:`52384`)
 - Bug in :meth:`Series.memory_usage` when ``deep=True`` throw an error with Series of objects and the returned value is incorrect, as it does not take into account GC corrections (:issue:`51858`)
 - Fixed incorrect ``__name__`` attribute of ``pandas._libs.json`` (:issue:`52898`)
--

 .. ***DO NOT USE THIS SECTION***

environment.yml

+1
@@ -17,6 +17,7 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
+  - pytest-localserver>=0.7.1
   - coverage

   # required dependencies

pandas/_libs/lib.pyx

+32
@@ -1056,6 +1056,14 @@ def is_float(obj: object) -> bool:
     Returns
     -------
     bool
+
+    Examples
+    --------
+    >>> pd.api.types.is_float(1.0)
+    True
+
+    >>> pd.api.types.is_float(1)
+    False
     """
     return util.is_float_object(obj)

@@ -1067,6 +1075,14 @@ def is_integer(obj: object) -> bool:
     Returns
     -------
     bool
+
+    Examples
+    --------
+    >>> pd.api.types.is_integer(1)
+    True
+
+    >>> pd.api.types.is_integer(1.0)
+    False
     """
     return util.is_integer_object(obj)

@@ -1089,6 +1105,14 @@ def is_bool(obj: object) -> bool:
     Returns
     -------
     bool
+
+    Examples
+    --------
+    >>> pd.api.types.is_bool(True)
+    True
+
+    >>> pd.api.types.is_bool(1)
+    False
     """
     return util.is_bool_object(obj)

@@ -1100,6 +1124,14 @@ def is_complex(obj: object) -> bool:
     Returns
     -------
     bool
+
+    Examples
+    --------
+    >>> pd.api.types.is_complex(1 + 1j)
+    True
+
+    >>> pd.api.types.is_complex(1)
+    False
     """
     return util.is_complex_object(obj)

pandas/_libs/tslibs/ccalendar.pyx

+33-12
@@ -2,7 +2,6 @@
 """
 Cython implementations of functions resembling the stdlib calendar module
 """
-
 cimport cython
 from numpy cimport (
     int32_t,
@@ -19,7 +18,7 @@ cdef int32_t* days_per_month_array = [
     31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31,
     31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

-cdef int* sakamoto_arr = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
+cdef int* em = [0, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]

 # The first 13 entries give the month days elapsed as of the first of month N
 # (or the total number of days in the year for N=13) in non-leap years.
@@ -76,11 +75,22 @@ cpdef int32_t get_days_in_month(int year, Py_ssize_t month) noexcept nogil:

 @cython.wraparound(False)
 @cython.boundscheck(False)
-@cython.cdivision
+@cython.cdivision(True)
+cdef long quot(long a , long b) noexcept nogil:
+    cdef long x
+    x = a/b
+    if (a < 0):
+        x -= (a % b != 0)
+    return x
+
+
+@cython.wraparound(False)
+@cython.boundscheck(False)
+@cython.cdivision(True)
 cdef int dayofweek(int y, int m, int d) noexcept nogil:
     """
     Find the day of week for the date described by the Y/M/D triple y, m, d
-    using Sakamoto's method, from wikipedia.
+    using Gauss' method, from wikipedia.

     0 represents Monday.  See [1]_.

@@ -103,16 +113,27 @@ cdef int dayofweek(int y, int m, int d) noexcept nogil:
     [1] https://docs.python.org/3/library/calendar.html#calendar.weekday

     [2] https://en.wikipedia.org/wiki/\
-    Determination_of_the_day_of_the_week#Sakamoto.27s_methods
+    Determination_of_the_day_of_the_week#Gauss's_algorithm
     """
+    # Note: this particular implementation comes from
+    # http://berndt-schwerdtfeger.de/wp-content/uploads/pdf/cal.pdf
     cdef:
-        int day
-
-    y -= m < 3
-    day = (y + y / 4 - y / 100 + y / 400 + sakamoto_arr[m - 1] + d) % 7
-    # convert to python day
-    return (day + 6) % 7
-
+        long c
+        int g
+        int f
+        int e
+
+    if (m < 3):
+        y -= 1
+
+    c = quot(y, 100)
+    g = y - c * 100
+    f = 5 * (c - quot(c, 4) * 4)
+    e = em[m]
+
+    if (m > 2):
+        e -= 1
+    return (-1 + d + e + f + g + g/4) % 7

 cdef bint is_leapyear(int64_t year) noexcept nogil:
     """
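
Note on the hunk above: the new `dayofweek` relies on `quot` to turn C's truncating division into floor division, which keeps the century term well defined once `y` goes negative (for example January of year 0, where `y` becomes -1). The following is a plain-Python port written for illustration, not the Cython code itself. The helper `c_div` and the table name `EM` are hypothetical names introduced here to mimic C semantics, and `quot` assumes a positive divisor, as in the diff.

```python
import datetime

# Cumulative days before the first of month m in a non-leap year
# (index 0 is padding so the table can be indexed by month number).
EM = [0, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def c_div(a: int, b: int) -> int:
    """Emulate C integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def quot(a: int, b: int) -> int:
    """Floor division built from truncating division (b assumed positive)."""
    x = c_div(a, b)
    if a < 0:
        # The C remainder is a - b * trunc(a / b); if it is nonzero,
        # truncation rounded toward zero, so step down by one.
        x -= (a - b * x) != 0
    return x


def dayofweek(y: int, m: int, d: int) -> int:
    """Return the weekday (0 = Monday) in the proleptic Gregorian calendar."""
    if m < 3:
        y -= 1  # treat January and February as part of the previous year
    c = quot(y, 100)              # century, floored
    g = y - c * 100               # year within century, always 0..99
    f = 5 * (c - quot(c, 4) * 4)  # century correction
    e = EM[m]
    if m > 2:
        e -= 1
    return (-1 + d + e + f + g + g // 4) % 7


# Cross-check against the standard library for a few dates.
for y, m, d in [(1, 1, 1), (2000, 2, 29), (2023, 6, 28)]:
    assert dayofweek(y, m, d) == datetime.date(y, m, d).weekday()
```

The standard library cannot represent year 0, so dates that early can only be checked by continuity: in this port `dayofweek(0, 12, 31)` is 6 (Sunday), the day before Monday 0001-01-01.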

pandas/_libs/tslibs/period.pyx

+1-1
@@ -2674,7 +2674,7 @@ class Period(_Period):
     freq : str, default None
         One of pandas period strings or corresponding objects. Accepted
         strings are listed in the
-        :ref:`offset alias section <timeseries.offset_aliases>` in the user docs.
+        :ref:`period alias section <timeseries.period_aliases>` in the user docs.
         If value is datetime, freq is required.
     ordinal : int, default None
         The period offset from the proleptic Gregorian epoch.

pandas/_testing/__init__.py

-2
@@ -51,7 +51,6 @@
 )
 from pandas._testing._io import (
     close,
-    network,
     round_trip_localpath,
     round_trip_pathlib,
     round_trip_pickle,
@@ -1150,7 +1149,6 @@ def shares_memory(left, right) -> bool:
     "makeUIntIndex",
     "maybe_produces_warning",
     "NARROW_NP_DTYPES",
-    "network",
     "NP_NAT_OBJECTS",
     "NULL_OBJECTS",
     "OBJECT_DTYPES",
