
Commit edf6190

Merge remote-tracking branch 'upstream/main' into clean/environment_yml

2 parents: 1be3b48 + c5a640d

85 files changed (+1215 additions, -455 deletions)


.github/CODE_OF_CONDUCT.md

Lines changed: 0 additions & 62 deletions
This file was deleted.

.github/CONTRIBUTING.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

.github/FUNDING.yml

Lines changed: 0 additions & 3 deletions
This file was deleted.

.github/SECURITY.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

.github/actions/build_pandas/action.yml

Lines changed: 3 additions & 1 deletion
@@ -17,4 +17,6 @@ runs:
       shell: bash -el {0}
       env:
         # Cannot use parallel compilation on Windows, see https://github.com/pandas-dev/pandas/issues/30873
-        N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}
+        # GH 47305: Parallel build causes flaky ImportError: /home/runner/work/pandas/pandas/pandas/_libs/tslibs/timestamps.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct
+        N_JOBS: 1
+        #N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}

.github/workflows/comment_bot.yml

Lines changed: 0 additions & 40 deletions
This file was deleted.

.github/workflows/macos-windows.yml

Lines changed: 3 additions & 1 deletion
@@ -15,7 +15,6 @@ on:
 env:
   PANDAS_CI: 1
   PYTEST_TARGET: pandas
-  PYTEST_WORKERS: auto
   PATTERN: "not slow and not db and not network and not single_cpu"
 
 
@@ -36,6 +35,9 @@ jobs:
       # https://github.community/t/concurrecy-not-work-for-push/183068/7
       group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.env_file }}-${{ matrix.os }}
      cancel-in-progress: true
+    env:
+      # GH 47443: PYTEST_WORKERS > 1 crashes Windows builds with memory related errors
+      PYTEST_WORKERS: ${{ matrix.os == 'macos-latest' && 'auto' || '1' }}
 
     steps:
       - name: Checkout
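
The cond && a || b construct used for PYTEST_WORKERS above (and in the commented-out N_JOBS value in build_pandas/action.yml) is the usual GitHub Actions substitute for a ternary expression: it yields a when the condition holds (and a is truthy), otherwise b. A minimal Python sketch of the same selection logic, for illustration only (the pytest_workers helper and os_name argument are not part of the workflow):

    # Rough equivalent of: ${{ matrix.os == 'macos-latest' && 'auto' || '1' }}
    def pytest_workers(os_name: str) -> str:
        # GH 47443: more than one pytest worker crashes Windows builds,
        # so only macOS keeps pytest-xdist's automatic worker count.
        return "auto" if os_name == "macos-latest" else "1"

    assert pytest_workers("macos-latest") == "auto"
    assert pytest_workers("windows-latest") == "1"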

doc/source/development/contributing.rst

Lines changed: 1 addition & 7 deletions
@@ -326,13 +326,7 @@ Autofixing formatting errors
 ----------------------------
 
 We use several styling checks (e.g. ``black``, ``flake8``, ``isort``) which are run after
-you make a pull request. If there is a scenario where any of these checks fail then you
-can comment::
-
-    @github-actions pre-commit
-
-on that pull request. This will trigger a workflow which will autofix formatting
-errors.
+you make a pull request.
 
 To automatically fix formatting errors on each commit you make, you can
 set up pre-commit yourself. First, create a Python :ref:`environment

doc/source/reference/testing.rst

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Exceptions and warnings
    errors.DtypeWarning
    errors.DuplicateLabelError
    errors.EmptyDataError
+   errors.IndexingError
    errors.InvalidIndexError
    errors.IntCastingNaNError
    errors.MergeError

doc/source/whatsnew/v1.4.3.rst

Lines changed: 5 additions & 0 deletions
@@ -15,15 +15,20 @@ including other versions of pandas.
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :meth:`DataFrame.replace` when the replacement value was explicitly ``None`` when passed in a dictionary to ``to_replace`` also casting other columns to object dtype even when there were no values to replace (:issue:`46634`)
+- Fixed regression in :meth:`DataFrame.to_csv` raising error when :class:`DataFrame` contains extension dtype categorical column (:issue:`46297`, :issue:`46812`)
+- Fixed regression in representation of ``dtypes`` attribute of :class:`MultiIndex` (:issue:`46900`)
 - Fixed regression when setting values with :meth:`DataFrame.loc` updating :class:`RangeIndex` when index was set as new column and column was updated afterwards (:issue:`47128`)
+- Fixed regression in :meth:`DataFrame.fillna` and :meth:`DataFrame.update` creating a copy when updating inplace (:issue:`47188`)
 - Fixed regression in :meth:`DataFrame.nsmallest` led to wrong results when ``np.nan`` in the sorting column (:issue:`46589`)
 - Fixed regression in :func:`read_fwf` raising ``ValueError`` when ``widths`` was specified with ``usecols`` (:issue:`46580`)
 - Fixed regression in :func:`concat` not sorting columns for mixed column names (:issue:`47127`)
 - Fixed regression in :meth:`.Groupby.transform` and :meth:`.Groupby.agg` failing with ``engine="numba"`` when the index was a :class:`MultiIndex` (:issue:`46867`)
+- Fixed regression in ``NaN`` comparison for :class:`Index` operations where the same object was compared (:issue:`47105`)
 - Fixed regression is :meth:`.Styler.to_latex` and :meth:`.Styler.to_html` where ``buf`` failed in combination with ``encoding`` (:issue:`47053`)
 - Fixed regression in :func:`read_csv` with ``index_col=False`` identifying first row as index names when ``header=None`` (:issue:`46955`)
 - Fixed regression in :meth:`.DataFrameGroupBy.agg` when used with list-likes or dict-likes and ``axis=1`` that would give incorrect results; now raises ``NotImplementedError`` (:issue:`46995`)
 - Fixed regression in :meth:`DataFrame.resample` and :meth:`DataFrame.rolling` when used with list-likes or dict-likes and ``axis=1`` that would raise an unintuitive error message; now raises ``NotImplementedError`` (:issue:`46904`)
+- Fixed regression in :func:`assert_index_equal` when ``check_order=False`` and :class:`Index` has extension or object dtype (:issue:`47207`)
 - Fixed regression in :func:`read_excel` returning ints as floats on certain input sheets (:issue:`46988`)
 - Fixed regression in :meth:`DataFrame.shift` when ``axis`` is ``columns`` and ``fill_value`` is absent, ``freq`` is ignored (:issue:`47039`)

doc/source/whatsnew/v1.5.0.rst

Lines changed: 4 additions & 1 deletion
@@ -174,8 +174,9 @@ Other enhancements
 - A :class:`errors.PerformanceWarning` is now thrown when using ``string[pyarrow]`` dtype with methods that don't dispatch to ``pyarrow.compute`` methods (:issue:`42613`)
 - Added ``numeric_only`` argument to :meth:`Resampler.sum`, :meth:`Resampler.prod`, :meth:`Resampler.min`, :meth:`Resampler.max`, :meth:`Resampler.first`, and :meth:`Resampler.last` (:issue:`46442`)
 - ``times`` argument in :class:`.ExponentialMovingWindow` now accepts ``np.timedelta64`` (:issue:`47003`)
-- :class:`DataError`, :class:`SpecificationError`, :class:`SettingWithCopyError`, :class:`SettingWithCopyWarning`, :class:`NumExprClobberingError`, :class:`UndefinedVariableError` are now exposed in ``pandas.errors`` (:issue:`27656`)
+- :class:`DataError`, :class:`SpecificationError`, :class:`SettingWithCopyError`, :class:`SettingWithCopyWarning`, :class:`NumExprClobberingError`, :class:`UndefinedVariableError`, and :class:`IndexingError` are now exposed in ``pandas.errors`` (:issue:`27656`)
 - Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
+- Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.notable_bug_fixes:

@@ -860,6 +861,7 @@ I/O
 - Bug in :func:`read_csv` not recognizing line break for ``on_bad_lines="warn"`` for ``engine="c"`` (:issue:`41710`)
 - Bug in :meth:`DataFrame.to_csv` not respecting ``float_format`` for ``Float64`` dtype (:issue:`45991`)
 - Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`)
+- Bug in :func:`read_csv` interpreting second row as :class:`Index` names even when ``index_col=False`` (:issue:`46569`)
 - Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)
 - Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`)
 - Bug in :func:`read_html` where elements surrounding ``<br>`` were joined without a space between them (:issue:`29528`)

@@ -872,6 +874,7 @@ I/O
 - Bug in :func:`read_sas` returned ``None`` rather than an empty DataFrame for SAS7BDAT files with zero rows (:issue:`18198`)
 - Bug in :class:`StataWriter` where value labels were always written with default encoding (:issue:`46750`)
 - Bug in :class:`StataWriterUTF8` where some valid characters were removed from variable names (:issue:`47276`)
+- Bug in :meth:`DataFrame.to_excel` when writing an empty dataframe with :class:`MultiIndex` (:issue:`19543`)
 
 Period
 ^^^^^^
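
The "Other enhancements" entry above about reading compressed SAS files means read_sas now goes through the standard pandas compression handling. A hedged usage sketch, assuming a gzip-compressed SAS7BDAT file on disk (the file name is hypothetical):

    import pandas as pd

    # pandas 1.5.0: compression is detected from the file extension
    # (or can be set explicitly via the compression keyword),
    # so a gzipped SAS7BDAT file can be read directly.
    df = pd.read_sas("measurements.sas7bdat.gz")  # hypothetical path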

pandas/_libs/tslib.pyi

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ def format_array_from_datetime(
     tz: tzinfo | None = ...,
     format: str | None = ...,
     na_rep: object = ...,
+    reso: int = ...,  # NPY_DATETIMEUNIT
 ) -> npt.NDArray[np.object_]: ...
 def array_with_unit_to_datetime(
     values: np.ndarray,

pandas/_libs/tslib.pyx

Lines changed: 35 additions & 15 deletions
@@ -28,11 +28,12 @@ import pytz
 
 from pandas._libs.tslibs.np_datetime cimport (
     NPY_DATETIMEUNIT,
+    NPY_FR_ns,
     check_dts_bounds,
-    dt64_to_dtstruct,
     dtstruct_to_dt64,
     get_datetime64_value,
     npy_datetimestruct,
+    pandas_datetime_to_datetimestruct,
     pydate_to_dt64,
     pydatetime_to_dt64,
     string_to_dts,

@@ -104,10 +105,11 @@ def _test_parse_iso8601(ts: str):
 @cython.wraparound(False)
 @cython.boundscheck(False)
 def format_array_from_datetime(
-    ndarray[int64_t] values,
+    ndarray values,
     tzinfo tz=None,
     str format=None,
-    object na_rep=None
+    object na_rep=None,
+    NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ) -> np.ndarray:
     """
     return a np object array of the string formatted values

@@ -120,40 +122,49 @@ def format_array_from_datetime(
         a strftime capable string
     na_rep : optional, default is None
         a nat format
+    reso : NPY_DATETIMEUNIT, default NPY_FR_ns
 
     Returns
     -------
     np.ndarray[object]
     """
     cdef:
-        int64_t val, ns, N = len(values)
+        int64_t val, ns, N = values.size
         bint show_ms = False, show_us = False, show_ns = False
         bint basic_format = False
-        ndarray[object] result = cnp.PyArray_EMPTY(values.ndim, values.shape, cnp.NPY_OBJECT, 0)
         _Timestamp ts
-        str res
+        object res
         npy_datetimestruct dts
 
+        # Note that `result` (and thus `result_flat`) is C-order and
+        # `it` iterates C-order as well, so the iteration matches
+        # See discussion at
+        # github.com/pandas-dev/pandas/pull/46886#discussion_r860261305
+        ndarray result = cnp.PyArray_EMPTY(values.ndim, values.shape, cnp.NPY_OBJECT, 0)
+        object[::1] res_flat = result.ravel()     # should NOT be a copy
+        cnp.flatiter it = cnp.PyArray_IterNew(values)
+
     if na_rep is None:
         na_rep = 'NaT'
 
     # if we don't have a format nor tz, then choose
     # a format based on precision
     basic_format = format is None and tz is None
     if basic_format:
-        reso_obj = get_resolution(values)
+        reso_obj = get_resolution(values, reso=reso)
         show_ns = reso_obj == Resolution.RESO_NS
         show_us = reso_obj == Resolution.RESO_US
         show_ms = reso_obj == Resolution.RESO_MS
 
     for i in range(N):
-        val = values[i]
+        # Analogous to: utc_val = values[i]
+        val = (<int64_t*>cnp.PyArray_ITER_DATA(it))[0]
 
         if val == NPY_NAT:
-            result[i] = na_rep
+            res = na_rep
         elif basic_format:
 
-            dt64_to_dtstruct(val, &dts)
+            pandas_datetime_to_datetimestruct(val, reso, &dts)
             res = (f'{dts.year}-{dts.month:02d}-{dts.day:02d} '
                    f'{dts.hour:02d}:{dts.min:02d}:{dts.sec:02d}')
 

@@ -165,22 +176,31 @@ def format_array_from_datetime(
             elif show_ms:
                 res += f'.{dts.us // 1000:03d}'
 
-            result[i] = res
 
         else:
 
-            ts = Timestamp(val, tz=tz)
+            ts = Timestamp._from_value_and_reso(val, reso=reso, tz=tz)
             if format is None:
-                result[i] = str(ts)
+                res = str(ts)
             else:
 
                 # invalid format string
                 # requires dates > 1900
                 try:
                     # Note: dispatches to pydatetime
-                    result[i] = ts.strftime(format)
+                    res = ts.strftime(format)
                 except ValueError:
-                    result[i] = str(ts)
+                    res = str(ts)
+
+        # Note: we can index result directly instead of using PyArray_MultiIter_DATA
+        # like we do for the other functions because result is known C-contiguous
+        # and is the first argument to PyArray_MultiIterNew2. The usual pattern
+        # does not seem to work with object dtype.
+        # See discussion at
+        # github.com/pandas-dev/pandas/pull/46886#discussion_r860261305
+        res_flat[i] = res
+
+        cnp.PyArray_ITER_NEXT(it)
 
     return result
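
The reworked loop in format_array_from_datetime above stops indexing values[i] directly; instead it walks a C-order flatiter over values and writes each formatted value through res_flat, a flat view of the (also C-order) result array, which is what lets the function accept arrays of any dimensionality without reshaping. A rough pure-NumPy sketch of that write-through-a-flat-view pattern, for illustration only (format_one is a stand-in callable, not part of the pandas code):

    import numpy as np

    def format_like_pattern(values, format_one):
        # Allocate a C-order object array with the same shape as `values`,
        # mirroring cnp.PyArray_EMPTY in the Cython version.
        result = np.empty(values.shape, dtype=object)
        res_flat = result.ravel()  # a view, not a copy, because result is C-contiguous
        for i, val in enumerate(values.flat):  # values.flat iterates in C order
            res_flat[i] = format_one(val)  # writing through the view fills result in place
        return result

    # e.g. format_like_pattern(np.arange(6, dtype="i8").reshape(2, 3), str)
    # returns a 2x3 object array of strings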

pandas/_libs/tslibs/ccalendar.pxd

Lines changed: 0 additions & 2 deletions
@@ -15,8 +15,6 @@ cpdef int32_t get_day_of_year(int year, int month, int day) nogil
 cpdef int get_lastbday(int year, int month) nogil
 cpdef int get_firstbday(int year, int month) nogil
 
-cdef int64_t DAY_NANOS
-cdef int64_t HOUR_NANOS
 cdef dict c_MONTH_NUMBERS
 
 cdef int32_t* month_offset

pandas/_libs/tslibs/ccalendar.pyx

Lines changed: 0 additions & 5 deletions
@@ -47,11 +47,6 @@ DAYS_FULL = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
 int_to_weekday = {num: name for num, name in enumerate(DAYS)}
 weekday_to_int = {int_to_weekday[key]: key for key in int_to_weekday}
 
-DAY_SECONDS = 86400
-HOUR_SECONDS = 3600
-
-cdef const int64_t DAY_NANOS = DAY_SECONDS * 1_000_000_000
-cdef const int64_t HOUR_NANOS = HOUR_SECONDS * 1_000_000_000
 
 # ----------------------------------------------------------------------
