Commit 5f32d2f

Merge remote-tracking branch 'upstream/master' into typ_c_parser
2 parents 129d5af + adfc78b commit 5f32d2f

93 files changed, +781 -717 lines changed

.github/ISSUE_TEMPLATE/bug_report.yaml

+1
@@ -7,6 +7,7 @@ body:
   - type: checkboxes
     id: checks
     attributes:
+      label: Pandas version checks
       options:
         - label: >
             I have checked that this issue has not already been reported.

.github/ISSUE_TEMPLATE/documentation_improvement.yaml

+1
@@ -6,6 +6,7 @@ labels: [Docs, Needs Triage]
 body:
   - type: checkboxes
     attributes:
+      label: Pandas version checks
       options:
         - label: >
             I have checked that the issue still exists on the latest versions of the docs

.github/ISSUE_TEMPLATE/installation_issue.yaml

+1
@@ -7,6 +7,7 @@ body:
   - type: checkboxes
     id: checks
     attributes:
+      label: Installation check
       options:
         - label: >
             I have read the [installation guide](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#installing-pandas).

.github/ISSUE_TEMPLATE/performance_issue.yaml

+1
@@ -7,6 +7,7 @@ body:
   - type: checkboxes
     id: checks
     attributes:
+      label: Pandas version checks
       options:
         - label: >
             I have checked that this issue has not already been reported.

.github/ISSUE_TEMPLATE/submit_question.yml

+1
@@ -11,6 +11,7 @@ body:
         usage questions, we ask that all usage questions are first asked on StackOverflow.
   - type: checkboxes
     attributes:
+      label: Research
       options:
         - label: >
             I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas)

asv_bench/benchmarks/arithmetic.py

+1 -1
@@ -144,7 +144,7 @@ def setup(self, op, shape):
         # should already be the case, but just to be sure
         df._consolidate_inplace()
 
-        # TODO: GH#33198 the setting here shoudlnt need two steps
+        # TODO: GH#33198 the setting here shouldn't need two steps
         arr1 = np.random.randn(n_rows, max(n_cols // 4, 3)).astype("f8")
         arr2 = np.random.randn(n_rows, n_cols // 2).astype("i8")
         arr3 = np.random.randn(n_rows, n_cols // 4).astype("f8")

asv_bench/benchmarks/io/csv.py

+12
@@ -67,6 +67,18 @@ def time_frame_date_formatting(self):
         self.data.to_csv(self.fname, date_format="%Y%m%d")
 
 
+class ToCSVDatetimeIndex(BaseIO):
+
+    fname = "__test__.csv"
+
+    def setup(self):
+        rng = date_range("2000", periods=100_000, freq="S")
+        self.data = DataFrame({"a": 1}, index=rng)
+
+    def time_frame_date_formatting_index(self):
+        self.data.to_csv(self.fname, date_format="%Y-%m-%d %H:%M:%S")
+
+
 class ToCSVDatetimeBig(BaseIO):
 
     fname = "__test__.csv"

ci/deps/actions-38-db.yaml

+1 -1
@@ -12,7 +12,7 @@ dependencies:
   - pytest-cov>=2.10.1 # this is only needed in the coverage build, ref: GH 35737
 
   # pandas dependencies
-  - aiobotocore<2.0.0
+  - aiobotocore<2.0.0 # GH#44311 pinned to fix docbuild
   - beautifulsoup4
   - boto3
   - botocore>=1.11

ci/run_tests.sh

+3
@@ -5,6 +5,9 @@
 # https://github.com/pytest-dev/pytest/issues/1075
 export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
 
+# May help reproduce flaky CI builds if set in subsequent runs
+echo PYTHONHASHSEED=$PYTHONHASHSEED
+
 if [[ "not network" == *"$PATTERN"* ]]; then
     export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
 fi
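
The echoed seed is only useful if a later run reuses it. As a minimal sketch of why that helps, the Python snippet below starts two fresh interpreters with the same `PYTHONHASHSEED` and compares their string hashes. The helper name `str_hash_with_seed` and the seed value are illustrative assumptions, not part of this commit.

```python
import os
import subprocess
import sys


def str_hash_with_seed(seed: int, text: str = "pandas") -> int:
    """Return hash(text) as seen by a fresh interpreter run with PYTHONHASHSEED=seed."""
    # Copy the current environment and pin the hash seed for the child process only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    seed = 12345  # hypothetical value, e.g. copied from the echoed CI log line
    # Two separate processes with the same seed agree on string hashes,
    # so hash-order-dependent behaviour can be replayed.
    print(str_hash_with_seed(seed) == str_hash_with_seed(seed))
```

Exporting the logged value before rerunning `ci/run_tests.sh` would not be enough on its own, since the script overwrites `PYTHONHASHSEED` with a new random value; the seed would have to be set in a way that bypasses that line.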

doc/source/whatsnew/v1.4.0.rst

+7 -1
@@ -537,7 +537,9 @@ Other Deprecations
 - Deprecated passing arguments as positional for :func:`read_fwf` other than ``filepath_or_buffer`` (:issue:`41485`):
 - Deprecated passing ``skipna=None`` for :meth:`DataFrame.mad` and :meth:`Series.mad`, pass ``skipna=True`` instead (:issue:`44580`)
 - Deprecated :meth:`DateOffset.apply`, use ``offset + other`` instead (:issue:`44522`)
+- Deprecated parameter ``names`` in :meth:`Index.copy` (:issue:`44916`)
 - A deprecation warning is now shown for :meth:`DataFrame.to_latex` indicating the arguments signature may change and emulate more the arguments to :meth:`.Styler.to_latex` in future versions (:issue:`44411`)
+- Deprecated :meth:`Categorical.replace`, use :meth:`Series.replace` instead (:issue:`44929`)
 -
 
 .. ---------------------------------------------------------------------------
@@ -589,6 +591,7 @@ Performance improvements
 - Performance improvement in :meth:`Series.to_frame` (:issue:`43558`)
 - Performance improvement in :meth:`Series.mad` (:issue:`43010`)
 - Performance improvement in :func:`merge` (:issue:`43332`)
+- Performance improvement in :func:`to_csv` when index column is a datetime and is formatted (:issue:`39413`)
 - Performance improvement in :func:`read_csv` when ``index_col`` was set with a numeric column (:issue:`44158`)
 - Performance improvement in :func:`concat` (:issue:`43354`)
 -
@@ -660,7 +663,7 @@ Conversion
 
 Strings
 ^^^^^^^
-- Fixed bug in checking for ``string[pyarrow]`` dtype incorrectly raising an ImportError when pyarrow is not installed (:issue:`44327`)
+- Fixed bug in checking for ``string[pyarrow]`` dtype incorrectly raising an ImportError when pyarrow is not installed (:issue:`44276`)
 -
 
 Interval
@@ -749,6 +752,7 @@ I/O
 - Bug in :func:`read_csv` raising ``ValueError`` when names was longer than header but equal to data rows for ``engine="python"`` (:issue:`38453`)
 - Bug in :class:`ExcelWriter`, where ``engine_kwargs`` were not passed through to all engines (:issue:`43442`)
 - Bug in :func:`read_csv` raising ``ValueError`` when ``parse_dates`` was used with ``MultiIndex`` columns (:issue:`8991`)
+- Bug in :func:`read_csv` not raising an ``ValueError`` when ``\n`` was specified as ``delimiter`` or ``sep`` which conflicts with ``lineterminator`` (:issue:`43528`)
 - Bug in :func:`read_csv` converting columns to numeric after date parsing failed (:issue:`11019`)
 - Bug in :func:`read_csv` not replacing ``NaN`` values with ``np.nan`` before attempting date conversion (:issue:`26203`)
 - Bug in :func:`read_csv` raising ``AttributeError`` when attempting to read a .csv file and infer index column dtype from an nullable integer type (:issue:`44079`)
@@ -806,6 +810,7 @@ Reshaping
 - Bug in :func:`crosstab` would fail when inputs are lists or tuples (:issue:`44076`)
 - Bug in :meth:`DataFrame.append` failing to retain ``index.name`` when appending a list of :class:`Series` objects (:issue:`44109`)
 - Fixed metadata propagation in :meth:`Dataframe.apply` method, consequently fixing the same issue for :meth:`Dataframe.transform`, :meth:`Dataframe.nunique` and :meth:`Dataframe.mode` (:issue:`28283`)
+- Bug in :func:`concat` casting levels of :class:`MultiIndex` to float if the only consist of missing values (:issue:`44900`)
 - Bug in :meth:`DataFrame.stack` with ``ExtensionDtype`` columns incorrectly raising (:issue:`43561`)
 - Bug in :meth:`Series.unstack` with object doing unwanted type inference on resulting columns (:issue:`44595`)
 - Bug in :class:`MultiIndex` failing join operations with overlapping ``IntervalIndex`` levels (:issue:`44096`)
@@ -856,6 +861,7 @@ Other
 - Bug in :meth:`DataFrame.shift` with ``axis=1`` and ``ExtensionDtype`` columns incorrectly raising when an incompatible ``fill_value`` is passed (:issue:`44564`)
 - Bug in :meth:`DataFrame.diff` when passing a NumPy integer object instead of an ``int`` object (:issue:`44572`)
 - Bug in :meth:`Series.replace` raising ``ValueError`` when using ``regex=True`` with a :class:`Series` containing ``np.nan`` values (:issue:`43344`)
+- Bug in :meth:`DataFrame.to_records` where an incorrect ``n`` was used when missing names were replaced by ``level_n`` (:issue:`44818`)
 
 .. ***DO NOT USE THIS SECTION***
 

environment.yml

+1 -1
@@ -105,7 +105,7 @@ dependencies:
 
   - pytables>=3.6.1 # pandas.read_hdf, DataFrame.to_hdf
   - s3fs>=0.4.0 # file IO when using 's3://...' path
-  - aiobotocore<2.0.0
+  - aiobotocore<2.0.0 # GH#44311 pinned to fix docbuild
   - fsspec>=0.7.4 # for generic remote file operations
   - gcsfs>=0.6.0 # file IO when using 'gcs://...' path
   - sqlalchemy # pandas.read_sql, DataFrame.to_sql

pandas/_libs/parsers.pyx

+18 -10
@@ -558,18 +558,11 @@ cdef class TextReader:
         pass
 
     def __dealloc__(self):
-        self.close()
+        _close(self)
         parser_del(self.parser)
 
-    def close(self) -> None:
-        # also preemptively free all allocated memory
-        parser_free(self.parser)
-        if self.true_set:
-            kh_destroy_str_starts(self.true_set)
-            self.true_set = NULL
-        if self.false_set:
-            kh_destroy_str_starts(self.false_set)
-            self.false_set = NULL
+    def close(self):
+        _close(self)
 
     def _set_quoting(self, quote_char: str | bytes | None, quoting: int):
         if not isinstance(quoting, int):
@@ -1292,6 +1285,21 @@ cdef class TextReader:
         return None
 
 
+# Factor out code common to TextReader.__dealloc__ and TextReader.close
+# It cannot be a class method, since calling self.close() in __dealloc__
+# which causes a class attribute lookup and violates best parctices
+# https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#finalization-method-dealloc
+cdef _close(TextReader reader):
+    # also preemptively free all allocated memory
+    parser_free(reader.parser)
+    if reader.true_set:
+        kh_destroy_str_starts(reader.true_set)
+        reader.true_set = NULL
+    if reader.false_set:
+        kh_destroy_str_starts(reader.false_set)
+        reader.false_set = NULL
+
+
 cdef:
     object _true_values = [b'True', b'TRUE', b'true']
     object _false_values = [b'False', b'FALSE', b'false']
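
The shape of this refactor can be sketched in plain Python, with the caveat that it is only an analogy: `__del__` stands in for Cython's `__dealloc__`, and the `Reader` class and its set attributes are hypothetical stand-ins rather than the real `TextReader`. The point shown is that both the finalizer and the public `close` call one module-level helper, so finalization does not go through a method lookup on the instance.

```python
class Reader:
    """Toy stand-in for an object that owns resources (not the real TextReader)."""

    def __init__(self):
        self.true_set = {"True", "TRUE", "true"}
        self.false_set = {"False", "FALSE", "false"}

    def close(self):
        # Public API: delegate to the shared helper.
        _close(self)

    def __del__(self):
        # Call the module-level helper directly instead of self.close(),
        # so cleanup does not depend on a (possibly overridden) method.
        _close(self)


def _close(reader):
    """Release each resource once; clearing the reference makes repeat calls no-ops."""
    if reader.true_set is not None:
        reader.true_set.clear()
        reader.true_set = None
    if reader.false_set is not None:
        reader.false_set.clear()
        reader.false_set = None


if __name__ == "__main__":
    r = Reader()
    r.close()
    r.close()  # safe: the helper checks before releasing
    print(r.true_set is None and r.false_set is None)
```

As in the diff, the helper resets each handle after freeing it, which is what makes it safe for `close()` to be followed later by finalization.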

pandas/_testing/__init__.py

+19 -28
@@ -28,17 +28,12 @@
 from pandas._typing import Dtype
 
 from pandas.core.dtypes.common import (
-    is_datetime64_dtype,
-    is_datetime64tz_dtype,
     is_float_dtype,
     is_integer_dtype,
-    is_period_dtype,
     is_sequence,
-    is_timedelta64_dtype,
     is_unsigned_integer_dtype,
     pandas_dtype,
 )
-from pandas.core.dtypes.dtypes import IntervalDtype
 
 import pandas as pd
 from pandas import (
@@ -112,14 +107,11 @@
 )
 from pandas.core.arrays import (
     BaseMaskedArray,
-    DatetimeArray,
     ExtensionArray,
     PandasArray,
-    PeriodArray,
-    TimedeltaArray,
-    period_array,
 )
 from pandas.core.arrays._mixins import NDArrayBackedExtensionArray
+from pandas.core.construction import extract_array
 
 if TYPE_CHECKING:
     from pandas import (
@@ -161,6 +153,17 @@
     + BYTES_DTYPES
 )
 
+NARROW_NP_DTYPES = [
+    np.float16,
+    np.float32,
+    np.int8,
+    np.int16,
+    np.int32,
+    np.uint8,
+    np.uint16,
+    np.uint32,
+]
+
 NULL_OBJECTS = [None, np.nan, pd.NaT, float("nan"), pd.NA, Decimal("NaN")]
 NP_NAT_OBJECTS = [
     cls("NaT", unit)
@@ -257,13 +260,6 @@ def box_expected(expected, box_cls, transpose=True):
             # single-row special cases in datetime arithmetic
             expected = expected.T
             expected = pd.concat([expected] * 2, ignore_index=True)
-    elif box_cls is PeriodArray:
-        # the PeriodArray constructor is not as flexible as period_array
-        expected = period_array(expected)
-    elif box_cls is DatetimeArray:
-        expected = DatetimeArray(expected)
-    elif box_cls is TimedeltaArray:
-        expected = TimedeltaArray(expected)
     elif box_cls is np.ndarray or box_cls is np.array:
         expected = np.array(expected)
     elif box_cls is to_array:
@@ -274,21 +270,16 @@ def box_expected(expected, box_cls, transpose=True):
 
 
 def to_array(obj):
+    """
+    Similar to pd.array, but does not cast numpy dtypes to nullable dtypes.
+    """
     # temporary implementation until we get pd.array in place
     dtype = getattr(obj, "dtype", None)
 
-    if is_period_dtype(dtype):
-        return period_array(obj)
-    elif is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype):
-        return DatetimeArray._from_sequence(obj)
-    elif is_timedelta64_dtype(dtype):
-        return TimedeltaArray._from_sequence(obj)
-    elif isinstance(obj, pd.core.arrays.BooleanArray):
-        return obj
-    elif isinstance(dtype, IntervalDtype):
-        return pd.core.arrays.IntervalArray(obj)
-    else:
-        return np.array(obj)
+    if dtype is None:
+        return np.asarray(obj)
+
+    return extract_array(obj, extract_numpy=True)
 
 
 # -----------------------------------------------------------------------------

pandas/compat/__init__.py

-4
@@ -13,8 +13,6 @@
 from pandas._typing import F
 from pandas.compat.numpy import (
     is_numpy_dev,
-    np_array_datetime64_compat,
-    np_datetime64_compat,
     np_version_under1p19,
     np_version_under1p20,
 )
@@ -130,8 +128,6 @@ def get_lzma_file():
 
 __all__ = [
     "is_numpy_dev",
-    "np_array_datetime64_compat",
-    "np_datetime64_compat",
     "np_version_under1p19",
     "np_version_under1p20",
     "pa_version_under1p01",

pandas/compat/numpy/__init__.py

-41
@@ -1,7 +1,4 @@
 """ support numpy compatibility across versions """
-
-import re
-
 import numpy as np
 
 from pandas.util.version import Version
@@ -29,44 +26,6 @@
 )
 
 
-_tz_regex = re.compile("[+-]0000$")
-
-
-def _tz_replacer(tstring):
-    if isinstance(tstring, str):
-        if tstring.endswith("Z"):
-            tstring = tstring[:-1]
-        elif _tz_regex.search(tstring):
-            tstring = tstring[:-5]
-    return tstring
-
-
-def np_datetime64_compat(tstring: str, unit: str = "ns"):
-    """
-    provide compat for construction of strings to numpy datetime64's with
-    tz-changes in 1.11 that make '2015-01-01 09:00:00Z' show a deprecation
-    warning, when need to pass '2015-01-01 09:00:00'
-    """
-    tstring = _tz_replacer(tstring)
-    return np.datetime64(tstring, unit)
-
-
-def np_array_datetime64_compat(arr, dtype="M8[ns]"):
-    """
-    provide compat for construction of an array of strings to a
-    np.array(..., dtype=np.datetime64(..))
-    tz-changes in 1.11 that make '2015-01-01 09:00:00Z' show a deprecation
-    warning, when need to pass '2015-01-01 09:00:00'
-    """
-    # is_list_like; can't import as it would be circular
-    if hasattr(arr, "__iter__") and not isinstance(arr, (str, bytes)):
-        arr = [_tz_replacer(s) for s in arr]
-    else:
-        arr = _tz_replacer(arr)
-
-    return np.array(arr, dtype=dtype)
-
-
 __all__ = [
     "np",
     "_np_version",
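
For code that still depended on the removed helpers, the string handling they performed can be reproduced with the standard library alone. The sketch below mirrors the logic of the deleted `_tz_replacer`: it drops a trailing `Z` or a trailing `+0000`/`-0000` offset. The function name `strip_utc_suffix` is a hypothetical replacement, not something this commit adds.

```python
import re

# Matches a trailing "+0000" or "-0000" UTC offset, as the removed _tz_regex did.
_UTC_OFFSET = re.compile(r"[+-]0000$")


def strip_utc_suffix(tstring):
    """Drop a trailing 'Z' or '+0000'/'-0000' from a timestamp string.

    Non-string inputs are returned unchanged, matching the removed helper.
    """
    if isinstance(tstring, str):
        if tstring.endswith("Z"):
            return tstring[:-1]
        if _UTC_OFFSET.search(tstring):
            return tstring[:-5]
    return tstring


if __name__ == "__main__":
    for raw in ("2015-01-01 09:00:00Z", "2015-01-01 09:00:00+0000", "2015-01-01 09:00:00"):
        print(strip_utc_suffix(raw))
```

The cleaned string could then be passed to `np.datetime64(...)` or `np.array(..., dtype="M8[ns]")`, which is what the removed wrappers did after stripping.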

pandas/compat/pickle_compat.py

-3
@@ -35,9 +35,6 @@ def load_reduce(self):
     args = stack.pop()
     func = stack[-1]
 
-    if len(args) and type(args[0]) is type:
-        n = args[0].__name__  # noqa
-
     try:
         stack[-1] = func(*args)
         return
