Commit 646b785

Merge remote-tracking branch 'upstream/master' into ci/simplify_deps
2 parents 8e802eb + c57f883 commit 646b785

53 files changed (+1386 -940 lines)

doc/source/reference/style.rst

+2
@@ -27,6 +27,7 @@ Styler properties
    Styler.template_html_style
    Styler.template_html_table
    Styler.template_latex
+   Styler.template_string
    Styler.loader

 Style application
@@ -74,5 +75,6 @@ Style export and import
    Styler.to_html
    Styler.to_latex
    Styler.to_excel
+   Styler.to_string
    Styler.export
    Styler.use

doc/source/user_guide/io.rst

+28 -4
@@ -1305,14 +1305,38 @@ You can elect to skip bad lines:
     0  1  2   3
     1  8  9  10

+Or pass a callable function to handle the bad line if ``engine="python"``.
+The bad line will be a list of strings that was split by the ``sep``:
+
+.. code-block:: ipython
+
+    In [29]: external_list = []
+
+    In [30]: def bad_lines_func(line):
+        ...:     external_list.append(line)
+        ...:     return line[-3:]
+
+    In [31]: pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")
+    Out[31]:
+       a  b   c
+    0  1  2   3
+    1  5  6   7
+    2  8  9  10
+
+    In [32]: external_list
+    Out[32]: [['4', '5', '6', '7']]
+
+.. versionadded:: 1.4.0
+
+
 You can also use the ``usecols`` parameter to eliminate extraneous column
 data that appear in some lines but not others:

 .. code-block:: ipython

-    In [30]: pd.read_csv(StringIO(data), usecols=[0, 1, 2])
+    In [33]: pd.read_csv(StringIO(data), usecols=[0, 1, 2])

-    Out[30]:
+    Out[33]:
        a  b  c
     0  1  2  3
     1  4  5  6
@@ -1324,9 +1348,9 @@ fields are filled with ``NaN``.

 .. code-block:: ipython

-    In [31]: pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd'])
+    In [34]: pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd'])

-    Out[31]:
+    Out[34]:
        a  b  c    d
     0  1  2  3  NaN
     1  4  5  6    7
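
The callable form of ``on_bad_lines`` documented in this hunk can be tried as a standalone script. This is a minimal sketch, assuming pandas 1.4 or later; the ``data`` string and the name ``keep_last_three`` are illustrative assumptions, since the diff does not show how ``data`` is defined.

```python
from io import StringIO

import pandas as pd

# Assumed sample input (not shown in the diff): the second data row has
# four fields although the header declares only three columns.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

collected = []  # every bad line the parser hands to the callable


def keep_last_three(bad_line):
    # bad_line arrives as a list of strings already split on the separator.
    collected.append(bad_line)
    # Returning a list keeps the row; here only the last three fields survive.
    return bad_line[-3:]


# A callable is only accepted together with engine="python".
df = pd.read_csv(StringIO(data), on_bad_lines=keep_last_three, engine="python")

print(df)
print(collected)
```

Because the callable appends the whole split line, ``collected`` ends up holding one list of strings per bad line rather than a flat list of numbers.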

doc/source/whatsnew/v1.4.0.rst

+2 -1
@@ -208,6 +208,7 @@ Other enhancements
 - :meth:`Series.str.split` now supports a ``regex`` argument that explicitly specifies whether the pattern is a regular expression. Default is ``None`` (:issue:`43563`, :issue:`32835`, :issue:`25549`)
 - :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like (:issue:`41021`)
 - Added :meth:`DataFrameGroupBy.value_counts` (:issue:`43564`)
+- :func:`read_csv` now accepts a ``callable`` function in ``on_bad_lines`` when ``engine="python"`` for custom handling of bad lines (:issue:`5686`)
 - :class:`ExcelWriter` argument ``if_sheet_exists="overlay"`` option added (:issue:`40231`)
 - :meth:`read_excel` now accepts a ``decimal`` argument that allow the user to specify the decimal point when parsing string columns to numeric (:issue:`14403`)
 - :meth:`.GroupBy.mean`, :meth:`.GroupBy.std`, :meth:`.GroupBy.var`, :meth:`.GroupBy.sum` now supports `Numba <http://numba.pydata.org/>`_ execution with the ``engine`` keyword (:issue:`43731`, :issue:`44862`, :issue:`44939`)
@@ -815,7 +816,7 @@ Indexing

 Missing
 ^^^^^^^
-- Bug in :meth:`DataFrame.fillna` with limit and no method ignores axis='columns' or ``axis = 1`` (:issue:`40989`)
+- Bug in :meth:`DataFrame.fillna` with limit and no method ignores axis='columns' or ``axis = 1`` (:issue:`40989`, :issue:`17399`)
 - Bug in :meth:`DataFrame.fillna` not replacing missing values when using a dict-like ``value`` and duplicate column names (:issue:`43476`)
 - Bug in constructing a :class:`DataFrame` with a dictionary ``np.datetime64`` as a value and ``dtype='timedelta64[ns]'``, or vice-versa, incorrectly casting instead of raising (:issue:`44428`)
 - Bug in :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` with ``inplace=True`` not writing to the underlying array(s) in-place (:issue:`44749`)

doc/source/whatsnew/v1.5.0.rst

+6 -4
@@ -14,10 +14,12 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~

-.. _whatsnew_150.enhancements.enhancement1:
+.. _whatsnew_150.enhancements.styler:

-enhancement1
-^^^^^^^^^^^^
+Styler
+^^^^^^
+
+- New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`)

 .. _whatsnew_150.enhancements.enhancement2:

@@ -151,7 +153,7 @@ Interval

 Indexing
 ^^^^^^^^
--
+- Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`)
 -

 Missing

pandas/_libs/lib.pyx

+2 -2
@@ -714,7 +714,7 @@ cpdef ndarray[object] ensure_string_array(
             return out

         arr = arr.to_numpy()
-    elif not isinstance(arr, np.ndarray):
+    elif not util.is_array(arr):
         arr = np.array(arr, dtype="object")

     result = np.asarray(arr, dtype="object")
@@ -729,7 +729,7 @@ cpdef ndarray[object] ensure_string_array(
             continue

         if not checknull(val):
-            if not isinstance(val, np.floating):
+            if not util.is_float_object(val):
                 # f"{val}" is faster than str(val)
                 result[i] = f"{val}"
             else:

pandas/_libs/tslibs/timedeltas.pyx

+19
@@ -1251,10 +1251,29 @@ class Timedelta(_Timedelta):

     Notes
     -----
+    The constructor may take in either both values of value and unit or
+    kwargs as above. Either one of them must be used during initialization
+
     The ``.value`` attribute is always in ns.

     If the precision is higher than nanoseconds, the precision of the duration is
     truncated to nanoseconds.
+
+    Examples
+    --------
+    Here we initialize Timedelta object with both value and unit
+
+    >>> td = pd.Timedelta(1, "d")
+    >>> td
+    Timedelta('1 days 00:00:00')
+
+    Here we initialize the Timedelta object with kwargs
+
+    >>> td2 = pd.Timedelta(days=1)
+    >>> td2
+    Timedelta('1 days 00:00:00')
+
+    We see that either way we get the same result
     """

     _req_any_kwargs_new = {"weeks", "days", "hours", "minutes", "seconds",
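
The claim in the new docstring that both construction styles give the same result can be checked directly rather than by comparing reprs. A minimal sketch, assuming any reasonably recent pandas; the variable names are illustrative only.

```python
import pandas as pd

# Positional form: a value plus a unit.
td_positional = pd.Timedelta(1, "d")

# Keyword form: one or more of the component kwargs.
td_keyword = pd.Timedelta(days=1)

# Both describe the same duration, so they compare equal.
print(td_positional == td_keyword)

# The Notes section says .value is always expressed in nanoseconds.
print(td_keyword.value)
```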

pandas/core/arrays/categorical.py

+3 -3
@@ -1360,7 +1360,7 @@ def remove_unused_categories(self, inplace=no_default):

     def map(self, mapper):
         """
-        Map categories using input correspondence (dict, Series, or function).
+        Map categories using an input mapping or function.

         Maps the categories to new categories. If the mapping correspondence is
         one-to-one the result is a :class:`~pandas.Categorical` which has the
@@ -1994,7 +1994,7 @@ def _formatter(self, boxed: bool = False):
         # Defer to CategoricalFormatter's formatter.
         return None

-    def _tidy_repr(self, max_vals=10, footer=True) -> str:
+    def _tidy_repr(self, max_vals: int = 10, footer: bool = True) -> str:
         """
         a short repr displaying only max_vals and an optional (but default
         footer)
@@ -2009,7 +2009,7 @@ def _tidy_repr(self, max_vals=10, footer=True) -> str:

         return str(result)

-    def _repr_categories(self):
+    def _repr_categories(self) -> list[str]:
         """
         return the base repr for the categories
         """

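The ``Categorical.map`` docstring touched above distinguishes one-to-one mappings from other mappings. The following is a rough sketch of that distinction, assuming the behaviour described in the public pandas documentation (a non one-to-one mapping falls back to an ``Index``); the sample categories and mappings are made up for illustration, and some pandas 2.x releases may emit a ``FutureWarning`` about ``na_action`` here.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c"])

# One-to-one: each category maps to a distinct new category.
one_to_one = cat.map(lambda x: x.upper())

# Not one-to-one: "a" and "c" collapse onto the same value.
many_to_one = cat.map({"a": "first", "b": "second", "c": "first"})

print(type(one_to_one).__name__)
print(type(many_to_one).__name__)
```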
pandas/core/arrays/datetimes.py

+13 -7
@@ -41,7 +41,7 @@
 from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import validate_inclusive

-from pandas.core.dtypes.cast import astype_dt64_to_dt64tz
+from pandas.core.dtypes.astype import astype_dt64_to_dt64tz
 from pandas.core.dtypes.common import (
     DT64NS_DTYPE,
     INT64_DTYPE,
@@ -570,7 +570,7 @@ def dtype(self) -> np.dtype | DatetimeTZDtype: # type: ignore[override]
     @property
     def tz(self) -> tzinfo | None:
         """
-        Return timezone, if any.
+        Return the timezone.

         Returns
         -------
@@ -1278,7 +1278,9 @@ def day_name(self, locale=None):
     @property
     def time(self) -> npt.NDArray[np.object_]:
         """
-        Returns numpy array of datetime.time. The time part of the Timestamps.
+        Returns numpy array of :class:`datetime.time` objects.
+
+        The time part of the Timestamps.
         """
         # If the Timestamps have a timezone that is not UTC,
         # convert them into their i8 representation while
@@ -1290,16 +1292,20 @@ def time(self) -> npt.NDArray[np.object_]:
     @property
     def timetz(self) -> npt.NDArray[np.object_]:
         """
-        Returns numpy array of datetime.time also containing timezone
-        information. The time part of the Timestamps.
+        Returns numpy array of :class:`datetime.time` objects with timezone
+        information.
+
+        The time part of the Timestamps.
         """
         return ints_to_pydatetime(self.asi8, self.tz, box="time")

     @property
     def date(self) -> npt.NDArray[np.object_]:
         """
-        Returns numpy array of python datetime.date objects (namely, the date
-        part of Timestamps without timezone information).
+        Returns numpy array of python :class:`datetime.date` objects.
+
+        Namely, the date part of Timestamps without time and
+        timezone information.
         """
         # If the Timestamps have a timezone that is not UTC,
         # convert them into their i8 representation while
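
The three reworded docstrings (``time``, ``timetz``, ``date``) describe closely related accessors, so a side-by-side look may help. This is a small sketch using the public ``DatetimeIndex`` accessors of the same names; the timestamp and the ``Europe/Berlin`` zone are arbitrary assumptions chosen for illustration.

```python
import datetime

import pandas as pd

# A single timezone-aware timestamp (values chosen arbitrarily).
idx = pd.DatetimeIndex(["2021-12-25 08:30:00"], tz="Europe/Berlin")

times = idx.time       # datetime.time objects, local wall time, no tzinfo
times_tz = idx.timetz  # datetime.time objects that keep the tzinfo
dates = idx.date       # datetime.date objects, no time or timezone part

print(times[0], times[0].tzinfo)
print(times_tz[0], times_tz[0].tzinfo)
print(dates[0])
```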

pandas/core/arrays/floating.py

+1 -1
@@ -16,7 +16,7 @@
 )
 from pandas.util._decorators import cache_readonly

-from pandas.core.dtypes.cast import astype_nansafe
+from pandas.core.dtypes.astype import astype_nansafe
 from pandas.core.dtypes.common import (
     is_bool_dtype,
     is_datetime64_dtype,

pandas/core/arrays/masked.py

+1 -1
@@ -767,7 +767,7 @@ def _quantile(
         We assume that all impacted cases are 1D-only.
         """
         mask = np.atleast_2d(np.asarray(self.isna()))
-        npvalues = np.atleast_2d(np.asarray(self))
+        npvalues: np.ndarray = np.atleast_2d(np.asarray(self))

         res = quantile_with_mask(
             npvalues,

pandas/core/arrays/sparse/array.py

+3 -3
@@ -44,8 +44,8 @@
 from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import validate_insert_loc

+from pandas.core.dtypes.astype import astype_nansafe
 from pandas.core.dtypes.cast import (
-    astype_nansafe,
     construct_1d_arraylike_from_scalar,
     find_common_type,
     maybe_box_datetimelike,
@@ -298,7 +298,7 @@ class SparseArray(OpsMixin, PandasObject, ExtensionArray):
         repeats of the scalar value instead.

     fill_value : scalar, optional
-        Elements in `data` that are `fill_value` are not stored in the
+        Elements in data that are ``fill_value`` are not stored in the
         SparseArray. For memory savings, this should be the most common value
         in `data`. By default, `fill_value` depends on the dtype of `data`:

@@ -1265,7 +1265,7 @@ def astype(self, dtype: AstypeArg | None = None, copy: bool = True):

     def map(self: SparseArrayT, mapper) -> SparseArrayT:
         """
-        Map categories using input correspondence (dict, Series, or function).
+        Map categories using an input mapping or function.

         Parameters
         ----------

pandas/core/arrays/sparse/dtype.py

+1 -1
@@ -18,11 +18,11 @@
 from pandas.errors import PerformanceWarning
 from pandas.util._exceptions import find_stack_level

+from pandas.core.dtypes.astype import astype_nansafe
 from pandas.core.dtypes.base import (
     ExtensionDtype,
     register_extension_dtype,
 )
-from pandas.core.dtypes.cast import astype_nansafe
 from pandas.core.dtypes.common import (
     is_bool_dtype,
     is_object_dtype,

pandas/core/arrays/timedeltas.py

+1 -1
@@ -38,7 +38,7 @@
 from pandas.compat.numpy import function as nv
 from pandas.util._validators import validate_endpoints

-from pandas.core.dtypes.cast import astype_td64_unit_conversion
+from pandas.core.dtypes.astype import astype_td64_unit_conversion
 from pandas.core.dtypes.common import (
     DT64NS_DTYPE,
     TD64NS_DTYPE,
