Commit 57b9eab

Merge remote-tracking branch 'upstream/master' into categorical-bool-fixed
2 parents b35647e + 2f0773f commit 57b9eab

25 files changed, +1041 -563 lines changed

doc/source/whatsnew/v0.23.2.txt

+2 -32
Original file line numberDiff line numberDiff line change
@@ -63,12 +63,6 @@ Fixed Regressions
6363
- Fixed regression in :func:`to_clipboard` that defaulted to copying dataframes with space delimited instead of tab delimited (:issue:`21104`)
6464

6565

66-
Documentation Changes
67-
~~~~~~~~~~~~~~~~~~~~~
68-
69-
-
70-
-
71-
7266
Build Changes
7367
-------------
7468

@@ -79,56 +73,32 @@ Build Changes
7973
Bug Fixes
8074
~~~~~~~~~
8175

82-
**Groupby/Resample/Rolling**
83-
84-
-
85-
-
86-
87-
**Timedelta**
88-
89-
- Bug in :class:`Timedelta` where non-zero timedeltas shorter than 1 microsecond were considered False (:issue:`21484`)
90-
9176
**Conversion**
9277

9378
- Bug in constructing :class:`Index` with an iterator or generator (:issue:`21470`)
9479
- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
9580

96-
9781
**Indexing**
9882

9983
- Bug in :meth:`Index.get_indexer_non_unique` with categorical key (:issue:`21448`)
10084
- Bug in comparison operations for :class:`MultiIndex` where error was raised on equality / inequality comparison involving a MultiIndex with ``nlevels == 1`` (:issue:`21149`)
10185
- Bug in :meth:`DataFrame.drop` where behaviour was not consistent for unique and non-unique indexes (:issue:`21494`)
10286
- Bug in :func:`DataFrame.duplicated` with a large number of columns causing a 'maximum recursion depth exceeded' (:issue:`21524`).
103-
-
10487

10588
**I/O**
10689

10790
- Bug in :func:`read_csv` that caused it to incorrectly raise an error when ``nrows=0``, ``low_memory=True``, and ``index_col`` was not ``None`` (:issue:`21141`)
10891
- Bug in :func:`json_normalize` when formatting the ``record_prefix`` with integer columns (:issue:`21536`)
109-
-
110-
111-
**Plotting**
112-
113-
-
114-
-
115-
116-
**Reshaping**
117-
118-
-
119-
-
12092

12193
**Categorical**
12294

12395
- Bug in rendering :class:`Series` with ``Categorical`` dtype in rare conditions under Python 2.7 (:issue:`21002`)
124-
-
12596

12697
**Timezones**
12798

12899
- Bug in :class:`Timestamp` and :class:`DatetimeIndex` where passing a :class:`Timestamp` localized after a DST transition would return a datetime before the DST transition (:issue:`20854`)
129100
- Bug in comparing :class:`DataFrame` objects with tz-aware :class:`DatetimeIndex` columns with a DST transition that raised a ``KeyError`` (:issue:`19970`)
130101

102+
**Timedelta**
131103

132-
**Other**
133-
134-
-
104+
- Bug in :class:`Timedelta` where non-zero timedeltas shorter than 1 microsecond were considered False (:issue:`21484`)
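
Reviewer note on the ``Timedelta`` entry above: a minimal sketch of how a non-zero duration shorter than 1 microsecond can evaluate as False. It assumes truthiness was decided at microsecond resolution, as it is for the standard library's ``datetime.timedelta``. The diff does not show the actual fix, so this only illustrates the symptom and is not the pandas implementation. The name ``value_ns`` is hypothetical.

```python
from datetime import timedelta

# Hypothetical sub-microsecond duration, stored as an integer nanosecond count.
value_ns = 400

# datetime.timedelta only has microsecond resolution, so 0.4 microseconds
# rounds to a zero-length timedelta.
as_stdlib = timedelta(microseconds=value_ns / 1000)

# Truthiness at microsecond resolution: the duration looks like zero.
stdlib_truthy = bool(as_stdlib)

# Truthiness at nanosecond resolution: the duration is non-zero.
nanosecond_truthy = value_ns != 0

print(stdlib_truthy, nanosecond_truthy)
```

Checking the nanosecond count directly avoids the loss of precision.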

doc/source/whatsnew/v0.24.0.txt

+44 -4
@@ -10,7 +10,7 @@ New features
1010

1111
- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)
1212

13-
.. _whatsnew_0240.enhancements.extension_array_operators
13+
.. _whatsnew_0240.enhancements.extension_array_operators:
1414

1515
``ExtensionArray`` operator support
1616
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -26,6 +26,46 @@ See the :ref:`ExtensionArray Operator Support
2626
<extending.extension.operator>` documentation section for details on both
2727
ways of adding operator support.
2828

29+
.. _whatsnew_0240.enhancements.read_html:
30+
31+
``read_html`` Enhancements
32+
^^^^^^^^^^^^^^^^^^^^^^^^^^
33+
34+
:func:`read_html` previously ignored ``colspan`` and ``rowspan`` attributes.
35+
Now it understands them, treating them as sequences of cells with the same
36+
value. (:issue:`17054`)
37+
38+
.. ipython:: python
39+
40+
result = pd.read_html("""
41+
<table>
42+
<thead>
43+
<tr>
44+
<th>A</th><th>B</th><th>C</th>
45+
</tr>
46+
</thead>
47+
<tbody>
48+
<tr>
49+
<td colspan="2">1</td><td>2</td>
50+
</tr>
51+
</tbody>
52+
</table>""")
53+
54+
Previous Behavior:
55+
56+
.. code-block:: ipython
57+
58+
In [13]: result
59+
Out[13]:
60+
[ A B C
61+
0 1 2 NaN]
62+
63+
Current Behavior:
64+
65+
.. ipython:: python
66+
67+
result
68+
2969
.. _whatsnew_0240.enhancements.other:
3070

3171
Other Enhancements
@@ -40,6 +80,7 @@ Other Enhancements
4080
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__.
4181
(:issue:`21627`)
4282
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`)
83+
- :func:`read_html` copies cell data across ``colspan`` and ``rowspan`` spans, and it treats all-``th`` table rows as headers if the ``header`` kwarg is not given and there is no ``thead`` (:issue:`17054`)
4384
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep`` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
4485
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
4586
-
@@ -330,15 +371,14 @@ I/O
330371
^^^
331372

332373
- Bug in :func:`read_csv` with a ``CategoricalDtype`` with boolean categories not correctly coercing the string values to booleans (:issue:`20498`)
333-
-
374+
- :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
334375
-
335376
-
336377

337378
Plotting
338379
^^^^^^^^
339380

340-
-
341-
-
381+
- Bug in :func:`DataFrame.plot.scatter` and :func:`DataFrame.plot.hexbin` caused x-axis label and ticklabels to disappear when colorbar was on in IPython inline backend (:issue:`10611`, :issue:`10678`, and :issue:`20455`)
342382
-
343383

344384
Groupby/Resample/Rolling
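
Reviewer note on the ``read_html`` entry in this file: the whatsnew text says ``colspan`` cells are now treated as sequences of cells with the same value. The sketch below shows that idea using only the standard library's ``html.parser``. It is not the pandas implementation. The class name ``ColspanRowParser`` is hypothetical, and ``rowspan`` is deliberately left out to keep the example short.

```python
from html.parser import HTMLParser


class ColspanRowParser(HTMLParser):
    """Collect table rows, repeating each cell's text across its colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._span = 0      # colspan of the open cell; 0 when no cell is open
        self._text = []     # text fragments of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            # A missing colspan attribute means the cell covers one column.
            self._span = int(dict(attrs).get("colspan") or 1)
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._span:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._span:
            value = "".join(self._text).strip()
            # Copy the cell value once per spanned column.
            self._row.extend([value] * self._span)
            self._span = 0
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = ColspanRowParser()
parser.feed("""
<table>
  <thead>
    <tr><th>A</th><th>B</th><th>C</th></tr>
  </thead>
  <tbody>
    <tr><td colspan="2">1</td><td>2</td></tr>
  </tbody>
</table>""")
rows = parser.rows
print(rows)
```

With the table from the whatsnew example, the ``colspan="2"`` cell fills both column A and column B, so column C receives ``2`` instead of being left empty.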

pandas/_libs/tslib.pyx

+2 -53
@@ -35,18 +35,17 @@ from cython cimport Py_ssize_t
3535

3636

3737
import pytz
38-
UTC = pytz.utc
3938

4039

4140
from tslibs.timedeltas cimport cast_from_unit
42-
from tslibs.timedeltas import Timedelta
41+
from tslibs.timedeltas import Timedelta, ints_to_pytimedelta # noqa:F841
4342
from tslibs.timezones cimport (is_utc, is_tzlocal, is_fixed_offset,
4443
treat_tz_as_pytz, get_dst_info)
4544
from tslibs.conversion cimport (tz_convert_single, _TSObject,
4645
convert_datetime_to_tsobject,
4746
get_datetime64_nanos,
4847
tz_convert_utc_to_tzlocal)
49-
from tslibs.conversion import tz_convert_single
48+
from tslibs.conversion import tz_convert_single, normalize_date # noqa:F841
5049

5150
from tslibs.nattype import NaT, nat_strings, iNaT
5251
from tslibs.nattype cimport checknull_with_nat, NPY_NAT
@@ -185,29 +184,6 @@ def ints_to_pydatetime(ndarray[int64_t] arr, tz=None, freq=None,
185184
return result
186185

187186

188-
def ints_to_pytimedelta(ndarray[int64_t] arr, box=False):
189-
# convert an i8 repr to an ndarray of timedelta or Timedelta (if box ==
190-
# True)
191-
192-
cdef:
193-
Py_ssize_t i, n = len(arr)
194-
int64_t value
195-
ndarray[object] result = np.empty(n, dtype=object)
196-
197-
for i in range(n):
198-
199-
value = arr[i]
200-
if value == NPY_NAT:
201-
result[i] = NaT
202-
else:
203-
if box:
204-
result[i] = Timedelta(value)
205-
else:
206-
result[i] = timedelta(microseconds=int(value) / 1000)
207-
208-
return result
209-
210-
211187
def _test_parse_iso8601(object ts):
212188
"""
213189
TESTING ONLY: Parse string into Timestamp using iso8601 parser. Used
@@ -740,30 +716,3 @@ cdef inline bint _parse_today_now(str val, int64_t* iresult):
740716
iresult[0] = Timestamp.today().value
741717
return True
742718
return False
743-
744-
# ----------------------------------------------------------------------
745-
# Some general helper functions
746-
747-
748-
cpdef normalize_date(object dt):
749-
"""
750-
Normalize datetime.datetime value to midnight. Returns datetime.date as a
751-
datetime.datetime at midnight
752-
753-
Returns
754-
-------
755-
normalized : datetime.datetime or Timestamp
756-
"""
757-
if PyDateTime_Check(dt):
758-
if not PyDateTime_CheckExact(dt):
759-
# i.e. a Timestamp object
760-
return dt.replace(hour=0, minute=0, second=0, microsecond=0,
761-
nanosecond=0)
762-
else:
763-
# regular datetime object
764-
return dt.replace(hour=0, minute=0, second=0, microsecond=0)
765-
# TODO: Make sure DST crossing is handled correctly here
766-
elif PyDate_Check(dt):
767-
return datetime(dt.year, dt.month, dt.day)
768-
else:
769-
raise TypeError('Unrecognized type: %s' % type(dt))
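
Reviewer note on ``normalize_date``, which this hunk removes from ``tslib.pyx`` and the import change above pulls from ``tslibs.conversion`` instead: a pure-Python sketch of the removed logic, for readers who want to check the behaviour without building the Cython extension. It assumes only standard-library ``datetime`` and ``date`` inputs. The ``Timestamp`` branch that also resets ``nanosecond`` is omitted, and the name ``normalize_date_sketch`` is hypothetical.

```python
from datetime import date, datetime


def normalize_date_sketch(dt):
    """Return ``dt`` as a datetime at midnight (stdlib types only)."""
    # datetime must be checked first because it is a subclass of date.
    if isinstance(dt, datetime):
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elif isinstance(dt, date):
        return datetime(dt.year, dt.month, dt.day)
    else:
        raise TypeError('Unrecognized type: %s' % type(dt))


from_datetime = normalize_date_sketch(datetime(2018, 6, 1, 13, 45, 30, 123))
from_date = normalize_date_sketch(date(2018, 6, 1))
print(from_datetime, from_date)
```

As the removed TODO comment notes, DST crossings are not handled specially. ``replace`` keeps any ``tzinfo`` unchanged.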

pandas/_libs/tslibs/__init__.py

+7 -1
@@ -1,2 +1,8 @@
11
# -*- coding: utf-8 -*-
2-
# cython: profile=False
2+
# flake8: noqa
3+
4+
from .conversion import normalize_date, localize_pydatetime, tz_convert_single
5+
from .nattype import NaT, iNaT
6+
from .np_datetime import OutOfBoundsDatetime
7+
from .timestamps import Timestamp
8+
from .timedeltas import delta_to_nanoseconds, ints_to_pytimedelta, Timedelta
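
Reviewer note on ``ints_to_pytimedelta``, re-exported here after being removed from ``tslib.pyx``: a pure-Python sketch of the unboxed path (``box=False``) of the removed loop, which turns int64 nanosecond values into ``datetime.timedelta`` objects. Several parts are assumptions made for illustration: the missing-value sentinel is taken to be the minimum int64, ``None`` stands in for ``NaT``, a plain list replaces the object ndarray, and true division is used for the nanosecond-to-microsecond conversion. The name ``ints_to_pytimedelta_sketch`` is hypothetical.

```python
from datetime import timedelta

# Assumption: the missing-value sentinel is the minimum int64 value.
NPY_NAT_SKETCH = -2**63


def ints_to_pytimedelta_sketch(values):
    """Convert integer nanosecond counts to datetime.timedelta objects."""
    result = []
    for value in values:
        if value == NPY_NAT_SKETCH:
            # The removed code stores NaT here; None is a stand-in.
            result.append(None)
        else:
            # timedelta has microsecond resolution, so nanoseconds are scaled.
            result.append(timedelta(microseconds=int(value) / 1000))
    return result


converted = ints_to_pytimedelta_sketch([1_000_000_000, 2_000, NPY_NAT_SKETCH])
print(converted)
```

Because ``datetime.timedelta`` stores whole microseconds, any sub-microsecond remainder is lost in this path. The boxed ``Timedelta(value)`` path in the removed code exists for callers who need nanosecond precision.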
