Commit 90af4a5

Merge branch 'master' into PR_TOOL_MERGE_PR_23776
2 parents ae90f93 + c9c9912 commit 90af4a5

79 files changed, +1074 -944 lines changed

ci/deps/azure-27-compat.yaml

+1-1
@@ -16,7 +16,7 @@ dependencies:
   - pytz=2013b
   - scipy=0.18.1
   - sqlalchemy=0.7.8
-  - xlrd=0.9.2
+  - xlrd=1.0.0
   - xlsxwriter=0.5.2
   - xlwt=0.7.5
   # universal

ci/deps/travis-27-locale.yaml

+1-1
@@ -16,7 +16,7 @@ dependencies:
   - pytz=2013b
   - scipy
   - sqlalchemy=0.8.1
-  - xlrd=0.9.2
+  - xlrd=1.0.0
   - xlsxwriter=0.5.2
   - xlwt=0.7.5
   # universal

ci/deps/travis-27.yaml

+1-1
@@ -35,7 +35,7 @@ dependencies:
   - scipy
   - sqlalchemy=0.9.6
   - xarray=0.9.6
-  - xlrd=0.9.2
+  - xlrd=1.0.0
   - xlsxwriter=0.5.2
   - xlwt=0.7.5
   # universal

doc/source/install.rst

+1-1
@@ -269,7 +269,7 @@ Optional Dependencies
 * `matplotlib <http://matplotlib.org/>`__: for plotting, Version 2.0.0 or higher.
 * For Excel I/O:
 
-  * `xlrd/xlwt <http://www.python-excel.org/>`__: Excel reading (xlrd) and writing (xlwt)
+  * `xlrd/xlwt <http://www.python-excel.org/>`__: Excel reading (xlrd), version 1.0.0 or higher required, and writing (xlwt)
   * `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`__: openpyxl version 2.4.0
     for writing .xlsx files (xlrd >= 0.9.0)
   * `XlsxWriter <https://pypi.org/project/XlsxWriter>`__: Alternative Excel writer
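
The xlrd minimum-version bump in the dependency files and install notes above can be illustrated with a small check. This is only a sketch under stated assumptions: `parse_version` and `check_min_version` are hypothetical helper names (not pandas functions), and version strings are assumed to be purely numeric and dot-separated.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '1.0.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def check_min_version(name, installed, minimum):
    """Raise ImportError when `installed` is older than `minimum`.

    Hypothetical helper for illustration; not how pandas performs the check.
    """
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "{} >= {} is required, found {}".format(name, minimum, installed))


# The new minimum itself satisfies the requirement, so this passes silently.
check_min_version("xlrd", "1.0.0", "1.0.0")
```

Comparing tuples of integers rather than raw strings avoids lexicographic surprises such as `"0.10.0" < "0.9.2"`.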

doc/source/whatsnew/v0.24.0.rst

+9-2
@@ -288,6 +288,7 @@ Other Enhancements
 - Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
 - :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`)
 - :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`)
+- :meth:`Index.difference` now has an optional ``sort`` parameter to specify whether the results should be sorted if possible (:issue:`17839`)
 - :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`)
 - :meth:`MultiIndex.to_flat_index` has been added to flatten multiple levels into a single-level :class:`Index` object.
 - :meth:`DataFrame.to_stata` and :class:` pandas.io.stata.StataWriter117` can write mixed sting columns to Stata strl format (:issue:`23633`)
@@ -307,7 +308,7 @@ Backwards incompatible API changes
 Dependencies have increased minimum versions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-We have updated our minimum supported versions of dependencies (:issue:`21242`, `18742`).
+We have updated our minimum supported versions of dependencies (:issue:`21242`, :issue:`18742`, :issue:`23774`).
 If installed, we now require:
 
 +-----------------+-----------------+----------+
@@ -331,6 +332,8 @@ If installed, we now require:
 +-----------------+-----------------+----------+
 | scipy           | 0.18.1          |          |
 +-----------------+-----------------+----------+
+| xlrd            | 1.0.0           |          |
++-----------------+-----------------+----------+
 
 Additionally we no longer depend on `feather-format` for feather based storage
 and replaced it with references to `pyarrow` (:issue:`21639` and :issue:`23053`).
@@ -1033,6 +1036,7 @@ Deprecations
 - :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
 - The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
 - The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
+- The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`)
 - Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
   `use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
 - :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
@@ -1276,7 +1280,7 @@ Strings
 
 - Bug in :meth:`Index.str.partition` was not nan-safe (:issue:`23558`).
 - Bug in :meth:`Index.str.split` was not nan-safe (:issue:`23677`).
--
+- Bug :func:`Series.str.contains` not respecting the ``na`` argument for a ``Categorical`` dtype ``Series`` (:issue:`22158`)
 
 Interval
 ^^^^^^^^
@@ -1382,8 +1386,10 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
 - Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
 - Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
 - Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
+- Bug in :func:`read_csv()` in which memory leaks occurred in the C engine when parsing ``NaN`` values due to insufficient cleanup on completion or error (:issue:`21353`)
 - Bug in :func:`read_csv()` in which incorrect error messages were being raised when ``skipfooter`` was passed in along with ``nrows``, ``iterator``, or ``chunksize`` (:issue:`23711`)
 - Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
+- Bug in :meth:`read_csv()` in which unnecessary warnings were being raised when the dialect's values conflicted with the default arguments (:issue:`23761`)
 - Bug in :meth:`read_html()` in which the error message was not displaying the valid flavors when an invalid one was provided (:issue:`23549`)
 - Bug in :meth:`read_excel()` in which extraneous header names were extracted, even though none were specified (:issue:`11733`)
 - Bug in :meth:`read_excel()` in which ``index_col=None`` was not being respected and parsing index columns anyway (:issue:`20480`)
@@ -1412,6 +1418,7 @@ Groupby/Resample/Rolling
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` when resampling by a weekly offset (``'W'``) across a DST transition (:issue:`9119`, :issue:`21459`)
 - Bug in :meth:`DataFrame.expanding` in which the ``axis`` argument was not being respected during aggregations (:issue:`23372`)
 - Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.transform` which caused missing values when the input function can accept a :class:`DataFrame` but renames it (:issue:`23455`).
+- Bug in :func:`pandas.core.groupby.GroupBy.nth` where column order was not always preserved (:issue:`20760`)
 
 Reshaping
 ^^^^^^^^^
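
The keyword deprecations listed in the release notes above (for example ``pat`` in favor of ``sep``) follow a common pattern: accept the old keyword, warn, and forward it under the new name. The sketch below shows one way to do this in plain Python. `rename_kwarg` and `partition` are hypothetical names for illustration only and are not the pandas implementation.

```python
import functools
import warnings


def rename_kwarg(old_name, new_name):
    """Decorator that accepts `old_name` as a deprecated alias of `new_name`.

    Hypothetical helper, written only to illustrate the pattern.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        "Can only specify one of {!r} and {!r}".format(
                            old_name, new_name))
                warnings.warn(
                    "the {!r} keyword is deprecated, use {!r} instead".format(
                        old_name, new_name),
                    FutureWarning, stacklevel=2)
                # Forward the value under the new keyword name.
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rename_kwarg("pat", "sep")
def partition(text, sep=" "):
    """Split `text` at the first occurrence of `sep`."""
    return text.partition(sep)
```

Calling `partition("a_b", pat="_")` still works under this sketch but emits a `FutureWarning`, which gives callers a release cycle to switch to `sep`.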

pandas/_libs/algos_rank_helper.pxi.in

+1-1
@@ -126,7 +126,7 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average',
 
     sorted_data = values.take(_as)
     sorted_mask = mask.take(_as)
-    _indices = np.diff(sorted_mask).nonzero()[0]
+    _indices = np.diff(sorted_mask.astype(int)).nonzero()[0]
     non_na_idx = _indices[0] if len(_indices) > 0 else -1
     argsorted = _as.astype('i8')

pandas/_libs/index.pyx

+2
@@ -113,6 +113,8 @@ cdef class IndexEngine:
         if not self.is_unique:
             return self._get_loc_duplicates(val)
         values = self._get_index_values()
+
+        self._check_type(val)
         loc = _bin_search(values, val)  # .searchsorted(val, side='left')
         if loc >= len(values):
             raise KeyError(val)

pandas/_libs/index_class_helper.pxi.in

+2
@@ -51,6 +51,8 @@ cdef class {{name}}Engine(IndexEngine):
             raise KeyError(val)
         elif util.is_float_object(val):
             raise KeyError(val)
+        elif not util.is_integer_object(val):
+            raise KeyError(val)
         {{endif}}
 
     {{if name != 'Object'}}
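
The added branch makes the engine raise `KeyError` for any lookup key that is not an integer, in addition to the float case already handled. A rough pure-Python analogue is sketched below. `check_integer_key` is a hypothetical name, the real code uses Cython `util` helpers, and the `bool` branch is an assumption because the condition preceding the visible context is not shown in the hunk.

```python
import numbers


def check_integer_key(val):
    """Return `val` if it is a true integer, otherwise raise KeyError.

    Hypothetical analogue of the type check; not the pandas implementation.
    """
    if isinstance(val, bool):
        # Assumed branch: bool is a subclass of int, so reject it first.
        raise KeyError(val)
    elif isinstance(val, float):
        raise KeyError(val)
    elif not isinstance(val, numbers.Integral):
        # The newly added case: strings, None, tuples, and so on.
        raise KeyError(val)
    return val
```

Raising `KeyError` rather than `TypeError` keeps the failure consistent with an ordinary missing-key lookup.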

pandas/_libs/parsers.pyx

+28-18
@@ -1070,18 +1070,6 @@ cdef class TextReader:
 
             conv = self._get_converter(i, name)
 
-            # XXX
-            na_flist = set()
-            if self.na_filter:
-                na_list, na_flist = self._get_na_list(i, name)
-                if na_list is None:
-                    na_filter = 0
-                else:
-                    na_filter = 1
-                    na_hashset = kset_from_list(na_list)
-            else:
-                na_filter = 0
-
             col_dtype = None
             if self.dtype is not None:
                 if isinstance(self.dtype, dict):
@@ -1106,13 +1094,34 @@ cdef class TextReader:
                         self.c_encoding)
                 continue
 
-            # Should return as the desired dtype (inferred or specified)
-            col_res, na_count = self._convert_tokens(
-                i, start, end, name, na_filter, na_hashset,
-                na_flist, col_dtype)
+            # Collect the list of NaN values associated with the column.
+            # If we aren't supposed to do that, or none are collected,
+            # we set `na_filter` to `0` (`1` otherwise).
+            na_flist = set()
+
+            if self.na_filter:
+                na_list, na_flist = self._get_na_list(i, name)
+                if na_list is None:
+                    na_filter = 0
+                else:
+                    na_filter = 1
+                    na_hashset = kset_from_list(na_list)
+            else:
+                na_filter = 0
 
-            if na_filter:
-                self._free_na_set(na_hashset)
+            # Attempt to parse tokens and infer dtype of the column.
+            # Should return as the desired dtype (inferred or specified).
+            try:
+                col_res, na_count = self._convert_tokens(
+                    i, start, end, name, na_filter, na_hashset,
+                    na_flist, col_dtype)
+            finally:
+                # gh-21353
+                #
+                # Cleanup the NaN hash that we generated
+                # to avoid memory leaks.
+                if na_filter:
+                    self._free_na_set(na_hashset)
 
             if upcast_na and na_count > 0:
                 col_res = _maybe_upcast(col_res)
@@ -2059,6 +2068,7 @@ cdef kh_str_t* kset_from_list(list values) except NULL:
 
         # None creeps in sometimes, which isn't possible here
         if not isinstance(val, bytes):
+            kh_destroy_str(table)
             raise ValueError('Must be all encoded bytes')
 
         k = kh_put_str(table, PyBytes_AsString(val), &ret)
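
The parser change above wraps token conversion in `try`/`finally` so the NaN hash set is freed whether conversion succeeds or raises. The sketch below shows the same cleanup pattern in plain Python. `NaSet` and `convert_column` are hypothetical stand-ins for the manually managed hash set and the conversion routine, and the `live` counter exists only to make leaks observable.

```python
class NaSet:
    """Stand-in for a manually managed hash set (hypothetical)."""

    live = 0  # number of sets currently allocated and not yet freed

    def __init__(self, values):
        NaSet.live += 1
        self.values = set(values)

    def free(self):
        NaSet.live -= 1


def convert_column(tokens, na_values):
    """Parse tokens to int, mapping any token in `na_values` to None."""
    na_set = NaSet(na_values)
    try:
        # int() raises ValueError on a malformed token.
        return [None if t in na_set.values else int(t) for t in tokens]
    finally:
        # Runs on success and on error, so the set is always released.
        na_set.free()
```

Without the `finally` clause, a malformed token would propagate the exception past the cleanup call and leave the set allocated, which is the kind of leak the commit comment references as gh-21353.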

pandas/core/arrays/base.py

+8-6
@@ -5,16 +5,18 @@
 This is an experimental API and subject to breaking changes
 without warning.
 """
-import numpy as np
-
 import operator
 
-from pandas.core.dtypes.generic import ABCSeries, ABCIndexClass
-from pandas.errors import AbstractMethodError
+import numpy as np
+
+from pandas.compat import PY3, set_function_name
 from pandas.compat.numpy import function as nv
-from pandas.compat import set_function_name, PY3
-from pandas.core import ops
+from pandas.errors import AbstractMethodError
+
 from pandas.core.dtypes.common import is_list_like
+from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
+
+from pandas.core import ops
 
 _not_implemented_message = "{} does not implement {}."

pandas/core/arrays/categorical.py

+26-41
@@ -1,61 +1,46 @@
 # pylint: disable=E1101,W0232
 
-import numpy as np
-from warnings import warn
 import textwrap
+from warnings import warn
 
-from pandas import compat
-from pandas.compat import u, lzip
-from pandas._libs import lib, algos as libalgos
+import numpy as np
+
+from pandas._libs import algos as libalgos, lib
+import pandas.compat as compat
+from pandas.compat import lzip, u
+from pandas.compat.numpy import function as nv
+from pandas.util._decorators import (
+    Appender, Substitution, cache_readonly, deprecate_kwarg)
+from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs
 
-from pandas.core.dtypes.generic import (
-    ABCSeries, ABCIndexClass, ABCCategoricalIndex)
-from pandas.core.dtypes.missing import isna, notna
-from pandas.core.dtypes.inference import is_hashable
 from pandas.core.dtypes.cast import (
-    maybe_infer_to_datetimelike,
-    coerce_indexer_dtype)
-from pandas.core.dtypes.dtypes import CategoricalDtype
+    coerce_indexer_dtype, maybe_infer_to_datetimelike)
 from pandas.core.dtypes.common import (
-    ensure_int64,
-    ensure_object,
-    ensure_platform_int,
-    is_extension_array_dtype,
-    is_dtype_equal,
-    is_datetimelike,
-    is_datetime64_dtype,
-    is_timedelta64_dtype,
-    is_categorical,
-    is_categorical_dtype,
-    is_float_dtype,
-    is_integer_dtype,
-    is_object_dtype,
-    is_list_like, is_sequence,
-    is_scalar, is_iterator,
-    is_dict_like)
-
-from pandas.core.algorithms import factorize, take_1d, unique1d, take
+    ensure_int64, ensure_object, ensure_platform_int, is_categorical,
+    is_categorical_dtype, is_datetime64_dtype, is_datetimelike, is_dict_like,
+    is_dtype_equal, is_extension_array_dtype, is_float_dtype, is_integer_dtype,
+    is_iterator, is_list_like, is_object_dtype, is_scalar, is_sequence,
+    is_timedelta64_dtype)
+from pandas.core.dtypes.dtypes import CategoricalDtype
+from pandas.core.dtypes.generic import (
+    ABCCategoricalIndex, ABCIndexClass, ABCSeries)
+from pandas.core.dtypes.inference import is_hashable
+from pandas.core.dtypes.missing import isna, notna
+
 from pandas.core.accessor import PandasDelegate, delegate_names
-from pandas.core.base import (PandasObject,
-                              NoNewAttributesMixin, _shared_docs)
+import pandas.core.algorithms as algorithms
+from pandas.core.algorithms import factorize, take, take_1d, unique1d
+from pandas.core.base import NoNewAttributesMixin, PandasObject, _shared_docs
 import pandas.core.common as com
+from pandas.core.config import get_option
 from pandas.core.missing import interpolate_2d
-from pandas.compat.numpy import function as nv
-from pandas.util._decorators import (
-    Appender, cache_readonly, deprecate_kwarg, Substitution)
-
-import pandas.core.algorithms as algorithms
-
 from pandas.core.sorting import nargsort
 
 from pandas.io.formats import console
 from pandas.io.formats.terminal import get_terminal_size
-from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs
-from pandas.core.config import get_option
 
 from .base import ExtensionArray
 
-
 _take_msg = textwrap.dedent("""\
     Interpreting negative values in 'indexer' as missing values.
     In the future, this will change to meaning positional indices

pandas/core/arrays/datetimelike.py

+15-26
@@ -5,44 +5,33 @@
 
 import numpy as np
 
-from pandas._libs import lib, iNaT, NaT
+from pandas._libs import NaT, iNaT, lib
 from pandas._libs.tslibs import timezones
-from pandas._libs.tslibs.timedeltas import delta_to_nanoseconds, Timedelta
-from pandas._libs.tslibs.timestamps import maybe_integer_op_deprecated
 from pandas._libs.tslibs.period import (
-    Period, DIFFERENT_FREQ_INDEX, IncompatibleFrequency)
-
+    DIFFERENT_FREQ_INDEX, IncompatibleFrequency, Period)
+from pandas._libs.tslibs.timedeltas import Timedelta, delta_to_nanoseconds
+from pandas._libs.tslibs.timestamps import maybe_integer_op_deprecated
+import pandas.compat as compat
 from pandas.errors import (
     AbstractMethodError, NullFrequencyError, PerformanceWarning)
-from pandas import compat
-
-from pandas.tseries import frequencies
-from pandas.tseries.offsets import Tick, DateOffset
+from pandas.util._decorators import deprecate_kwarg
 
 from pandas.core.dtypes.common import (
-    pandas_dtype,
-    needs_i8_conversion,
-    is_list_like,
-    is_offsetlike,
-    is_extension_array_dtype,
-    is_datetime64_dtype,
-    is_datetime64_any_dtype,
-    is_datetime64tz_dtype,
-    is_float_dtype,
-    is_integer_dtype,
-    is_bool_dtype,
-    is_period_dtype,
-    is_timedelta64_dtype,
-    is_object_dtype)
-from pandas.core.dtypes.generic import ABCSeries, ABCDataFrame, ABCIndexClass
+    is_bool_dtype, is_datetime64_any_dtype, is_datetime64_dtype,
+    is_datetime64tz_dtype, is_extension_array_dtype, is_float_dtype,
+    is_integer_dtype, is_list_like, is_object_dtype, is_offsetlike,
+    is_period_dtype, is_timedelta64_dtype, needs_i8_conversion, pandas_dtype)
 from pandas.core.dtypes.dtypes import DatetimeTZDtype
+from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries
 from pandas.core.dtypes.missing import isna
 
-import pandas.core.common as com
 from pandas.core.algorithms import checked_add_with_arr, take, unique1d
+import pandas.core.common as com
+
+from pandas.tseries import frequencies
+from pandas.tseries.offsets import DateOffset, Tick
 
 from .base import ExtensionOpsMixin
-from pandas.util._decorators import deprecate_kwarg
 
 
 def _make_comparison_op(cls, op):
