Commit 65a2e0a

Merge branch 'master' of https://github.com/pandas-dev/pandas into tst-needs2
2 parents db4ae90 + 4e807a2 commit 65a2e0a

34 files changed, +338 -143 lines changed

.travis.yml

+19 -16
@@ -30,31 +30,34 @@ matrix:
       - python: 3.5

     include:
-    - dist: trusty
-      env:
+    - env:
       - JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network)"

-    - dist: trusty
-      env:
+    - env:
       - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network)"

-    - dist: trusty
-      env:
-      - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8"
+    - env:
+      - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
+      services:
+        - mysql
+        - postgresql

-    - dist: trusty
-      env:
-      - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true
+    - env:
+      - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true SQL="1"
+      services:
+        - mysql
+        - postgresql

     # In allow_failures
-    - dist: trusty
-      env:
-      - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow"
+    - env:
+      - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" SQL="1"
+      services:
+        - mysql
+        - postgresql

     allow_failures:
-      - dist: trusty
-        env:
-        - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow"
+      - env:
+        - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" SQL="1"

 before_install:
   - echo "before_install"

README.md

+1 -1
@@ -124,7 +124,7 @@ Here are just a few of the things that pandas does well:
     and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
   - [**Time series**][timeseries]-specific functionality: date range
     generation and frequency conversion, moving window statistics,
-    moving window linear regressions, date shifting and lagging, etc.
+    date shifting and lagging.


 [missing-data]: https://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data

ci/code_checks.sh

+1 -1
@@ -39,7 +39,7 @@ function invgrep {
 }

 if [[ "$GITHUB_ACTIONS" == "true" ]]; then
-    FLAKE8_FORMAT="##[error]%(path)s:%(row)s:%(col)s:%(code):%(text)s"
+    FLAKE8_FORMAT="##[error]%(path)s:%(row)s:%(col)s:%(code)s:%(text)s"
     INVGREP_PREPEND="##[error]"
 else
     FLAKE8_FORMAT="default"
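
The one-character change above matters because flake8's format template uses Python's percent-style formatting with named fields, so every field needs a conversion character such as `s`. The sketch below illustrates this with plain Python string formatting. It does not invoke flake8, and the sample violation fields and the `render` helper are hypothetical.

```python
# Hypothetical sample of the named fields a violation exposes to the template.
violation = {
    "path": "pandas/core/frame.py",
    "row": 10,
    "col": 5,
    "code": "E501",
    "text": "line too long",
}

old_format = "##[error]%(path)s:%(row)s:%(col)s:%(code):%(text)s"
new_format = "##[error]%(path)s:%(row)s:%(col)s:%(code)s:%(text)s"


def render(template, fields):
    """Apply percent-style formatting and report a failure instead of raising."""
    try:
        return template % fields
    except ValueError as exc:
        return f"ValueError: {exc}"


old_result = render(old_format, violation)  # "%(code):" has no valid conversion
new_result = render(new_format, violation)

print(old_result)
print(new_result)
```

In the old template the character after `%(code)` is `:`, which Python treats as an unsupported conversion type, so formatting fails rather than printing the error code.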

ci/setup_env.sh

+2 -1
@@ -140,7 +140,8 @@ echo "conda list"
 conda list

 # Install DB for Linux
-if [ "${TRAVIS_OS_NAME}" == "linux" ]; then
+
+if [[ -n ${SQL:0} ]]; then
   echo "installing dbs"
   mysql -e 'create database pandas_nosetest;'
   psql -c 'create database pandas_nosetest;' -U postgres

doc/source/getting_started/overview.rst

+1 -2
@@ -57,8 +57,7 @@ Here are just a few of the things that pandas does well:
     Excel files, databases, and saving / loading data from the ultrafast **HDF5
     format**
   - **Time series**-specific functionality: date range generation and frequency
-    conversion, moving window statistics, moving window linear regressions,
-    date shifting and lagging, etc.
+    conversion, moving window statistics, date shifting and lagging.

 Many of these principles are here to address the shortcomings frequently
 experienced using other languages / scientific research environments. For data

doc/source/user_guide/io.rst

+1 -1
@@ -35,7 +35,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     binary;`SPSS <https://en.wikipedia.org/wiki/SPSS>`__;:ref:`read_spss<io.spss_reader>`;
     binary;`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__;:ref:`read_pickle<io.pickle>`;:ref:`to_pickle<io.pickle>`
     SQL;`SQL <https://en.wikipedia.org/wiki/SQL>`__;:ref:`read_sql<io.sql>`;:ref:`to_sql<io.sql>`
-    SQL;`Google Big Query <https://en.wikipedia.org/wiki/BigQuery>`__;:ref:`read_gbq<io.bigquery>`;:ref:`to_gbq<io.bigquery>`
+    SQL;`Google BigQuery <https://en.wikipedia.org/wiki/BigQuery>`__;:ref:`read_gbq<io.bigquery>`;:ref:`to_gbq<io.bigquery>`

 :ref:`Here <io.perf>` is an informal performance comparison for some of these IO methods.

doc/source/user_guide/text.rst

+5 -1
@@ -94,7 +94,11 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
 2. Some string methods, like :meth:`Series.str.decode` are not available
    on ``StringArray`` because ``StringArray`` only holds strings, not
    bytes.
-
+3. In comparision operations, :class:`arrays.StringArray` and ``Series`` backed
+   by a ``StringArray`` will return an object with :class:`BooleanDtype`,
+   rather than a ``bool`` dtype object. Missing values in a ``StringArray``
+   will propagate in comparision operations, rather than always comparing
+   unequal like :attr:`numpy.nan`.

 Everything else that follows in the rest of this document applies equally to
 ``string`` and ``object`` dtype.

doc/source/whatsnew/v1.0.0.rst

+7 -2
@@ -205,6 +205,8 @@ Other enhancements
   (:meth:`~DataFrame.to_parquet` / :func:`read_parquet`) using the `'pyarrow'` engine
   now preserve those data types with pyarrow >= 1.0.0 (:issue:`20612`).
 - The ``partition_cols`` argument in :meth:`DataFrame.to_parquet` now accepts a string (:issue:`27117`)
+- :func:`to_parquet` now appropriately handles the ``schema`` argument for user defined schemas in the pyarrow engine. (:issue: `30270`)
+

 Build Changes
 ^^^^^^^^^^^^^
@@ -486,6 +488,7 @@ Documentation Improvements
 Deprecations
 ~~~~~~~~~~~~

+- :meth:`Series.item` and :meth:`Index.item` have been _undeprecated_ (:issue:`29250`)
 - ``Index.set_value`` has been deprecated. For a given index ``idx``, array ``arr``,
   value in ``idx`` of ``idx_val`` and a new value of ``val``, ``idx.set_value(arr, idx_val, val)``
   is equivalent to ``arr[idx.get_loc(idx_val)] = val``, which should be used instead (:issue:`28621`).
@@ -681,6 +684,7 @@ Categorical
   same type as if one used the :meth:`.str.` / :meth:`.dt.` on a :class:`Series` of that type. E.g. when accessing :meth:`Series.dt.tz_localize` on a
   :class:`Categorical` with duplicate entries, the accessor was skipping duplicates (:issue:`27952`)
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` that would give incorrect results on categorical data (:issue:`26988`)
+- Bug where calling :meth:`Categorical.min` or :meth:`Categorical.max` on an empty Categorical would raise a numpy exception (:issue:`30227`)


 Datetimelike
@@ -702,6 +706,8 @@ Datetimelike
 - Bug in :attr:`Timestamp.resolution` being a property instead of a class attribute (:issue:`29910`)
 - Bug in :func:`pandas.to_datetime` when called with ``None`` raising ``TypeError`` instead of returning ``NaT`` (:issue:`30011`)
 - Bug in :func:`pandas.to_datetime` failing for `deques` when using ``cache=True`` (the default) (:issue:`29403`)
+- Bug in :meth:`Series.item` with ``datetime64`` or ``timedelta64`` dtype, :meth:`DatetimeIndex.item`, and :meth:`TimedeltaIndex.item` returning an integer instead of a :class:`Timestamp` or :class:`Timedelta` (:issue:`30175`)
+-

 Timedelta
 ^^^^^^^^^
@@ -797,7 +803,6 @@ I/O
 - Bug in :func:`read_json` where default encoding was not set to ``utf-8`` (:issue:`29565`)
 - Bug in :class:`PythonParser` where str and bytes were being mixed when dealing with the decimal field (:issue:`29650`)
 - :meth:`read_gbq` now accepts ``progress_bar_type`` to display progress bar while the data downloads. (:issue:`29857`)
--

 Plotting
 ^^^^^^^^
@@ -862,7 +867,7 @@ ExtensionArray

 - Bug in :class:`arrays.PandasArray` when setting a scalar string (:issue:`28118`, :issue:`28150`).
 - Bug where nullable integers could not be compared to strings (:issue:`28930`)
--
+- Bug where :class:`DataFrame` constructor raised ValueError with list-like data and ``dtype`` specified (:issue:`30280`)


 Other

pandas/__init__.py

+1 -2
@@ -273,6 +273,5 @@ class SparseSeries:
     Excel files, databases, and saving/loading data from the ultrafast HDF5
     format.
   - Time series-specific functionality: date range generation and frequency
-    conversion, moving window statistics, moving window linear regressions,
-    date shifting and lagging, etc.
+    conversion, moving window statistics, date shifting and lagging.
 """

pandas/_config/config.py

+3 -1
@@ -462,6 +462,7 @@ def register_option(key: str, defval: object, doc="", validator=None, cb=None):

     cursor = _global_config
     msg = "Path prefix to option '{option}' is already an option"
+
     for i, p in enumerate(path[:-1]):
         if not isinstance(cursor, dict):
             raise OptionError(msg.format(option=".".join(path[:i])))
@@ -650,8 +651,9 @@ def _build_option_description(k):
         s += f"\n [default: {o.defval}] [currently: {_get_option(k, True)}]"

     if d:
+        rkey = d.rkey if d.rkey else ""
         s += "\n (Deprecated"
-        s += ", use `{rkey}` instead.".format(rkey=d.rkey if d.rkey else "")
+        s += f", use `{rkey}` instead."
         s += ")"

     return s

pandas/_config/localization.py

+1 -1
@@ -161,6 +161,6 @@ def get_locales(prefix=None, normalize=True, locale_getter=_default_locale_gette
     if prefix is None:
         return _valid_locales(out_locales, normalize)

-    pattern = re.compile("{prefix}.*".format(prefix=prefix))
+    pattern = re.compile(f"{prefix}.*")
     found = pattern.findall("\n".join(out_locales))
     return _valid_locales(found, normalize)
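
To make the prefix-matching step in this hunk concrete, here is a minimal standalone sketch of the same `re.compile(f"{prefix}.*")` and `findall` pattern. The `filter_by_prefix` helper and the sample locale names are hypothetical, and the sketch omits the `_valid_locales` validation that the real function applies afterwards.

```python
import re


def filter_by_prefix(locales, prefix):
    """Hypothetical helper: return the locale strings matched by `prefix`."""
    # Same construction as the changed line: the prefix followed by ".*".
    pattern = re.compile(f"{prefix}.*")
    # "." does not match newlines, so each match stays within one locale line.
    return pattern.findall("\n".join(locales))


sample = ["en_US.UTF-8", "en_GB.UTF-8", "zh_CN.UTF-8", "de_DE.UTF-8"]
matches = filter_by_prefix(sample, "en")
print(matches)

# The pattern is not anchored to the start of a line, so a locale that merely
# contains the prefix is matched from that point onward.
partial = filter_by_prefix(["ven_ZA"], "en")
print(partial)
```

Because the prefix is interpolated directly into the pattern, any regular-expression metacharacters in it are interpreted as such.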

pandas/core/arrays/base.py

+1 -5
@@ -27,8 +27,6 @@
 from pandas.core.missing import backfill_1d, pad_1d
 from pandas.core.sorting import nargsort

-_not_implemented_message = "{} does not implement {}."
-
 _extension_array_shared_docs: Dict[str, str] = dict()


@@ -330,9 +328,7 @@ def __setitem__(self, key: Union[int, np.ndarray], value: Any) -> None:
         # __init__ method coerces that value, then so should __setitem__
         # Note, also, that Series/DataFrame.where internally use __setitem__
         # on a copy of the data.
-        raise NotImplementedError(
-            _not_implemented_message.format(type(self), "__setitem__")
-        )
+        raise NotImplementedError(f"{type(self)} does not implement __setitem__.")

     def __len__(self) -> int:
         """

pandas/core/arrays/boolean.py

+18 -0
@@ -103,6 +103,24 @@ def __repr__(self) -> str:
     def _is_boolean(self) -> bool:
         return True

+    def __from_arrow__(self, array):
+        """Construct BooleanArray from passed pyarrow Array/ChunkedArray"""
+        import pyarrow
+
+        if isinstance(array, pyarrow.Array):
+            chunks = [array]
+        else:
+            # pyarrow.ChunkedArray
+            chunks = array.chunks
+
+        results = []
+        for arr in chunks:
+            # TODO should optimize this without going through object array
+            bool_arr = BooleanArray._from_sequence(np.array(arr))
+            results.append(bool_arr)
+
+        return BooleanArray._concat_same_type(results)
+

 def coerce_to_array(values, mask=None, copy: bool = False):
     """

pandas/core/arrays/categorical.py

+16 -0
@@ -2115,6 +2115,10 @@ def min(self, skipna=True):

         Only ordered `Categoricals` have a minimum!

+        .. versionchanged:: 1.0.0
+
+           Returns an NA value on empty arrays
+
         Raises
         ------
         TypeError
@@ -2125,6 +2129,10 @@ def min(self, skipna=True):
         min : the minimum of this `Categorical`
         """
         self.check_for_ordered("min")
+
+        if not len(self._codes):
+            return self.dtype.na_value
+
         good = self._codes != -1
         if not good.all():
             if skipna:
@@ -2142,6 +2150,10 @@ def max(self, skipna=True):

         Only ordered `Categoricals` have a maximum!

+        .. versionchanged:: 1.0.0
+
+           Returns an NA value on empty arrays
+
         Raises
         ------
         TypeError
@@ -2152,6 +2164,10 @@ def max(self, skipna=True):
         max : the maximum of this `Categorical`
         """
         self.check_for_ordered("max")
+
+        if not len(self._codes):
+            return self.dtype.na_value
+
         good = self._codes != -1
         if not good.all():
             if skipna:
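
The early return added to `min` and `max` can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not the pandas implementation: the `ordered_min` function, the use of `math.nan` as the NA stand-in, and the list-based codes are all hypothetical, and only the skip-missing behaviour is modelled.

```python
import math

NA = math.nan  # assumed stand-in for the dtype's NA value


def ordered_min(codes, categories):
    """Toy ordered-categorical minimum; a code of -1 marks a missing entry."""
    if not len(codes):
        # Mirrors the guard added in the hunk above: an empty input yields NA
        # instead of reaching a reduction that fails on empty data.
        return NA
    valid = [c for c in codes if c != -1]
    if not valid:
        return NA
    # Codes index into the ordered categories, so the smallest code wins.
    return categories[min(valid)]


categories = ["low", "mid", "high"]
empty_result = ordered_min([], categories)
normal_result = ordered_min([2, 0, -1], categories)
print(empty_result, normal_result)
```

Without the guard, the reduction over zero codes is what fails: Python's built-in `min` raises `ValueError` on an empty sequence, analogous to the numpy exception mentioned in the whatsnew entry.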

pandas/core/arrays/string_.py

+29 -9
@@ -86,7 +86,7 @@ def __from_arrow__(self, array):

         results = []
         for arr in chunks:
-            # using _from_sequence to ensure None is convered to np.nan
+            # using _from_sequence to ensure None is convered to NA
             str_arr = StringArray._from_sequence(np.array(arr))
             results.append(str_arr)

@@ -134,6 +134,10 @@ class StringArray(PandasArray):
     The string methods are available on Series backed by
     a StringArray.

+    Notes
+    -----
+    StringArray returns a BooleanArray for comparison methods.
+
     Examples
     --------
     >>> pd.array(['This is', 'some text', None, 'data.'], dtype="string")
@@ -148,6 +152,13 @@ class StringArray(PandasArray):
     Traceback (most recent call last):
     ...
     ValueError: StringArray requires an object-dtype ndarray of strings.
+
+    For comparision methods, this returns a :class:`pandas.BooleanArray`
+
+    >>> pd.array(["a", None, "c"], dtype="string") == "a"
+    <BooleanArray>
+    [True, NA, False]
+    Length: 3, dtype: boolean
     """

     # undo the PandasArray hack
@@ -197,7 +208,10 @@ def __arrow_array__(self, type=None):

         if type is None:
             type = pa.string()
-        return pa.array(self._ndarray, type=type, from_pandas=True)
+
+        values = self._ndarray.copy()
+        values[self.isna()] = None
+        return pa.array(values, type=type, from_pandas=True)

     def _values_for_factorize(self):
         arr = self._ndarray.copy()
@@ -255,7 +269,12 @@ def value_counts(self, dropna=False):
     # Overrride parent because we have different return types.
     @classmethod
     def _create_arithmetic_method(cls, op):
+        # Note: this handles both arithmetic and comparison methods.
         def method(self, other):
+            from pandas.arrays import BooleanArray
+
+            assert op.__name__ in ops.ARITHMETIC_BINOPS | ops.COMPARISON_BINOPS
+
             if isinstance(other, (ABCIndexClass, ABCSeries, ABCDataFrame)):
                 return NotImplemented

@@ -275,15 +294,16 @@ def method(self, other):
                 other = np.asarray(other)
                 other = other[valid]

-            result = np.empty_like(self._ndarray, dtype="object")
-            result[mask] = StringDtype.na_value
-            result[valid] = op(self._ndarray[valid], other)
-
-            if op.__name__ in {"add", "radd", "mul", "rmul"}:
+            if op.__name__ in ops.ARITHMETIC_BINOPS:
+                result = np.empty_like(self._ndarray, dtype="object")
+                result[mask] = StringDtype.na_value
+                result[valid] = op(self._ndarray[valid], other)
                 return StringArray(result)
             else:
-                dtype = "object" if mask.any() else "bool"
-                return np.asarray(result, dtype=dtype)
+                # logical
+                result = np.zeros(len(self._ndarray), dtype="bool")
+                result[valid] = op(self._ndarray[valid], other)
+                return BooleanArray(result, mask)

         return compat.set_function_name(method, f"__{op.__name__}__", cls)
289309
