Commit 0de48d0

Merge pull request #10569 from jreback/comp
ERR: Boolean comparisons of a Series vs None will now be equivalent to null comparisons
2 parents 5b97367 + effb676 commit 0de48d0
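
The behavior named in the commit title can be sketched as follows. This is a minimal illustration, assuming a pandas version that includes this change; the variable names are made up for the example and do not appear in the commit.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Comparing against None no longer raises TypeError; it behaves like a
# comparison against a missing value, so equality is False everywhere.
eq_none = s == None  # noqa: E711 (intentional comparison with None)
ne_none = s != None  # noqa: E711

# The equivalent comparison against np.nan gives the same result.
eq_nan = s == np.nan

# To actually locate missing values, use isnull() instead of ==.
missing = s.isnull()

print(eq_none.tolist())
print(ne_none.tolist())
print(missing.tolist())
```

Because `==` against a null never matches, `isnull()` remains the way to test for missing data.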

15 files changed

+363
-233
lines changed

doc/source/whatsnew/v0.17.0.txt

+73-70
@@ -34,6 +34,7 @@ New features
3434

3535
Other enhancements
3636
^^^^^^^^^^^^^^^^^^
37+
3738
- Enable `read_hdf` to be used without specifying a key when the HDF file contains a single dataset (:issue:`10443`)
3839

3940
- ``DatetimeIndex`` can be instantiated using strings containing ``NaT`` (:issue:`7599`)
@@ -91,7 +92,7 @@ Backwards incompatible API changes
9192
Changes to convert_objects
9293
^^^^^^^^^^^^^^^^^^^^^^^^^^
9394

94-
- ``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)
95+
``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)
9596

9697
===================== =============
9798
Old New
@@ -101,70 +102,65 @@ Changes to convert_objects
101102
``convert_timedelta`` ``timedelta``
102103
===================== =============
103104

104-
- Coercing types with ``DataFrame.convert_objects`` is now implemented using the
105-
keyword argument ``coerce=True``. Previously types were coerced by setting a
106-
keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``.
107-
108-
.. ipython:: python
109-
110-
df = pd.DataFrame({'i': ['1','2'],
111-
'f': ['apple', '4.2'],
112-
's': ['apple','banana']})
113-
df
105+
Coercing types with ``DataFrame.convert_objects`` is now implemented using the
106+
keyword argument ``coerce=True``. Previously types were coerced by setting a
107+
keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``.
114108

115-
The old usage of ``DataFrame.convert_objects`` used `'coerce'` along with the
116-
type.
109+
.. ipython:: python
117110

118-
.. code-block:: python
111+
df = pd.DataFrame({'i': ['1','2'],
112+
'f': ['apple', '4.2'],
113+
's': ['apple','banana']})
114+
df
119115

120-
In [2]: df.convert_objects(convert_numeric='coerce')
116+
The old usage of ``DataFrame.convert_objects`` used `'coerce'` along with the
117+
type.
121118

122-
Now the ``coerce`` keyword must be explicitly used.
119+
.. code-block:: python
123120

124-
.. ipython:: python
121+
In [2]: df.convert_objects(convert_numeric='coerce')
125122

126-
df.convert_objects(numeric=True, coerce=True)
123+
Now the ``coerce`` keyword must be explicitly used.
127124

128-
- In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
129-
numeric types when there were no values convertible to a numeric type. For example,
125+
.. ipython:: python
130126

131-
.. code-block:: python
127+
df.convert_objects(numeric=True, coerce=True)
132128

133-
In [1]: df = pd.DataFrame({'s': ['a','b']})
134-
In [2]: df.convert_objects(convert_numeric='coerce')
135-
Out[2]:
136-
s
137-
0 a
138-
1 b
129+
In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
130+
numeric types when there were no values convertible to a numeric type. This returned
131+
the original DataFrame with no conversion. This change alters
132+
this behavior so that all non-number-like strings are converted to ``NaN``.
139133

140-
returns the original DataFrame with no conversion. This change alters
141-
this behavior so that
134+
.. code-block:: python
142135

143-
.. ipython:: python
136+
In [1]: df = pd.DataFrame({'s': ['a','b']})
137+
In [2]: df.convert_objects(convert_numeric='coerce')
138+
Out[2]:
139+
s
140+
0 a
141+
1 b
144142

145-
pd.DataFrame({'s': ['a','b']})
146-
df.convert_objects(numeric=True, coerce=True)
143+
.. ipython:: python
147144

148-
converts all non-number-like strings to ``NaN``.
145+
df = pd.DataFrame({'s': ['a','b']})
146+
df.convert_objects(numeric=True, coerce=True)
149147

150-
- In earlier versions of pandas, the default behavior was to try and convert
151-
datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
152-
to do nothing, and so it is necessary to pass at least one conversion target
153-
in the method call.
148+
In earlier versions of pandas, the default behavior was to try and convert
149+
datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
150+
to do nothing, and so it is necessary to pass at least one conversion target
151+
in the method call.
154152

155-
.. _whatsnew_0170.api_breaking.other:
153+
Changes to Index Comparisons
154+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
156155

157-
Other API Changes
158-
^^^^^^^^^^^^^^^^^
156+
Operator equal on Index should behave similarly to Series (:issue:`9947`)
159157

160-
- Operator equal on Index should behavior similarly to Series (:issue:`9947`)
158+
Starting in v0.17.0, comparing ``Index`` objects of different lengths will raise
159+
a ``ValueError``. This is to be consistent with the behavior of ``Series``.
161160

162-
Starting in v0.17.0, comparing ``Index`` objects of different lengths will raise
163-
a ``ValueError``. This is to be consistent with the behavior of ``Series``.
161+
Previous behavior:
164162

165-
Previous behavior:
166-
167-
.. code-block:: python
163+
.. code-block:: python
168164

169165
In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
170166
Out[2]: array([ True, False, False], dtype=bool)
@@ -188,9 +184,9 @@ Other API Changes
188184
In [7]: pd.Series([1, 2, 3]) == pd.Series([1, 2])
189185
ValueError: Series lengths must match to compare
190186

191-
New behavior:
187+
New behavior:
192188

193-
.. code-block:: python
189+
.. code-block:: python
194190

195191
In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
196192
Out[8]: array([ True, False, False], dtype=bool)
@@ -214,25 +210,27 @@ Other API Changes
214210
In [13]: pd.Series([1, 2, 3]) == pd.Series([1, 2])
215211
ValueError: Series lengths must match to compare
216212

217-
Note that this is different from the ``numpy`` behavior where a comparison can
218-
be broadcast:
213+
Note that this is different from the ``numpy`` behavior where a comparison can
214+
be broadcast:
219215

220-
.. ipython:: python
216+
.. ipython:: python
221217

222218
np.array([1, 2, 3]) == np.array([1])
223219

224-
or it can return False if broadcasting can not be done:
220+
or it can return False if broadcasting can not be done:
225221

226-
.. ipython:: python
222+
.. ipython:: python
227223

228224
np.array([1, 2, 3]) == np.array([1, 2])
229225

226+
Other API Changes
227+
^^^^^^^^^^^^^^^^^
228+
230229
- Enable writing Excel files in :ref:`memory <_io.excel_writing_buffer>` using StringIO/BytesIO (:issue:`7074`)
231230
- Enable serialization of lists and dicts to strings in ExcelWriter (:issue:`8188`)
232231
- Allow passing `kwargs` to the interpolation methods (:issue:`10378`).
233232
- Serialize metadata properties of subclasses of pandas objects (:issue:`10553`).
234233

235-
236234
.. _whatsnew_0170.deprecations:
237235

238236
Deprecations
@@ -243,6 +241,8 @@ Deprecations
243241
Removal of prior version deprecations/changes
244242
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
245243

244+
- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)
245+
246246
.. _dask: https://dask.readthedocs.org/en/latest/
247247

248248
.. _whatsnew_0170.gil:
@@ -285,48 +285,51 @@ Performance Improvements
285285
Bug Fixes
286286
~~~~~~~~~
287287

288+
- Boolean comparisons of a ``Series`` vs ``None`` will now be equivalent to comparing with ``np.nan``, rather than raising ``TypeError``, xref (:issue:`1079`).
288289
- Bug in ``DataFrame.apply`` when function returns categorical series. (:issue:`9573`)
289290
- Bug in ``to_datetime`` with invalid dates and formats supplied (:issue:`10154`)
290-
291291
- Bug in ``Index.drop_duplicates`` dropping name(s) (:issue:`10115`)
292-
293-
294292
- Bug in ``pd.Series`` when setting a value on an empty ``Series`` whose index has a frequency. (:issue:`10193`)
295-
296293
- Bug in ``DataFrame.plot`` raises ``ValueError`` when color name is specified by multiple characters (:issue:`10387`)
297294
- Bug in ``DataFrame.reset_index`` when index contains `NaT`. (:issue:`10388`)
295+
- Bug in ``ExcelReader`` when worksheet is empty (:issue:`6403`)
296+
- Bug in ``Table.select_column`` where name is not preserved (:issue:`10392`)
297+
- Bug in ``offsets.generate_range`` where ``start`` and ``end`` have finer precision than ``offset`` (:issue:`9907`)
298298

299299

300-
- Bug in ``ExcelReader`` when worksheet is empty (:issue:`6403`)
301300

302301

303-
- Bug in ``Table.select_column`` where name is not preserved (:issue:`10392`)
304-
- Bug in ``offsets.generate_range`` where ``start`` and ``end`` have finer precision than ``offset`` (:issue:`9907`)
305302

306303

307304
- Bug in ``DataFrame.interpolate`` with ``axis=1`` and ``inplace=True`` (:issue:`10395`)
308-
309305
- Bug in ``io.sql.get_schema`` when specifying multiple columns as primary
310306
key (:issue:`10385`).
311-
312-
313307
- Bug in ``test_categorical`` on big-endian builds (:issue:`10425`)
314308
- Bug in ``Series.map`` using categorical ``Series`` raises ``AttributeError`` (:issue:`10324`)
315309
- Bug in ``MultiIndex.get_level_values`` including ``Categorical`` raises ``AttributeError`` (:issue:`10460`)
316310

311+
312+
313+
314+
315+
316+
317317
- Bug that caused segfault when resampling an empty Series (:issue:`10228`)
318318
- Bug in ``DatetimeIndex`` and ``PeriodIndex.value_counts`` resets name from its result, but retains in result's ``Index``. (:issue:`10150`)
319-
320319
- Bug in `pandas.concat` with ``axis=0`` when column is of dtype ``category`` (:issue:`10177`)
321-
322320
- Bug in ``read_msgpack`` where input type is not always checked (:issue:`10369`)
323-
324321
- Bug in `pandas.read_csv` with ``index_col=False`` or with ``index_col=['a', 'b']`` (:issue:`10413`, :issue:`10467`)
325-
326322
- Bug in `Series.from_csv` with ``header`` kwarg not setting the ``Series.name`` or the ``Series.index.name`` (:issue:`10483`)
327-
328323
- Bug in `groupby.var` which caused variance to be inaccurate for small float values (:issue:`10448`)
329-
330324
- Bug in ``Series.plot(kind='hist')`` Y Label not informative (:issue:`10485`)
331325

326+
327+
328+
329+
330+
331+
332+
333+
334+
332335
- Bug in operator equal on Index not being consistent with Series (:issue:`9947`)
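
The length check described in the "Changes to Index Comparisons" notes can be exercised with a short sketch. It assumes a pandas version at or after v0.17.0; `lengths_comparable` is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd


def lengths_comparable(left, right):
    """Return True if ``left == right`` succeeds, False if it raises ValueError."""
    try:
        left == right
    except ValueError:
        return False
    return True


# Same length: an elementwise comparison is performed.
same_len = lengths_comparable(pd.Index([1, 2, 3]), pd.Index([1, 4, 5]))

# Different lengths: Index now raises ValueError, as Series does.
index_diff_len = lengths_comparable(pd.Index([1, 2, 3]), pd.Index([1, 2]))
series_diff_len = lengths_comparable(pd.Series([1, 2, 3]), pd.Series([1, 2]))

print(same_len, index_diff_len, series_diff_len)
```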

pandas/core/common.py

+24
@@ -462,6 +462,10 @@ def array_equivalent(left, right, strict_nan=False):
     if issubclass(left.dtype.type, (np.floating, np.complexfloating)):
         return ((left == right) | (np.isnan(left) & np.isnan(right))).all()
 
+    # numpy will not allow this type of datetimelike vs integer comparison
+    elif is_datetimelike_v_numeric(left, right):
+        return False
+
     # NaNs cannot occur otherwise.
     return np.array_equal(left, right)
 
@@ -2539,6 +2543,26 @@ def is_datetime_or_timedelta_dtype(arr_or_dtype):
     return issubclass(tipo, (np.datetime64, np.timedelta64))
 
 
+def is_datetimelike_v_numeric(a, b):
+    # return if we have an i8 convertible and numeric comparison
+    if not hasattr(a,'dtype'):
+        a = np.asarray(a)
+    if not hasattr(b, 'dtype'):
+        b = np.asarray(b)
+    f = lambda x: is_integer_dtype(x) or is_float_dtype(x)
+    return (needs_i8_conversion(a) and f(b)) or (
+        needs_i8_conversion(b) and f(a))
+
+def is_datetimelike_v_object(a, b):
+    # return if we have an i8 convertible and object comparison
+    if not hasattr(a,'dtype'):
+        a = np.asarray(a)
+    if not hasattr(b, 'dtype'):
+        b = np.asarray(b)
+    f = lambda x: is_object_dtype(x)
+    return (needs_i8_conversion(a) and f(b)) or (
+        needs_i8_conversion(b) and f(a))
+
 needs_i8_conversion = is_datetime_or_timedelta_dtype
 
 def i8_boxer(arr_or_dtype):
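
The check performed by `is_datetimelike_v_numeric` can be approximated with NumPy alone. The sketch below is a hypothetical stand-in, not the pandas implementation: it inspects `dtype.kind` rather than calling the pandas helpers `needs_i8_conversion`, `is_integer_dtype`, and `is_float_dtype`, and the function name is made up for the illustration.

```python
import numpy as np


def datetimelike_v_numeric(a, b):
    """Return True when one side is datetime64/timedelta64 and the other is int/float.

    Hypothetical approximation of the helper added in this commit.
    """
    a = np.asarray(a)
    b = np.asarray(b)

    def is_datetimelike(x):
        # 'M' is datetime64, 'm' is timedelta64
        return x.dtype.kind in "mM"

    def is_numeric(x):
        # signed int, unsigned int, float
        return x.dtype.kind in "iuf"

    return (is_datetimelike(a) and is_numeric(b)) or (
        is_datetimelike(b) and is_numeric(a))


dates = np.array(["2015-01-01", "2015-01-02"], dtype="M8[ns]")
deltas = np.array([1, 2], dtype="m8[ns]")

print(datetimelike_v_numeric(dates, [1, 2]))
print(datetimelike_v_numeric(dates, deltas))
```

Callers in the diff use a positive result to skip the elementwise comparison and treat the operands as unequal.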

pandas/core/generic.py

+8-1
@@ -3574,7 +3574,14 @@ def where(self, cond, other=np.nan, inplace=False, axis=None, level=None,
                 except ValueError:
                     new_other = np.array(other)
 
-                matches = (new_other == np.array(other))
+                # we can end up comparing integers and m8[ns]
+                # which is a numpy no no
+                is_i8 = com.needs_i8_conversion(self.dtype)
+                if is_i8:
+                    matches = False
+                else:
+                    matches = (new_other == np.array(other))
+
                 if matches is False or not matches.all():
 
                     # coerce other to a common dtype if we can

pandas/core/index.py

+11-11
@@ -164,18 +164,18 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None, fastpath=False,
         elif data is None or np.isscalar(data):
             cls._scalar_data_error(data)
         else:
-            if tupleize_cols and isinstance(data, list) and data:
+            if tupleize_cols and isinstance(data, list) and data and isinstance(data[0], tuple):
                 try:
-                    sorted(data)
-                    has_mixed_types = False
-                except (TypeError, UnicodeDecodeError):
-                    has_mixed_types = True # python3 only
-                if isinstance(data[0], tuple) and not has_mixed_types:
-                    try:
-                        return MultiIndex.from_tuples(
-                            data, names=name or kwargs.get('names'))
-                    except (TypeError, KeyError):
-                        pass # python2 - MultiIndex fails on mixed types
+
+                    # must be orderable in py3
+                    if compat.PY3:
+                        sorted(data)
+                    return MultiIndex.from_tuples(
+                        data, names=name or kwargs.get('names'))
+                except (TypeError, KeyError):
+                    # python2 - MultiIndex fails on mixed types
+                    pass
+
             # other iterable of some kind
             subarr = com._asarray_tuplesafe(data, dtype=object)


pandas/core/internals.py

+9-2
@@ -14,7 +14,7 @@
                                 is_null_datelike_scalar, _maybe_promote,
                                 is_timedelta64_dtype, is_datetime64_dtype,
                                 array_equivalent, _maybe_convert_string_to_object,
-                                is_categorical)
+                                is_categorical, needs_i8_conversion, is_datetimelike_v_numeric)
 from pandas.core.index import Index, MultiIndex, _ensure_index
 from pandas.core.indexing import maybe_convert_indices, length_of_indexer
 from pandas.core.categorical import Categorical, maybe_to_categorical
@@ -3885,9 +3885,16 @@ def _vstack(to_stack, dtype):
 
 
 def _possibly_compare(a, b, op):
-    res = op(a, b)
+
     is_a_array = isinstance(a, np.ndarray)
     is_b_array = isinstance(b, np.ndarray)
+
+    # numpy deprecation warning to have i8 vs integer comparisons
+    if is_datetimelike_v_numeric(a, b):
+        res = False
+    else:
+        res = op(a, b)
+
     if np.isscalar(res) and (is_a_array or is_b_array):
         type_names = [type(a).__name__, type(b).__name__]
