
Commit 5152cdd

sinhrks authored and jorisvandenbossche committed
API/BUG: Fix Series ops inconsistencies (#13894)

- Series comparison operators check whether labels are identical (previously: labels were ignored)
- Series boolean operators align on labels (previously: only the left index was kept)
1 parent e23e6f1 commit 5152cdd


5 files changed (+450 -50 lines)


doc/source/whatsnew/v0.19.0.txt (+138)
@@ -488,6 +488,143 @@ New Behavior:
 
    type(s.tolist()[0])
 
+.. _whatsnew_0190.api.series_ops:
+
+``Series`` operators for different indexes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following ``Series`` operators have been changed to make all operators consistent,
+including with ``DataFrame`` (:issue:`1134`, :issue:`4581`, :issue:`13538`)
+
+- ``Series`` comparison operators now raise ``ValueError`` when the indexes are different.
+- ``Series`` logical operators align both indexes.
+
+.. warning::
+   Up to and including 0.18.1, comparing ``Series`` of the same length succeeded even if
+   their indexes were different (the result ignored the index). As of 0.19.0, this raises ``ValueError`` to be more strict. This section also describes how to keep the previous behaviour or align different indexes using flexible comparison methods like ``.eq``.
+
+
+As a result, ``Series`` and ``DataFrame`` operators behave as below:
+
+Arithmetic operators
+""""""""""""""""""""
+
+Arithmetic operators align both indexes (no change).
+
+.. ipython:: python
+
+   s1 = pd.Series([1, 2, 3], index=list('ABC'))
+   s2 = pd.Series([2, 2, 2], index=list('ABD'))
+   s1 + s2
+
+   df1 = pd.DataFrame([1, 2, 3], index=list('ABC'))
+   df2 = pd.DataFrame([2, 2, 2], index=list('ABD'))
+   df1 + df2
+
+Comparison operators
+""""""""""""""""""""
+
+Comparison operators raise ``ValueError`` when the indexes are different.
+
+Previous Behavior (``Series``):
+
+``Series`` compared values ignoring the index as long as both lengths were the same.
+
+.. code-block:: ipython
+
+   In [1]: s1 == s2
+   Out[1]:
+   A    False
+   B     True
+   C    False
+   dtype: bool
+
+New Behavior (``Series``):
+
+.. code-block:: ipython
+
+   In [2]: s1 == s2
+   Out[2]:
+   ValueError: Can only compare identically-labeled Series objects
+
+.. note::
+   To achieve the same result as previous versions (compare values by position, ignoring the index), compare both ``.values``.
+
+   .. ipython:: python
+
+      s1.values == s2.values
+
+   If you want to compare ``Series`` while aligning their indexes, see the flexible comparison methods section below.
+
+Current Behavior (``DataFrame``, no change):
+
+.. code-block:: ipython
+
+   In [3]: df1 == df2
+   Out[3]:
+   ValueError: Can only compare identically-labeled DataFrame objects
+
+Logical operators
+"""""""""""""""""
+
+Logical operators align both indexes.
+
+Previous Behavior (``Series``):
+
+Only the left hand side index was kept.
+
+.. code-block:: ipython
+
+   In [4]: s1 = pd.Series([True, False, True], index=list('ABC'))
+   In [5]: s2 = pd.Series([True, True, True], index=list('ABD'))
+   In [6]: s1 & s2
+   Out[6]:
+   A     True
+   B    False
+   C    False
+   dtype: bool
+
+New Behavior (``Series``):
+
+.. ipython:: python
+
+   s1 = pd.Series([True, False, True], index=list('ABC'))
+   s2 = pd.Series([True, True, True], index=list('ABD'))
+   s1 & s2
+
+.. note::
+   ``Series`` logical operators fill a ``NaN`` result with ``False``.
+
+.. note::
+   To achieve the same result as previous versions (combine values using only the left hand side index), reindex the right hand side with ``reindex_like``.
+
+   .. ipython:: python
+
+      s1 & s2.reindex_like(s1)
+
+Current Behavior (``DataFrame``, no change):
+
+.. ipython:: python
+
+   df1 = pd.DataFrame([True, False, True], index=list('ABC'))
+   df2 = pd.DataFrame([True, True, True], index=list('ABD'))
+   df1 & df2
+
+Flexible comparison methods
+"""""""""""""""""""""""""""
+
+``Series`` flexible comparison methods like ``eq``, ``ne``, ``le``, ``lt``, ``ge`` and ``gt`` now align both indexes. Use these methods if you want to compare two ``Series``
+that have different indexes.
+
+.. ipython:: python
+
+   s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
+   s2 = pd.Series([2, 2, 2], index=['b', 'c', 'd'])
+   s1.eq(s2)
+   s1.ge(s2)
+
+Previously, these methods behaved the same as the comparison operators (see above).
+
 .. _whatsnew_0190.api.promote:
 
 ``Series`` type promotion on assignment
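To make the alignment rule above concrete, here is a toy model in plain Python. It is not pandas code: a "series" is assumed to be a dict mapping unique, sortable labels to values, and the outer join is assumed to yield the sorted union of the labels. It contrasts label-aligned arithmetic (`s1 + s2`) with the position-based comparison that `s1.values == s2.values` restores; `aligned_add` and `positional_eq` are hypothetical names.

```python
import math


def aligned_add(left, right):
    """Toy model of ``s1 + s2``: align on the union of labels.

    A label present on only one side yields NaN, as in the pandas output.
    """
    labels = sorted(set(left) | set(right))
    return {k: left.get(k, math.nan) + right.get(k, math.nan) for k in labels}


def positional_eq(left, right):
    """Toy model of ``s1.values == s2.values``: labels are ignored and values
    are compared position by position (dicts keep insertion order)."""
    if len(left) != len(right):
        raise ValueError('toy check: lengths differ')
    return [a == b for a, b in zip(left.values(), right.values())]


# Same data as the arithmetic/comparison examples in the whatsnew entry.
s1 = {'A': 1, 'B': 2, 'C': 3}
s2 = {'A': 2, 'B': 2, 'D': 2}

print(aligned_add(s1, s2))    # {'A': 3, 'B': 4, 'C': nan, 'D': nan}
print(positional_eq(s1, s2))  # [False, True, False]
```

The positional result matches the "Previous Behavior" output for `s1 == s2`, which is exactly why that behaviour was considered surprising: label `C` on the left was silently compared with label `D` on the right.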
@@ -1107,6 +1244,7 @@ Bug Fixes
 - Bug in using NumPy ufunc with ``PeriodIndex`` to add or subtract integer raise ``IncompatibleFrequency``. Note that using standard operator like ``+`` or ``-`` is recommended, because standard operators use more efficient path (:issue:`13980`)
 
 - Bug in operations on ``NaT`` returning ``float`` instead of ``datetime64[ns]`` (:issue:`12941`)
+- Bug in ``Series`` flexible arithmetic methods (like ``.add()``) raising ``ValueError`` when ``axis=None`` (:issue:`13894`)
 
 - Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)
 
pandas/core/ops.py (+66 -19)
@@ -311,17 +311,6 @@ def get_op(cls, left, right, name, na_op):
         is_datetime_lhs = (is_datetime64_dtype(left) or
                            is_datetime64tz_dtype(left))
 
-        if isinstance(left, ABCSeries) and isinstance(right, ABCSeries):
-            # avoid repated alignment
-            if not left.index.equals(right.index):
-                left, right = left.align(right, copy=False)
-
-                index, lidx, ridx = left.index.join(right.index, how='outer',
-                                                    return_indexers=True)
-                # if DatetimeIndex have different tz, convert to UTC
-                left.index = index
-                right.index = index
-
         if not (is_datetime_lhs or is_timedelta_lhs):
             return _Op(left, right, name, na_op)
         else:
@@ -603,6 +592,33 @@ def _is_offset(self, arr_or_obj):
         return False
 
 
+def _align_method_SERIES(left, right, align_asobject=False):
+    """ align lhs and rhs Series """
+
+    # ToDo: Different from _align_method_FRAME, list, tuple and ndarray
+    # are not coerced here
+    # because Series has inconsistencies described in #13637
+
+    if isinstance(right, ABCSeries):
+        # avoid repeated alignment
+        if not left.index.equals(right.index):
+
+            if align_asobject:
+                # to keep original value's dtype for bool ops
+                left = left.astype(object)
+                right = right.astype(object)
+
+            left, right = left.align(right, copy=False)
+
+            index, lidx, ridx = left.index.join(right.index, how='outer',
+                                                return_indexers=True)
+            # if DatetimeIndex have different tz, convert to UTC
+            left.index = index
+            right.index = index
+
+    return left, right
+
+
 def _arith_method_SERIES(op, name, str_rep, fill_zeros=None, default_axis=None,
                          **eval_kwargs):
     """
@@ -655,6 +671,8 @@ def wrapper(left, right, name=name, na_op=na_op):
         if isinstance(right, pd.DataFrame):
             return NotImplemented
 
+        left, right = _align_method_SERIES(left, right)
+
         converted = _Op.get_op(left, right, name, na_op)
 
         left, right = converted.left, converted.right
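The arithmetic wrapper now calls `_align_method_SERIES` before `_Op.get_op`, and the alignment that used to live inside `get_op` has moved into that helper. Its control flow (leave non-Series operands untouched, skip the join when both indexes are already equal, otherwise align both sides on an outer join) can be sketched with dicts standing in for Series. This is a hypothetical toy, assuming unique, sortable labels whose outer join is their sorted union; it is not the pandas implementation.

```python
import math


def align_series(left, right):
    """Toy model of the alignment helper (hypothetical, dict-based).

    ``left`` is a dict of label -> value.  ``right`` may be another dict
    (treated as a Series) or anything else (scalar, list, ...), which is
    returned untouched, mirroring the ToDo note that lists, tuples and
    arrays are not coerced in this step.
    """
    if not isinstance(right, dict):
        return left, right
    # Same labels in the same order: skip the join (the index.equals shortcut).
    if list(left) == list(right):
        return left, right
    # Otherwise reindex both sides onto the outer join of the labels,
    # using NaN for labels a side does not have.
    labels = sorted(set(left) | set(right))
    return ({k: left.get(k, math.nan) for k in labels},
            {k: right.get(k, math.nan) for k in labels})


left, right = align_series({'a': 1, 'b': 2, 'c': 3}, {'b': 2, 'c': 2, 'd': 2})
print(left)   # {'a': 1, 'b': 2, 'c': 3, 'd': nan}
print(right)  # {'a': nan, 'b': 2, 'c': 2, 'd': 2}
```

Not modelled here: the `align_asobject=True` path used by the logical operators, which casts both sides to object dtype before aligning (per the code comment, to keep the original values' dtype), and the reassignment of the joined index to both sides that the comment ties to `DatetimeIndex` objects with different time zones.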
@@ -763,8 +781,9 @@ def wrapper(self, other, axis=None):
 
         if isinstance(other, ABCSeries):
             name = _maybe_match_name(self, other)
-            if len(self) != len(other):
-                raise ValueError('Series lengths must match to compare')
+            if not self._indexed_same(other):
+                msg = 'Can only compare identically-labeled Series objects'
+                raise ValueError(msg)
             return self._constructor(na_op(self.values, other.values),
                                      index=self.index, name=name)
         elif isinstance(other, pd.DataFrame):  # pragma: no cover
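The comparison wrapper now checks `self._indexed_same(other)` instead of only comparing lengths. Index equality is order-sensitive, so two Series holding the same labels in a different order are rejected as well; in pandas, `s2.reindex_like(s1)` puts `s2` into `s1`'s label order first. A toy model with lists of `(label, value)` pairs (`strict_eq` and `reorder_like` are hypothetical names, not pandas API) shows both cases:

```python
def strict_eq(left, right):
    """Toy model of ``s1 == s2`` after this commit.

    ``left`` and ``right`` are lists of (label, value) pairs.  The label
    sequences must be identical, including order, or ValueError is raised.
    """
    if [k for k, _ in left] != [k for k, _ in right]:
        raise ValueError('Can only compare identically-labeled Series objects')
    return [(k, a == b) for (k, a), (_, b) in zip(left, right)]


def reorder_like(series, like):
    """Hypothetical helper: rearrange ``series`` into the label order of
    ``like``; both must hold the same set of labels."""
    values = dict(series)
    return [(k, values[k]) for k, _ in like]


s1 = [('a', 1), ('b', 2)]
s2 = [('b', 2), ('a', 1)]  # same labels, different order

try:
    strict_eq(s1, s2)
except ValueError as exc:
    print(exc)  # Can only compare identically-labeled Series objects

print(strict_eq(s1, reorder_like(s2, s1)))  # [('a', True), ('b', True)]
```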
@@ -786,6 +805,7 @@ def wrapper(self, other, axis=None):
 
             return self._constructor(na_op(self.values, np.asarray(other)),
                                      index=self.index).__finalize__(self)
+
         elif isinstance(other, pd.Categorical):
             if not is_categorical_dtype(self):
                 msg = ("Cannot compare a Categorical for op {op} with Series "
@@ -860,9 +880,10 @@ def wrapper(self, other):
         fill_int = lambda x: x.fillna(0)
         fill_bool = lambda x: x.fillna(False).astype(bool)
 
+        self, other = _align_method_SERIES(self, other, align_asobject=True)
+
         if isinstance(other, ABCSeries):
             name = _maybe_match_name(self, other)
-            other = other.reindex_like(self)
             is_other_int_dtype = is_integer_dtype(other.dtype)
             other = fill_int(other) if is_other_int_dtype else fill_bool(other)
 
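The logical wrapper now aligns both operands (cast to object first) instead of reindexing `other` to `self`, and then fills the holes in `other`: with 0 when it has an integer dtype, otherwise with `False`. The toy below models only the boolean case, with dicts standing in for Series; `and_aligned` and `and_left_only` are hypothetical helpers, not pandas code. It reproduces the old left-only result and the new union result from the whatsnew example.

```python
def and_aligned(left, right):
    """Toy model of ``s1 & s2`` after this commit: align on the union of
    labels and treat a value missing on either side as False."""
    labels = sorted(set(left) | set(right))
    return {k: bool(left.get(k, False)) and bool(right.get(k, False))
            for k in labels}


def and_left_only(left, right):
    """Toy model of the old result, roughly ``s1 & s2.reindex_like(s1)``:
    keep the left labels only, filling missing right values with False."""
    return {k: bool(v) and bool(right.get(k, False)) for k, v in left.items()}


s1 = {'A': True, 'B': False, 'C': True}
s2 = {'A': True, 'B': True, 'D': True}

print(and_aligned(s1, s2))    # {'A': True, 'B': False, 'C': False, 'D': False}
print(and_left_only(s1, s2))  # {'A': True, 'B': False, 'C': False}
```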
@@ -912,7 +933,32 @@ def wrapper(self, other):
                     'floordiv': {'op': '//',
                                  'desc': 'Integer division',
                                  'reversed': False,
-                                 'reverse': 'rfloordiv'}}
+                                 'reverse': 'rfloordiv'},
+
+                    'eq': {'op': '==',
+                           'desc': 'Equal to',
+                           'reversed': False,
+                           'reverse': None},
+                    'ne': {'op': '!=',
+                           'desc': 'Not equal to',
+                           'reversed': False,
+                           'reverse': None},
+                    'lt': {'op': '<',
+                           'desc': 'Less than',
+                           'reversed': False,
+                           'reverse': None},
+                    'le': {'op': '<=',
+                           'desc': 'Less than or equal to',
+                           'reversed': False,
+                           'reverse': None},
+                    'gt': {'op': '>',
+                           'desc': 'Greater than',
+                           'reversed': False,
+                           'reverse': None},
+                    'ge': {'op': '>=',
+                           'desc': 'Greater than or equal to',
+                           'reversed': False,
+                           'reverse': None}}
 
 _op_names = list(_op_descriptions.keys())
 for k in _op_names:
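The new `eq` to `ge` entries most likely exist so that the flexible comparison methods, now built by `_flex_method_SERIES`, can get generated docstrings the same way the arithmetic methods do. `'reverse': None` records that these comparisons have no reflected counterpart (there is no `req`). The sketch below shows one way such a description table can drive docstring generation; the template text and `make_doc` are hypothetical, not pandas' actual docstring code.

```python
# The six comparison entries added to the description table in this commit.
OP_DESCRIPTIONS = {
    'eq': {'op': '==', 'desc': 'Equal to', 'reversed': False, 'reverse': None},
    'ne': {'op': '!=', 'desc': 'Not equal to', 'reversed': False,
           'reverse': None},
    'lt': {'op': '<', 'desc': 'Less than', 'reversed': False, 'reverse': None},
    'le': {'op': '<=', 'desc': 'Less than or equal to', 'reversed': False,
           'reverse': None},
    'gt': {'op': '>', 'desc': 'Greater than', 'reversed': False,
           'reverse': None},
    'ge': {'op': '>=', 'desc': 'Greater than or equal to', 'reversed': False,
           'reverse': None},
}

# Hypothetical docstring template; the wording is illustrative only.
TEMPLATE = ("{desc} of series and other, element-wise "
            "(binary operator `{name}`).\n\n"
            "Equivalent to ``{equiv}``.")


def make_doc(name, entry):
    """Render one docstring from a description-table entry."""
    if entry['reversed']:
        equiv = 'other {} series'.format(entry['op'])
    else:
        equiv = 'series {} other'.format(entry['op'])
    return TEMPLATE.format(desc=entry['desc'], name=name, equiv=equiv)


docs = {name: make_doc(name, entry) for name, entry in OP_DESCRIPTIONS.items()}
print(docs['ge'])
```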
@@ -963,10 +1009,11 @@ def _flex_method_SERIES(op, name, str_rep, default_axis=None, fill_zeros=None,
     @Appender(doc)
     def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
         # validate axis
-        self._get_axis_number(axis)
+        if axis is not None:
+            self._get_axis_number(axis)
         if isinstance(other, ABCSeries):
             return self._binop(other, op, level=level, fill_value=fill_value)
-        elif isinstance(other, (np.ndarray, ABCSeries, list, tuple)):
+        elif isinstance(other, (np.ndarray, list, tuple)):
             if len(other) != len(self):
                 raise ValueError('Lengths must be equal')
             return self._binop(self._constructor(other, self.index), op,
@@ -975,15 +1022,15 @@ def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
             if fill_value is not None:
                 self = self.fillna(fill_value)
 
-            return self._constructor(op(self.values, other),
+            return self._constructor(op(self, other),
                                      self.index).__finalize__(self)
 
     flex_wrapper.__name__ = name
     return flex_wrapper
 
 
 series_flex_funcs = dict(flex_arith_method=_flex_method_SERIES,
-                         flex_comp_method=_comp_method_SERIES)
+                         flex_comp_method=_flex_method_SERIES)
 
 series_special_funcs = dict(arith_method=_arith_method_SERIES,
                             comp_method=_comp_method_SERIES,
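With `flex_comp_method` now pointing at `_flex_method_SERIES`, `eq`, `ne`, `lt` and the others share the flexible-method dispatch shown above: a Series goes through `_binop` (alignment plus `fill_value`), a list, tuple or ndarray must match the length and is wrapped with the caller's index, and anything else is applied directly; `axis` is validated only when it is not `None`. The toy below mirrors that dispatch for `eq` only, with dicts standing in for Series. It assumes `fill_value` replaces a value missing on one side only (existing NaNs are not modelled); apart from `'Lengths must be equal'`, the error text is the sketch's own.

```python
import math


def flex_eq(series, other, fill_value=None, axis=0):
    """Toy model of ``Series.eq`` after this commit (hypothetical).

    ``series`` is a dict of label -> value; ``other`` may be a dict
    (another series), a list/tuple, or a scalar.
    """
    # axis is validated only when given, so axis=None is accepted.
    if axis is not None and axis not in (0, 'index'):
        raise ValueError('No axis named {!r}'.format(axis))

    if isinstance(other, (list, tuple)):
        # Array-likes must have the same length and take the caller's labels.
        if len(other) != len(series):
            raise ValueError('Lengths must be equal')
        other = dict(zip(series, other))

    if isinstance(other, dict):
        result = {}
        for k in sorted(set(series) | set(other)):
            a = series.get(k, math.nan)
            b = other.get(k, math.nan)
            # fill_value stands in for a value missing on one side only.
            if fill_value is not None:
                if k not in series:
                    a = fill_value
                elif k not in other:
                    b = fill_value
            result[k] = a == b  # NaN never compares equal, so gaps give False
        return result

    # Scalar: compare every value with it.
    return {k: v == other for k, v in series.items()}


s1 = {'a': 1, 'b': 2, 'c': 3}
s2 = {'b': 2, 'c': 2, 'd': 2}
print(flex_eq(s1, s2))                # {'a': False, 'b': True, 'c': False, 'd': False}
print(flex_eq(s1, s2, fill_value=2))  # {'a': False, 'b': True, 'c': False, 'd': True}
print(flex_eq(s1, [1, 0, 3]))         # {'a': True, 'b': False, 'c': True}
print(flex_eq(s1, 2, axis=None))      # {'a': False, 'b': True, 'c': False}
```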

pandas/io/tests/json/test_ujson.py (+18 -16)
@@ -1306,43 +1306,45 @@ def testSeries(self):
 
         # column indexed
         outp = Series(ujson.decode(ujson.encode(s))).sort_values()
-        self.assertTrue((s == outp).values.all())
+        exp = Series([10, 20, 30, 40, 50, 60],
+                     index=['6', '7', '8', '9', '10', '15'])
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(s), numpy=True)).sort_values()
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
         dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split")))
         outp = Series(**dec)
-        self.assertTrue((s == outp).values.all())
-        self.assertTrue(s.name == outp.name)
+        tm.assert_series_equal(outp, s)
 
         dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split"),
                                        numpy=True))
         outp = Series(**dec)
-        self.assertTrue((s == outp).values.all())
-        self.assertTrue(s.name == outp.name)
 
-        outp = Series(ujson.decode(ujson.encode(
-            s, orient="records"), numpy=True))
-        self.assertTrue((s == outp).values.all())
+        outp = Series(ujson.decode(ujson.encode(s, orient="records"),
+                                   numpy=True))
+        exp = Series([10, 20, 30, 40, 50, 60])
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(s, orient="records")))
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
-        outp = Series(ujson.decode(
-            ujson.encode(s, orient="values"), numpy=True))
-        self.assertTrue((s == outp).values.all())
+        outp = Series(ujson.decode(ujson.encode(s, orient="values"),
+                                   numpy=True))
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(s, orient="values")))
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(
             s, orient="index"))).sort_values()
-        self.assertTrue((s == outp).values.all())
+        exp = Series([10, 20, 30, 40, 50, 60],
+                     index=['6', '7', '8', '9', '10', '15'])
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(
             s, orient="index"), numpy=True)).sort_values()
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
     def testSeriesNested(self):
         s = Series([10, 20, 30, 40, 50, 60], name="series",
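The old assertions `(s == outp).values.all()` relied on `==` ignoring labels. After the round trip the decoded Series does not carry the original labels (JSON object keys come back as strings, and the `records`/`values` orients drop the index), so under this commit `==` would raise instead of comparing positionally. The toy below uses the standard `json` module, with dicts standing in for Series, to show the difference between a positional check and a label-aware one. `same_labels_and_values` only approximates `tm.assert_series_equal`, which also checks dtype, name and other attributes; both helper names are hypothetical.

```python
import json


def same_values_positionally(left, right):
    # What ``(s == outp).values.all()`` effectively checked before this
    # commit: values at the same positions, labels ignored.
    return list(left.values()) == list(right.values())


def same_labels_and_values(left, right):
    # Closer to what ``tm.assert_series_equal`` checks: labels and values
    # must both match, in order.
    return list(left.items()) == list(right.items())


original = {6: 10, 7: 20, 8: 30}
decoded = json.loads(json.dumps(original))  # JSON object keys become strings

print(decoded)                                      # {'6': 10, '7': 20, '8': 30}
print(same_values_positionally(original, decoded))  # True
print(same_labels_and_values(original, decoded))    # False

# Hence the rewritten tests build an explicit expected object with string labels.
expected = {'6': 10, '7': 20, '8': 30}
print(same_labels_and_values(decoded, expected))    # True
```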

pandas/tests/indexes/common.py (+2 -1)
@@ -685,7 +685,8 @@ def test_equals_op(self):
             index_a == series_d
         with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
             index_a == array_d
-        with tm.assertRaisesRegexp(ValueError, "Series lengths must match"):
+        msg = "Can only compare identically-labeled Series objects"
+        with tm.assertRaisesRegexp(ValueError, msg):
             series_a == series_d
         with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
             series_a == array_d
