Surprising and inconsistent results when adding two Series which both contain duplicate labels
#20831
Labels: Bug, Needs Discussion, Numeric Operations
If I add two `Series` objects, both of which contain the same duplicate label, it appears that I get a result in which every possible combination for that label appears. For example, when `s1` has the label `b` twice and `s2` has the label `b` thrice, the result has the label `b` six times.
That doesn't seem an unreasonable behaviour, but it's not applied consistently. If instead the second series has the same number of occurrences, then the data entries are added elementwise.
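The two cases described above can be reproduced with a minimal sketch along the following lines. The values and the names `s3`, `combined` and `elementwise` are illustrative assumptions, not the data from the original report; in the second case the two indexes are identical.

```python
import pandas as pd

# Illustrative data (assumed; not the values from the original report).
s1 = pd.Series([1, 2, 3], index=["a", "b", "b"])               # "b" twice
s2 = pd.Series([10, 20, 30, 40], index=["a", "b", "b", "b"])   # "b" thrice
s3 = pd.Series([10, 20, 30], index=["a", "b", "b"])            # "b" twice, same index as s1

# Different duplicate counts: the indexes are aligned, and every pairing of
# the "b" rows from s1 and s2 shows up in the result.
combined = s1 + s2
print(combined)
print((combined.index == "b").sum())  # number of "b" rows in the result

# Identical indexes: the values are added position by position.
elementwise = s1 + s3
print(elementwise)
```

Printing both results side by side makes the difference in length easy to see: the first result has more rows than either input, while the second has the same shape as its inputs.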
That seems to make the "outer product" behaviour in the first example somewhat dangerous, because any code that depends on it is at risk of giving inconsistent results if it happens to get a dataset where the numbers of the various labels match exactly. Should the second behaviour be altered to be consistent with the first? Or should maybe the first behaviour become an error (after a suitable deprecation period)?
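Until the behaviour is settled, calling code could guard against the ambiguity itself. The sketch below is one possible approach, not a proposed pandas API: `add_strict` is a hypothetical helper that raises when either operand has duplicate labels, roughly what the "become an error" option might look like from user code.

```python
import pandas as pd


def add_strict(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper (not part of pandas): add two Series,
    refusing operands whose index contains duplicate labels."""
    for name, series in (("left", left), ("right", right)):
        if not series.index.is_unique:
            dupes = series.index[series.index.duplicated()].unique().tolist()
            raise ValueError(f"{name} operand has duplicate labels: {dupes}")
    return left + right


# Illustrative data (assumed; not the values from the original report).
s1 = pd.Series([1, 2, 3], index=["a", "b", "b"])
s2 = pd.Series([10, 20, 30, 40], index=["a", "b", "b", "b"])

try:
    add_strict(s1, s2)
except ValueError as exc:
    print(exc)

# Collapsing duplicates first (here by summing per label) makes the
# addition unambiguous; whether summing is appropriate depends on the data.
total = add_strict(s1.groupby(level=0).sum(), s2.groupby(level=0).sum())
print(total)
```

Summing per label is only one way to remove the duplicates; the point of the sketch is that the caller decides explicitly rather than relying on whichever alignment path happens to apply.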
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.22.0
pytest: None
pip: 10.0.0
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None