
Wrong orientation of operations between DataFrame and range() #17901


Closed
toobaz opened this issue Oct 17, 2017 · 11 comments · Fixed by #17926
Labels
Bug Compat pandas objects compatibility with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions

Comments

@toobaz
Member

toobaz commented Oct 17, 2017

Code Sample, a copy-pastable example if possible

In [3]: df = pd.DataFrame(np.ones((2,2)), columns=['A', 'B'])

In [4]: df + list(range(2))
Out[4]: 
     A    B
0  1.0  2.0
1  1.0  2.0

In [5]: df + pd.Series(range(2), index=df.columns)
Out[5]: 
     A    B
0  1.0  2.0
1  1.0  2.0

In [6]: df + range(2)
Out[6]: 
     A    B
0  1.0  1.0
1  2.0  2.0

Problem description

Operations where the operand is a range align along the wrong axis compared to every other 1-d object I can think of.
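For contrast, the column-wise orientation that every other 1-d operand produces can be checked directly (a minimal sketch; the expected frame is constructed by hand):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])

# Every other 1-d operand aligns with the columns (axis=1):
expected = pd.DataFrame({'A': [1.0, 1.0], 'B': [2.0, 2.0]})

assert (df + list(range(2))).equals(expected)
assert (df + np.arange(2)).equals(expected)
assert (df + pd.Series(range(2), index=df.columns)).equals(expected)
```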

Expected Output

Same as In [4]: and In [5]:.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 8a69543
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+19.g8a69543ab.dirty
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jschendel
Member

A bit obscure, but collections.deque has the same behavior as range in this case:

In [3]: df + collections.deque([0, 1])
Out[3]:
     A    B
0  1.0  1.0
1  2.0  2.0

@toobaz
Member Author

toobaz commented Oct 18, 2017

Yeah, I bet this happens with anything which is (conceptually) 1-d but not recognized as list-like by pandas.

@jreback
Contributor

jreback commented Oct 18, 2017

There must be an overly specific path somewhere; these are both considered list-like (a la is_list_like).

@jreback jreback added Compat pandas objects compatibility with Numpy or Python functions Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions labels Oct 18, 2017
@jorisvandenbossche
Member

is_list_like indeed considers them list-like (though the actual code path may not use is_list_like):

In [38]: pd.api.types.is_list_like(range(3))
Out[38]: True
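A quick check (a sketch using pd.api.types.is_list_like) confirms that range, deque, and ndarray are all treated the same by this predicate, so it can't be what distinguishes them:

```python
import collections
import numpy as np
import pandas as pd

# range, deque, and ndarray are all list-like to pandas...
assert pd.api.types.is_list_like(range(3))
assert pd.api.types.is_list_like(collections.deque([0, 1]))
assert pd.api.types.is_list_like(np.arange(3))
# ...while a plain string is not:
assert not pd.api.types.is_list_like('abc')
```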

@toobaz
Member Author

toobaz commented Oct 18, 2017

OK, bet lost :-)

@elsander

I'm interested in working on this bug, but I'm struggling to figure out where the relevant code is. There's no __add__ method given for the pd.DataFrame or NDFrame classes, so I suppose it's inherited?

@elsander

Ah-- looks like the factory is in core/ops.py. I'll take a look at this.

@elsander

Looks like the issue is that by this point a list has been converted to something with an ndim attribute, but a range hasn't, which means they are handled differently in this block from core.internals.Block.eval():

        is_transposed = False
        if hasattr(other, 'ndim') and hasattr(values, 'ndim'):
            if values.ndim != other.ndim:
                is_transposed = True
            else:
                if values.shape == other.shape[::-1]:
                    is_transposed = True
                elif values.shape[0] == other.shape[-1]:
                    is_transposed = True

One approach might be something like this:

import collections
if isinstance(other, collections.Iterable) and not hasattr(other, 'ndim'):
    other = list(other)

But I imagine this could introduce other gotchas and possibly hurt performance.
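The coercion above can be exercised standalone; a sketch (spelled with collections.abc.Iterable, which is required on Python 3.10+, where bare collections.Iterable no longer exists):

```python
import collections.abc

import numpy as np
import pandas as pd


def coerce_iterable(other):
    # Materialize plain iterables (range, deque, ...) that lack an
    # ndim attribute, so they take the same path as a list would.
    if isinstance(other, collections.abc.Iterable) and not hasattr(other, 'ndim'):
        other = list(other)
    return other


df = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])
result = df + coerce_iterable(range(2))  # aligns with columns, like a list
```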

@jschendel
Member

jschendel commented Oct 19, 2017

@elsander : I've got a fix for this that I'm in the process of finalizing, but if you'd rather work on it I'd be fine with that. From what I can tell, the root cause appears to be pandas' alignment process prior to when addition/multiplication/etc occurs:

pandas/pandas/core/ops.py

Lines 1150 to 1151 in a2e5400

def _align_method_FRAME(left, right, axis):
""" convert rhs to meet lhs dims if input is list, tuple or np.ndarray """

Specifically, the check here is a bit too narrow:

pandas/pandas/core/ops.py

Lines 1168 to 1169 in a2e5400

if isinstance(right, (list, tuple)):
right = to_series(right)

Basically I swapped this with is_list_like(right) and excluded DataFrame/Series (since is_list_like catches them, but we don't care about casting those). I also moved the check against ndarray to be before the is_list_like check, since is_list_like catches ndarray but there's slightly different logic for ndarray (and nested the ndim == 0 case inside the ndarray check so it doesn't hit the subsequent is_list_like check).

Could be other workarounds though, so not trying to suggest what I did is the "best" approach.
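A rough sketch of the broadened check described above (a hypothetical standalone helper, not the actual PR code; the real fix lives in _align_method_FRAME and also handles 2-d ndarrays, which this sketch omits):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def align_right_operand(left, right):
    """Convert a 1-d list-like rhs to a Series aligned with left's columns."""
    if isinstance(right, np.ndarray):
        if right.ndim == 0:
            return right.item()  # 0-d array behaves like a scalar
        return pd.Series(right, index=left.columns)
    # Broadened check: any list-like (range, deque, list, tuple, ...)
    # except Series/DataFrame, which already carry alignment info.
    if is_list_like(right) and not isinstance(right, (pd.Series, pd.DataFrame)):
        return pd.Series(list(right), index=left.columns)
    return right


df = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])
aligned = align_right_operand(df, range(2))
assert (df + aligned).equals(df + pd.Series(range(2), index=df.columns))
```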

@elsander

@jschendel if you have a fix in progress, go for it! I didn't have a good workaround in mind.

@jschendel
Member

@elsander : Thanks! Didn't mean to jump in and steal away the issue. I'd be happy to provide input or do code review for any other issues you want to work on (without stealing those away!).
