
Wrong orientation of operations between DataFrame and range() #17901


Closed
toobaz opened this issue Oct 17, 2017 · 11 comments · Fixed by #17926
Labels
Bug Compat pandas objects compatibility with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions

Comments

@toobaz
Member

toobaz commented Oct 17, 2017

Code Sample, a copy-pastable example if possible

In [3]: df = pd.DataFrame(np.ones((2,2)), columns=['A', 'B'])

In [4]: df + list(range(2))
Out[4]: 
     A    B
0  1.0  2.0
1  1.0  2.0

In [5]: df + pd.Series(range(2), index=df.columns)
Out[5]: 
     A    B
0  1.0  2.0
1  1.0  2.0

In [6]: df + range(2)
Out[6]: 
     A    B
0  1.0  1.0
1  2.0  2.0

Problem description

Operations where the operand is a range align along the wrong axis compared to every other 1-d object I can think of.
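For contrast, the column-wise orientation that every other 1-d operand produces can be checked directly (a minimal sketch; the expected frame is constructed by hand):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])

# Every other 1-d operand aligns with the columns (axis=1):
expected = pd.DataFrame({'A': [1.0, 1.0], 'B': [2.0, 2.0]})

assert (df + list(range(2))).equals(expected)
assert (df + np.arange(2)).equals(expected)
assert (df + pd.Series(range(2), index=df.columns)).equals(expected)
```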

Expected Output

Same as In [4]: and In [5]:.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 8a69543
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+19.g8a69543ab.dirty
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jschendel
Member

A bit obscure, but collections.deque has the same behavior as range in this case:

In [3]: df + collections.deque([0, 1])
Out[3]:
     A    B
0  1.0  1.0
1  2.0  2.0

@toobaz
Member Author

toobaz commented Oct 18, 2017

Yeah, I bet this happens with anything which is (conceptually) 1-d but not recognized as list-like by pandas.

@jreback
Contributor

jreback commented Oct 18, 2017

There must be an overly specific path somewhere; these are both considered list-like (a la is_list_like).

@jreback jreback added Compat pandas objects compatibility with Numpy or Python functions Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions labels Oct 18, 2017
@jorisvandenbossche
Member

is_list_like indeed considers them list-like (though the actual code path may not use is_list_like):

In [38]: pd.api.types.is_list_like(range(3))
Out[38]: True
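A quick check (a sketch using pd.api.types.is_list_like) confirms that range, deque, and ndarray are all treated the same by this predicate, so it can't be what distinguishes them:

```python
import collections
import numpy as np
import pandas as pd

# range, deque, and ndarray are all list-like to pandas...
assert pd.api.types.is_list_like(range(3))
assert pd.api.types.is_list_like(collections.deque([0, 1]))
assert pd.api.types.is_list_like(np.arange(3))
# ...while a plain string is not:
assert not pd.api.types.is_list_like('abc')
```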

@toobaz
Member Author

toobaz commented Oct 18, 2017

OK, bet lost :-)

@elsander

I'm interested in working on this bug, but I'm struggling to figure out where the relevant code is. There's no __add__ method given for the pd.DataFrame or NDFrame classes, so I suppose it's inherited?

@elsander

Ah-- looks like the factory is in core/ops.py. I'll take a look at this.

@elsander

Looks like the issue is that by this point a list has been converted to something with an ndim attribute, but a range hasn't, which means they are handled differently in this block from core.internals.Block.eval():

        is_transposed = False
        if hasattr(other, 'ndim') and hasattr(values, 'ndim'):
            if values.ndim != other.ndim:
                is_transposed = True
            else:
                if values.shape == other.shape[::-1]:
                    is_transposed = True
                elif values.shape[0] == other.shape[-1]:
                    is_transposed = True

One approach might be something like this:

import collections
if isinstance(other, collections.Iterable) and not hasattr(other, 'ndim'):
    other = list(other)

But I imagine this could introduce other gotchas and possibly hurt performance.
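The coercion above can be exercised standalone; a sketch (spelled with collections.abc.Iterable, which is required on Python 3.10+, where bare collections.Iterable no longer exists):

```python
import collections.abc

import numpy as np
import pandas as pd


def coerce_iterable(other):
    # Materialize plain iterables (range, deque, ...) that lack an
    # ndim attribute, so they take the same path as a list would.
    if isinstance(other, collections.abc.Iterable) and not hasattr(other, 'ndim'):
        other = list(other)
    return other


df = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])
result = df + coerce_iterable(range(2))  # aligns with columns, like a list
```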

@jschendel
Member

jschendel commented Oct 19, 2017

@elsander : I've got a fix for this that I'm in the process of finalizing, but if you'd rather work on it I'd be fine with that. From what I can tell, the root cause appears to be pandas' alignment process prior to when addition/multiplication/etc occurs:

pandas/pandas/core/ops.py

Lines 1150 to 1151 in a2e5400

def _align_method_FRAME(left, right, axis):
""" convert rhs to meet lhs dims if input is list, tuple or np.ndarray """

Specifically, the check here is a bit too narrow:

pandas/pandas/core/ops.py

Lines 1168 to 1169 in a2e5400

if isinstance(right, (list, tuple)):
right = to_series(right)

Basically I swapped this with is_list_like(right) and excluded DataFrame/Series (since is_list_like catches them, but we don't care about casting those). I also moved the check against ndarray to be before the is_list_like check, since is_list_like catches ndarray but there's slightly different logic for ndarray (and nested the ndim == 0 case inside the ndarray check so it doesn't hit the subsequent is_list_like check).

Could be other workarounds though, so not trying to suggest what I did is the "best" approach.
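A rough sketch of the broadened check described above (a hypothetical standalone helper, not the actual PR code; the real fix lives in _align_method_FRAME and also handles 2-d ndarrays, which this sketch omits):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def align_right_operand(left, right):
    """Convert a 1-d list-like rhs to a Series aligned with left's columns."""
    if isinstance(right, np.ndarray):
        if right.ndim == 0:
            return right.item()  # 0-d array behaves like a scalar
        return pd.Series(right, index=left.columns)
    # Broadened check: any list-like (range, deque, list, tuple, ...)
    # except Series/DataFrame, which already carry alignment info.
    if is_list_like(right) and not isinstance(right, (pd.Series, pd.DataFrame)):
        return pd.Series(list(right), index=left.columns)
    return right


df = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])
aligned = align_right_operand(df, range(2))
assert (df + aligned).equals(df + pd.Series(range(2), index=df.columns))
```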

@elsander

@jschendel if you have a fix in progress, go for it! I didn't have a good workaround in mind.

@jschendel
Member

@elsander : Thanks! Didn't mean to jump in and steal away the issue. I'd be happy to provide input or do code review for any other issues you want to work on (without stealing those away!).
