Binary operators between DataFrame and Series object doesn't seem to work #5284
Comments
Your example is hard to parse because wakari escapes html and JS. Do you mind posting your example somewhere else or making it readable (maybe nbviewer.ipython.org would work?) |
My apologies, I didn't know Wakari does stuff like this. Here's an nbviewer link: http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/52886258/000-qdoqud/Untitled0.ipynb |
@liori thanks for re-posting that.
|
For reference, in R this sort of 'works as you expect':
Whereas in pandas it gives very strange errors (this is 0.12.0):
n = [2, 3, 5]
s = ['aa', 'bb', 'cc']
b = [True, False, True]
df = pandas.DataFrame({'n': n, 's': s, 'b': b})
df * df['n'] # TypeError: Could not operate [array([ nan])] with block values [too many boolean indices]
df + n # TypeError: Could not operate [array([5], dtype=int64)] with block values [too many boolean indices]
df * n # works
   b   n           s
0  2   6  aaaaaaaaaa
1  0   9  bbbbbbbbbb
2  2  15  cccccccccc
And with the original example, it feels weird that doing arithmetic with a selected column still results in the garbled output: |
|
You need to use mul/add, which provide for alignment; that's what they are for. |
@jreback, there's no equivalent for |
There is in 0.13 - it's called |
@liori to be honest, your example is aligning correctly: a DataFrame and a Series align across columns, so this result is correct. You can use add/mul to force an explicit alignment if you wish, but keep in mind that what you are expecting is not natural. |
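To make the distinction above concrete, here is a minimal sketch of operator alignment versus the flex methods. The frame and its column names are made up for illustration, and the behavior shown assumes a recent pandas release rather than the 0.12/0.13 versions discussed in this thread.

```python
import pandas as pd

# Illustrative data only; not the frame from the original report.
df = pd.DataFrame({"n": [2, 3, 5], "m": [10, 20, 30]})
col = df["n"]  # Series indexed by the row labels 0, 1, 2

# The operator matches the Series index (0, 1, 2) against the frame's
# columns ("n", "m"). No labels match, so every cell is NaN.
by_columns = df * col

# The flex method with axis=0 matches the Series index against the
# frame's row index instead, so each column is multiplied elementwise.
by_rows = df.mul(col, axis=0)

print(by_rows)
```

Passing axis=0 (or axis="index") is what requests row-wise alignment; the default for the flex methods is column-wise, the same as the operators.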
@jreback any way we could improve the error messages to advise using the arithmetic flex methods? Maybe we could also warn when you're going to get something like this (since this is probably never what you want). I'm thinking specifically when you union a Series index with DataFrame columns:
e.g. I also find it confusing that you can't actually do arithmetic with the whole dataframe when you select out a column. |
So, to be clear, this broadcasts:
|
Yep, it could use a better error message, but it can't be right all the time; imagine a df with index and columns of 1-4, then it's ambiguous. Most of the time, though, if there is a length/index type mismatch, it is an incorrect alignment. |
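The ambiguous case described above can be sketched as follows. The labels and values are invented for illustration, and the snippet assumes a recent pandas release.

```python
import pandas as pd

labels = [1, 2, 3, 4]
# Index and columns share the same labels, so a Series indexed 1-4
# could plausibly be meant for either axis.
df = pd.DataFrame(0, index=labels, columns=labels)
s = pd.Series([10, 20, 30, 40], index=labels)

# The operator silently picks the columns: every row becomes 10, 20, 30, 40.
across_columns = df + s

# The flex method with axis=0 picks the rows: every column becomes 10, 20, 30, 40.
down_rows = df.add(s, axis=0)

print(across_columns)
print(down_rows)
```

Both calls succeed without any warning, which is why no error message could tell the two intentions apart here.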
I agree, there are certainly ambiguous cases. But we could warn whenever you have the case of Series + DataFrame with no elements overlapping between columns and Series index. I think that would've headed that off. If you're playing around with pandas / have loaded from some IO source, I'd assume that your columns will be string-like and index will be integer-like (or at least different than cols) so it would cover the majority of cases. |
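A rough sketch of the proposed check, written as a standalone helper. The function name, the message text, and the choice of UserWarning are all hypothetical; this is not pandas API and not how pandas implements anything.

```python
import warnings


def warn_if_no_label_overlap(series_index, frame_columns):
    """Hypothetical helper: warn when no Series label matches any column.

    Returns True if the warning was issued, False otherwise.
    """
    if not set(series_index) & set(frame_columns):
        warnings.warn(
            "Series index and DataFrame columns share no labels; "
            "the result will be all NaN. Consider a flex method such "
            "as add/mul with axis=0.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Integer row labels against string column names: no overlap, so it warns.
warn_if_no_label_overlap([0, 1, 2], ["n", "s", "b"])
```

The check only covers the fully disjoint case; partially overlapping labels stay silent, which matches the proposal above.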
@jreback: I just wanted to reuse my knowledge of R dataframes in pandas; especially given that pandas is described as a library bringing data analysis workflow from “languages like R” to Python. But if |
and I guess you'd want to say this:
|
@liori I have little experience with R. How would you broadcast along rows rather than along columns and vice-versa? There's a few sections on comparisons with R, probably would be helpful to add that. (and at least in 0.13 you get relatively comprehensible |
@jtratner: It can be done using an
BTW, note that in R a |
Thanks - that's helpful! I'll try to put some more comparisons together so |
Related use case: http://stackoverflow.com/q/21627926/190597 Broadcasting equality testing between DataFrame and Series
replaced with
|
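For the equality use case mentioned above, one way to compare every column of a frame against a Series row by row is the eq flex method. This is a sketch with invented data, assuming a recent pandas release; it is not necessarily the approach taken in the linked question.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 5, 3]})
s = pd.Series([1, 2, 3])

# axis=0 aligns the Series index with the frame's row index, so each
# column is compared elementwise against s.
matches = df.eq(s, axis=0)

print(matches)
```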
@unutbu this a bit tricky.... you will want to emulate something like this:
what you want to create is a set of functions, exactly like that are called
you just need to add the functions, that's it (plus tests of course)! lmk |
@jreback: Okay, I'll give it a go... |
@jreback: Am I missing something, or does
|
hmm maybe just need to make and/ or be the same as those methods then |
Python syntax prevents |
@unutbu I read above that I think |
@jreback: If I understand correctly, By the way, I'm still working on fixing the nan-sort PR; it's failing nosetests after rebasing... |
So this is causing a warning here: https://github.com/pydata/pandas/blob/master/pandas/tests/series/test_operators.py#L1200 because and the alignment should be |
related similar operation
http://stackoverflow.com/questions/19484344/how-do-i-use-a-specific-columns-value-in-a-pandas-dataframe-where-clause/19494873#19494873
http://stackoverflow.com/questions/19507088/filtering-a-pandas-dataframe-without-removing-rows/19516869#19516869
http://stackoverflow.com/q/21627926/190597
This should be a bit more intuitive
Given that normal binary operators like addition or logical "and" work well between a pair of Series objects, or between a pair of DataFrame objects (returning an element-wise addition/conjunction), I found it surprising that I cannot do the same between a Series object and a DataFrame object. Here's a demonstration of what doesn't work now and what would be the expected result: http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/52886258/000-qdoqud/Untitled0.ipynb