Issue with Zero's Broadcasting down a column in Heterogeneous data columns ... #13758
Comments
you are expecting the sum of objects and floats to actually work, but it doesn't; pandas catches the error and ignores the column. So these results are all as expected; you are simply getting back the
operations between
you probably want to use
Thanks for your response @jreback. I see your reasoning with .to_numeric() -- good call. But focusing on the main case, using the first example:

    import pandas as pd
    import numpy as np
    data = {
        'One' : pd.Series(['A', 1.2, np.nan]),
        'Two' : pd.Series([1.4, 3.2, 4.5])
    }
    df = pd.DataFrame(data)

If a column meets the condition above then it gets replaced with a column of 0's, removing valid data and turning it into a numeric column. df[['One']].sum(axis=1) will produce a column of
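
For readers landing here, the `.to_numeric()` suggestion mentioned above can be sketched roughly as follows. This is only an illustration, not necessarily what was originally proposed: it assumes the call meant is `pd.to_numeric`, and that non-numeric entries such as `'A'` should be treated as missing (`errors="coerce"`). The names `numeric` and `row_sums` are made up for the example.

```python
import numpy as np
import pandas as pd

# Same frame as in the example in this thread.
df = pd.DataFrame({
    "One": pd.Series(["A", 1.2, np.nan]),
    "Two": pd.Series([1.4, 3.2, 4.5]),
})

# Assumption: anything that cannot be parsed as a number (here "A")
# should become NaN rather than make the whole column non-numeric.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Row-wise sum over the now purely float64 columns;
# NaN entries are skipped by default (skipna=True).
row_sums = numeric.sum(axis=1)

print(numeric.dtypes)
print(row_sums)
```

With the column converted explicitly, the valid value 1.2 in `One` is kept and contributes to the row sum, and whether the non-numeric entry is dropped or kept is a visible decision instead of something that happens silently.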
not sure where that's coming from, I think something (bottleneck, numpy) is trying to be helpful here. ok i'll reopen, but pls edit down the top just to include that example. It is too much otherwise.
and pls don't include images at all, just copy-paste text (format using markdown). the more helpful to readers (and the shorter), the more likely someone will look at this.
Thanks @jreback. Pandas is great! Original question has been edited.
Looks to be fixed on master. Could use a test.
I had a recent use case which produced some unexpected output and I have isolated the problem down to a much simpler case to replicate the behavior.
Please see simplified code example below.
Code Sample, a copy-pastable example if possible
Doing the sum across the first column (which in this case is redundant). [Note: In my case I was doing groupby operations where some columns were being summed and others were single columns as indexed by a level of a MultiIndex. The issue was that some of the columns ended up being full of 0's.]
produces an unexpected outcome:
I think this might be an error and it produces a valid column of 0's that is of dtype=float64.
Expected Output
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None