pd.DataFrame.dot does not work when column index is set #26480

Closed
jubebo opened this issue May 21, 2019 · 3 comments · Fixed by #26496

Comments

@jubebo

jubebo commented May 21, 2019

According to the documentation, the following code works:

import pandas as pd

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])
df.dot(s)

and returns

pd.Series([-4, 5])

However, the same call raises a ValueError when column labels are added to the DataFrame, like so:

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]], columns=['1','2', '3', '4'])
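A self-contained version of both snippets, assuming a reasonably recent pandas (the report itself used 0.23.4), puts the working and failing calls side by side. The name `df_labeled` is introduced here only for illustration, and the exact wording of the error message may differ between pandas versions, so the sketch just prints whatever message is raised.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])

# Default integer labels on both sides: df.columns (0..3) match s.index (0..3).
print(df.dot(s).tolist())  # [-4, 5]

# String column labels no longer match the Series' integer index 0..3.
df_labeled = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]],
                          columns=['1', '2', '3', '4'])
try:
    df_labeled.dot(s)
except ValueError as exc:
    # Raised because the labels cannot be aligned.
    print("ValueError:", exc)
```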

Expected Output

The same result as above:

pd.Series([-4, 5])

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.16
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jschendel
Member

This is the expected behavior. The key thing here is that df.dot(s) aligns the columns of df with the index of s. In your second example the columns of df are the strings '1' to '4', while the index of s holds the integers 0 to 3, so the labels do not match and raising the ValueError is correct.
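One way to make the labeled example work is to give the Series the same labels as the DataFrame's columns. This minimal sketch assumes the intended pairing is positional, as in the first example (the Series' first value belongs with column '1', and so on); `s_labeled` and `s_relabeled` are illustrative names, not anything from pandas.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]],
                  columns=['1', '2', '3', '4'])
s = pd.Series([1, 1, 2, 1])

# Build the Series with labels that match df.columns.
s_labeled = pd.Series([1, 1, 2, 1], index=['1', '2', '3', '4'])
print(df.dot(s_labeled).tolist())  # [-4, 5]

# Or relabel an existing Series positionally (lengths must match).
s_relabeled = pd.Series(s.values, index=df.columns)
print(df.dot(s_relabeled).tolist())  # [-4, 5]

# Matching is by label, so reversing the Series' order changes nothing.
print(df.dot(s_labeled.iloc[::-1]).tolist())  # [-4, 5]
```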

The advantage of alignment is that you don't need to make sure both operands are ordered consistently; you get the same result regardless of ordering:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+596.g20d0ad159a'

In [2]: df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
   ...: s = pd.Series([1, 1, 2, 1])

In [3]: s2 = s.sample(frac=1)

In [4]: s2
Out[4]:
3    1
0    1
1    1
2    2
dtype: int64

In [5]: df.dot(s)
Out[5]:
0   -4
1    5
dtype: int64

In [6]: df.dot(s2)
Out[6]:
0   -4
1    5
dtype: int64

Note that you can convert s2 to a NumPy array to skip alignment and multiply positionally. Because s2 is shuffled, the result differs from the aligned one:

In [7]: df.dot(s2.values)
Out[7]:
0   -3
1    5
dtype: int64
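The difference between label alignment and positional multiplication can be sketched with plain reindexing. `aligned_dot` below is a hypothetical helper, not a pandas API; it is modeled loosely on the union-and-reindex check that `DataFrame.dot` performs for a Series argument, but it ignores details such as duplicate labels. A fixed permutation stands in for `s.sample(frac=1)` so the output is deterministic.

```python
import numpy as np
import pandas as pd


def aligned_dot(df, s):
    """Hypothetical, simplified label-aligned df.dot(s), for illustration only."""
    # Collect every label that appears on either side.
    common = df.columns.union(s.index)
    # If either side has a label the other lacks, the operands cannot be aligned.
    if len(common) > len(df.columns) or len(common) > len(s.index):
        raise ValueError("df.columns and s.index do not contain the same labels")
    # Put both operands in the same label order, then multiply positionally.
    left = df.reindex(columns=common)
    right = s.reindex(index=common)
    return pd.Series(np.dot(left.values, right.values), index=df.index)


df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])
s2 = s.iloc[[3, 0, 1, 2]]  # same labels and values as s, shuffled order

print(aligned_dot(df, s2).tolist())  # [-4, 5], same as df.dot(s2)
print(df.dot(s2.values).tolist())    # [-3, 5], positional, no alignment
```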

This behavior could probably be better documented though, so updates to the documentation would be accepted and appreciated.

@jschendel jschendel added this to the Contributions Welcome milestone May 22, 2019
@jubebo
Author

jubebo commented May 22, 2019

Thank you very much for your quick and extensive answer!

@matsmaiwald
Contributor

I'll update the documentation if no one else is already on it!
