Columns and Index share the same numpy object underneath when pd.DataFrame.cov is used #14617

Closed
kapilsh opened this issue Nov 8, 2016 · 4 comments
Labels
Bug
Indexing: Related to indexing on series/frames, not to indexes themselves
Numeric Operations: Arithmetic, Comparison, and Logical operations
Reshaping: Concat, Merge/Join, Stack/Unstack, Explode
Comments

kapilsh commented Nov 8, 2016

A small, complete example of the issue

In [1]: import pandas as pd
        import numpy as np
In [2]: df = pd.DataFrame(np.random.randn(4 * 1000).reshape(1000, 4), columns=list("abcd"))
In [3]: c = df.cov()

In [4]: c.index is c.columns
Out[4]: True

In [5]: c.index.name = "ABC"

In [6]: c.columns.name
Out[6]: 'ABC'

Expected Output

In [7]: c.index is c.columns
Out[7]: False
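
For anyone who wants to check their own install, the session above can be rewritten as a standalone script. This is only a sketch of the report's steps; the variable names `shared` and `leaked` are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report: 1000 rows, columns a to d.
df = pd.DataFrame(np.random.randn(1000, 4), columns=list("abcd"))
c = df.cov()

# True on the release in this report (0.19.1); the expected value is False.
shared = c.index is c.columns

# Naming one axis should leave the other axis's name alone.
c.index.name = "ABC"
leaked = c.columns.name == "ABC"

print("same object:", shared)
print("name leaked to columns:", leaked)
```

On a release that includes the fix, both lines should print False.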

Output of pd.show_versions()

In [8]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.0
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

kapilsh (Author) commented Nov 8, 2016

In my use case, I am doing something like the following:

In [96]: df = pd.DataFrame({"Value": np.random.randn(1000), "Kind": list(map(chr, np.random.randint(65, 69, 1000)))})

In [97]: df.pivot(values="Value", columns="Kind").ffill().diff().cov()
Out[97]: 
Kind             A             B             C             D
Kind                                                        
A     6.094439e-01  1.864854e-06 -5.956038e-07 -1.130525e-08
B     1.864854e-06  5.643768e-01  1.384354e-06  2.627663e-08
C    -5.956038e-07  1.384354e-06  4.964671e-01 -1.802524e-08
D    -1.130525e-08  2.627663e-08 -1.802524e-08  3.862837e-01

In [98]: cc = df.pivot(values="Value", columns="Kind").ffill().diff().cov()

In [99]: cc.index is cc.columns
Out[99]: True

As a result,

cc.unstack().reset_index()

fails.
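
A possible workaround, sketched here under a few assumptions: the failure is likely because both levels produced by unstack() are named "Kind", and with a shared Index object there is no way to rename one axis in place without renaming the other. Assigning a renamed Index to each axis sidesteps that, since Index.rename returns a new object. The frame below is a stand-in for the pivoted data, and the names row_kind, col_kind, and cov are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for the pivoted frame: four columns whose axis is named "Kind".
cols = pd.Index(list("ABCD"), name="Kind")
cc = pd.DataFrame(np.random.randn(1000, 4), columns=cols).cov()

# Index.rename returns a new Index, so each assignment replaces one axis
# without touching the other. The names row_kind/col_kind are illustrative.
cc.index = cc.index.rename("row_kind")
cc.columns = cc.columns.rename("col_kind")

# One row per (column label, row label) pair; "cov" is a hypothetical name.
tidy = cc.unstack().rename("cov").reset_index()
print(tidy.head())
```

With distinct level names, reset_index() has two differently named columns to insert, so the name clash does not arise.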

jreback (Contributor) commented Nov 8, 2016

Yeah, it should shallow-copy the index first rather than setting the same object, so that metadata will not be shared.

Want to do a PR?
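
To illustrate the suggestion, here is a minimal sketch of what a shallow copy buys at the Index level. It uses only Index.copy() and is not the code from the eventual fix; the variable names are made up.

```python
import pandas as pd

cols = pd.Index(list("abcd"))

# Index.copy() defaults to deep=False and returns a new Index object.
idx = cols.copy()
idx.name = "ABC"

print(idx is cols)       # False
print(cols.name)         # None
print(idx.equals(cols))  # True
```

The copy carries the same labels, but its name is set independently of the original.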

jreback added the Bug, Indexing, Reshaping, Difficulty Novice, and Numeric Operations labels Nov 8, 2016
jreback added this to the Next Major Release milestone Nov 8, 2016
kapilsh (Author) commented Nov 8, 2016

Sure! I can do a PR. Feel free to assign it to me.

kapilsh (Author) commented Nov 16, 2016

@jreback Made the changes to cov and corr.
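
A regression check covering both methods might look roughly like this. It is a sketch, not the test from the pull request; the list `checked` and the axis name "rows" are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list("abcd"))

checked = []
for method in ("cov", "corr"):
    result = getattr(df, method)()
    # The two axes should hold equal labels but be distinct objects.
    assert result.index.equals(result.columns)
    assert result.index is not result.columns
    # Renaming one axis must not rename the other.
    result.index.name = "rows"
    assert result.columns.name is None
    checked.append(method)

print("checked:", checked)
```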

jorisvandenbossche modified the milestones: 0.20.0, Next Major Release Feb 28, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
…ndas-dev#14617)

closes pandas-dev#14617

Author: Matt Roeschke <[email protected]>

Closes pandas-dev#15528 from mroeschke/fix_14617 and squashes the following commits:

5a46f0a [Matt Roeschke] Bug:DataFrame index & column returned by corr & cov are the same (pandas-dev#14617)
3 participants