
BUG?: _is_view == False for DataFrame from DataFrame #11814


Closed
nickeubank opened this issue Dec 10, 2015 · 7 comments
Labels: Internals (related to non-user accessible pandas implementation), Usage Question

Comments

@nickeubank
Contributor

Constructing a DataFrame from another DataFrame creates a de facto view, but ._is_view reports False. This seems like a bug.

In [1]:
    df = pd.DataFrame({'col1':[1,2], 'col2':[3,4]})
    df2 = pd.DataFrame(df)
    df2._is_view     
Out[1]:
    False

In [2]:
    df.loc[0,'col1'] = -88
    df2
Out[2]:
       col1  col2
    0   -88     3
    1     2     4

Suggestions on how best to fix?

@jreback
Contributor

jreback commented Dec 11, 2015

Your example is false. A dictionary NEVER creates a view. (I have a branch that fixes this by not consolidating, but that's a separate issue.)

Views are created when a numpy array is passed in (the view is onto the array itself).

In [49]: df = pd.DataFrame(np.array([[1,3],[2,4]]))

In [50]: df
Out[50]: 
   0  1
0  1  3
1  2  4

In [51]: df = pd.DataFrame(np.array([[1,3],[2,4]]),columns=list('AB'))

In [52]: df
Out[52]: 
   A  B
0  1  3
1  2  4

In [53]: df._data.blocks[0].is_view
Out[53]: True

In [59]: df._data.blocks[0].values.base
Out[59]: 
array([[1, 3],
       [2, 4]])

In [54]: df = pd.DataFrame({'A':[1,2], 'B':[3,4]})

In [55]: df
Out[55]: 
   A  B
0  1  3
1  2  4

In [56]: df._data.blocks[0].is_view
Out[56]: False

In [57]: df._data.blocks[0].values.base

In [58]: df = pd.DataFrame(np.array([[1,3],[2,4]]),columns=list('AB'))

Note that this is NOT true for object dtypes (otherwise we can have aliasing issues).

In [60]: df = pd.DataFrame(np.array([['foo','bar'],['baz','bah']]),columns=list('AB'))

In [61]: df
Out[61]: 
     A    B
0  foo  bar
1  baz  bah

In [62]: df._data.blocks[0].values.base

In [63]: df._data.blocks[0].is_view    
Out[63]: False

jreback closed this as completed Dec 11, 2015
jreback added the Usage Question and Internals labels Dec 11, 2015
@nickeubank
Contributor Author

Hi @jreback

I think my example / issue was misunderstood. I agree that creating a DataFrame from a dictionary never creates a view. The case I'm noting is that if you create a DataFrame from another DataFrame, the two are effectively views of one another, even though they report _is_view == False.

In my example, if you change df, then df2 is also changed. Yet df2._is_view is False.

Indeed, the view behavior is documented in the test module (test_frame.py, line 2637):

def test_constructor_dtype_nocast_view(self):
    df = DataFrame([[1, 2]])
    should_be_view = DataFrame(df, dtype=df[0].dtype)
    should_be_view[0][0] = 99
    self.assertEqual(df.values[0, 0], 99)

    should_be_view = DataFrame(df.values, dtype=df[0].dtype)
    should_be_view[0][0] = 97
    self.assertEqual(df.values[0, 0], 97)
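
For anyone who wants to check sharing without touching the private _is_view or _data attributes, here is a minimal sketch using only public API. The helper name frames_share_memory is hypothetical (it is not part of pandas), and it assumes unique column names and numeric dtypes, where to_numpy() does not copy. Whether pd.DataFrame(df) shares memory with df depends on the pandas version, so the example only asserts the case that is stable: an explicit copy does not share.

```python
import numpy as np
import pandas as pd


def frames_share_memory(left, right):
    """Return True if any same-named column of the two frames shares a buffer.

    Hypothetical helper for illustration; assumes unique column names and
    numeric dtypes, where Series.to_numpy() returns a view rather than a copy.
    """
    common = left.columns.intersection(right.columns)
    return any(
        np.shares_memory(left[col].to_numpy(), right[col].to_numpy())
        for col in common
    )


df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
independent = df.copy()  # deep copy by default, so buffers are separate

print(frames_share_memory(df, independent))  # False
```

Calling frames_share_memory(df, pd.DataFrame(df)) on a given installation shows whether the constructor shares data there, regardless of what _is_view says.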

@jreback
Contributor

jreback commented Dec 12, 2015

@nickeubank

so this is actually right.

._is_view tests whether something IS a view

e.g.

In [1]: arr = np.arange(5) 

In [2]: arr.base is None
Out[2]: True

In [3]: arr.view().base is None
Out[3]: False

but this happens to fail if the object IS THE SAME OBJECT.

E.g. in this case, df._data is assigned directly to df2._data. Not a view, not a copy, but the same object.

In [6]: df._data is df2._data
Out[6]: True

So this is technically 'wrong', but it's more semantics than anything else. Sharing the same object or a view of it are, from pandas' point of view, de facto the same. numpy makes this distinction, but pandas does not.

I suppose you could actually create a new block manager, where everything underlying is created not with copy but with .view; then this would be true. From a practical point of view I don't really know if this would make a difference though.
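
The numpy side of that distinction can be sketched as follows. This is an illustration with plain arrays only, not the pandas internals; the relationship helper is a hypothetical name. It shows why a check based on .base alone cannot flag the same-object case: an alias has no base, exactly like an array that shares nothing.

```python
import numpy as np


def relationship(a, b):
    """Classify how array b relates to array a (illustrative helper)."""
    if b is a:
        return "same object"
    if np.shares_memory(a, b):
        return "shares memory"
    return "independent"


arr = np.arange(5)
alias = arr          # another name bound to the same array object
view = arr.view()    # new array object over the same buffer
copy = arr.copy()    # new array object with its own buffer

# A .base-only test treats the alias like an owning, unshared array.
print(alias.base is None)  # True
print(view.base is arr)    # True

print(relationship(arr, alias))  # same object
print(relationship(arr, view))   # shares memory
print(relationship(arr, copy))   # independent
```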

@jreback
Contributor

jreback commented Dec 12, 2015

If you want to change this (independent of your other PR) and see if this actually breaks anything (except the test), I would accept a PR for that.

@nickeubank
Contributor Author

Great! Will do -- need it for the other PR anyway. And thanks for the explanation!

@nickeubank
Contributor Author

huh -- empty indexers do the same thing.

df = pd.DataFrame({'col1':range(10,20),
                   'col2':range(20,30)})

df2 = df.loc[:,:]

df2._data is df._data
Out[3]: True

df2._is_view
Out[4]: False

That ok to tweak too?

@manubhatt3

PR means Personal Request?
