>>> import numpy as np
>>> import pandas as pd
>>> a = np.array([1, 2, 3])
>>> c = pd.DataFrame({0: ["a", "b", "c"]}, index=a)
>>> print(c)
   0
1  a
2  b
3  c
>>> d = pd.DataFrame({1: ["d", "e", "f"]}, index=a)
>>> print(d)
   1
1  d
2  e
3  f
>>> cd_df = pd.concat([c, d], axis=1, copy=True)
>>> a[0] = 9
>>> cd_df  # Changed
   0  1
9  a  d
2  b  e
3  c  f
When the indexes are the same, concat with axis=1 and copy=True doesn't copy the index.
Originally discovered in #37441 (index was not copied, and was a view pointing to the original ndarray read by PyTables, preventing it from being freed).
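Until this is resolved, one possible workaround is to detach the result's index by hand. The sketch below is an assumption about how a caller might do that, not a recommended pandas idiom: it relies on `Index.copy(deep=True)` duplicating the underlying data, and it leaves out the `copy` argument because the workaround does not depend on it. Variable names follow the example above.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
c = pd.DataFrame({0: ["a", "b", "c"]}, index=a)
d = pd.DataFrame({1: ["d", "e", "f"]}, index=a)

cd_df = pd.concat([c, d], axis=1)

# Workaround (assumption): replace the index with a deep copy so the
# result no longer depends on the memory of the original array.
cd_df.index = cd_df.index.copy(deep=True)

a[0] = 9  # mutate the original array afterwards

print(cd_df.index.tolist())  # [1, 2, 3]
```

The deep copy costs one extra allocation of the index data, which is the same trade-off the issue asks concat to make when copy=True is requested.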
Indices should be thought of as immutable, no? Much of the functionality (e.g. cached properties) depends on this I think. Should modifying _data really be supported?
Hmm. This probably wasn't the best example.
I still think we should do the copy, though.
Here's one that doesn't abuse internals:
>>> import numpy as np
>>> import pandas as pd
>>> a = np.array([1, 2, 3])
>>> c = pd.DataFrame({0: ["a", "b", "c"]}, index=a)
>>> print(c)
   0
1  a
2  b
3  c
>>> d = pd.DataFrame({1: ["d", "e", "f"]}, index=a)
>>> print(d)
   1
1  d
2  e
3  f
>>> cd_df = pd.concat([c, d], axis=1, copy=True)
>>> a[0] = 9
>>> cd_df  # Changed
   0  1
9  a  d
2  b  e
3  c  f
It's a question of performance vs safety. When pandas creates an Index, it doesn't copy memory unnecessarily. As a result, an index can change even without concat, and that can lead to bad behavior such as properties reporting values that no longer match the data:
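import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
c = pd.DataFrame({0: ["a", "b", "c"]}, index=a)
print(c.index.is_monotonic_increasing)
# True
a[0] = 9  # mutate the array backing the index
print(c)
#    0
# 9  a
# 2  b
# 3  c
print(c.index.is_monotonic_increasing)
# True (stale: the index now reads 9, 2, 3, which is not monotonic)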
So to me, this is more about whether a copy should happen on Index construction. We can only guarantee immutability (assuming no access to protected internals) if a copy is made, but this could be inefficient.
We make a pretty blanket assumption throughout pandas that an Index is immutable, and assuming that to be the case I think concat should not make a copy. But perhaps construction should.
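For callers who need that guarantee today, the copy can be made explicitly at construction time. This is a minimal sketch, assuming the caller accepts one extra copy of the index data. `ndarray.copy()` and `pd.Index(..., copy=True)` are existing APIs; the names `c1` and `c2` are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3])

# Option 1: copy the array before handing it to pandas.
c1 = pd.DataFrame({0: ["a", "b", "c"]}, index=a.copy())

# Option 2: build the Index with an explicit copy.
c2 = pd.DataFrame({0: ["a", "b", "c"]}, index=pd.Index(a, copy=True))

a[0] = 9  # later mutation of the caller's array

# Both indexes own their data, so neither sees the change.
print(c1.index.tolist(), c2.index.tolist())  # [1, 2, 3] [1, 2, 3]
```

Either option puts the cost on the caller who needs the safety, rather than on every Index construction.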