frame and columns can get out of sync #13569

Closed
dsm054 opened this issue Jul 5, 2016 · 2 comments · Fixed by #41482
Labels: good first issue; Indexing (related to indexing on series/frames, not to indexes themselves); Needs Tests (unit test(s) needed to prevent regressions)

@dsm054
Contributor

dsm054 commented Jul 5, 2016

When using .loc to expand a column, the dataframe and the constituent Series can get out of sync. This led to a strange issue in some legacy code I inherited. Because of the copy/view issues I wouldn't have written this particular code myself, but it still cost me some time tracking down what was happening:

import pandas as pd
df = pd.DataFrame({"a": [10, 20, 30]})
df["a"].loc[4] = 40

gives me

In [224]: df
Out[224]: 
    a
0  10
1  20
2  30

In [225]: df["a"]
Out[225]: 
0    10
1    20
2    30
4    40
Name: a, dtype: int64

In [226]: df.shape, df["a"].shape
Out[226]: ((3, 1), (4,))

which is a little unnerving. I expected that df and df["a"] would both be unchanged or both be changed. Not sure if it's too much trouble to fix and we should just say "don't move your arm like that", though.
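For comparison, here is a minimal sketch of doing the same enlargement through the frame's own `.loc` rather than through the extracted column. It is not taken from the legacy code in question; the variable names `frame_rows` and `column_rows` are only for illustration. Because the frame itself is the target of the assignment, the frame and the column should agree afterwards.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})

# Enlarge through the frame's indexer instead of df["a"].loc[...],
# so the frame (not a cached column Series) receives the new row.
df.loc[4, "a"] = 40

# Illustrative names: row counts seen from the frame and from the column.
frame_rows = df.shape[0]
column_rows = df["a"].shape[0]
print(frame_rows, column_rows)
```

The dtype of the column after enlargement may differ between pandas versions, so the sketch only compares lengths.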

@jorisvandenbossche reported the same thing in master; here I'm on 0.18.0.

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
@jreback
Contributor

jreback commented Jul 5, 2016

This is hitting the expansion (setting with enlargement) path. It should simply fail. Detecting that this is actually happening is a bit tricky, though.

Here is how:

> /Users/jreback/pandas/pandas/core/indexing.py(321)_setitem_with_indexer()
-> index = self.obj.index
(Pdb) p self.obj
0    10
1    20
2    30
Name: a, dtype: int64
(Pdb) l
316                 if missing:
317     
318                     # reindex the axis to the new value
319                     # and set inplace
320                     if self.ndim == 1:
321  ->                     index = self.obj.index
322                         new_index = index.insert(len(index), indexer)
323     
324                         # we have a coerced indexer, e.g. a float
325                         # that matches in an Int64Index, so
326                         # we will not create a duplicate index, rather
(Pdb) n
> /Users/jreback/pandas/pandas/core/indexing.py(322)_setitem_with_indexer()
-> new_index = index.insert(len(index), indexer)
(Pdb) p self.obj._cacher
('a', <weakref at 0x1124e62b8; to 'DataFrame' at 0x1123dde90>)

The populated _cacher is a clue that this Series is being cached by someone else, here the parent DataFrame (and in this part of the code you know you are expanding, because missing is True).

Here's the actual frame:

(Pdb) p self.obj._cacher[1]()
    a
0  10
1  20
2  30

I think you could then go back up to the frame and do the setitem there (which will update everything appropriately); alternatively, you could raise at this point.
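Whichever option is chosen, the invariant to guard is that the frame and its column never disagree. A rough sketch of such a check follows, assuming a pandas version where the fix has landed. The helper `frame_and_column_in_sync` is hypothetical (not part of pandas or of the eventual test), and it deliberately avoids the private `_cacher` attribute shown in the debugger session.

```python
import warnings

import pandas as pd


def frame_and_column_in_sync(df, col):
    """Hypothetical helper: True if df and df[col] share the same index."""
    return df.index.equals(df[col].index)


df = pd.DataFrame({"a": [10, 20, 30]})

# The chained assignment from the report. Newer pandas versions may emit
# chained-assignment warnings here; they are silenced for this check only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df["a"].loc[4] = 40

in_sync = frame_and_column_in_sync(df, "a")
print(in_sync)
```

The check only asserts consistency between `df` and `df["a"]`; it does not assume whether both end up changed or both unchanged.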

@jreback added the Indexing and Difficulty Advanced labels Jul 5, 2016
@jreback added this to the Next Major Release milestone Jul 5, 2016
@jbrockmendel
Member

this is fixed on master, likely a ways back

@jbrockmendel added the Needs Tests label Oct 15, 2020
@mroeschke modified the milestones: Contributions Welcome, 1.3 May 15, 2021