frame and columns can get out of sync #13569

dsm054 · 2016-07-05T17:24:18Z

When using .loc to expand a column, the dataframe and the constituent Series can get out of sync. This led to a strange issue in some legacy code I inherited. Because of the copy/view issues I wouldn't have written this particular code myself, but it still cost me some time tracking down what was happening:

import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})
df["a"].loc[4] = 40  # chained assignment that enlarges the column Series

gives me

In [224]: df
Out[224]: 
    a
0  10
1  20
2  30

In [225]: df["a"]
Out[225]: 
0    10
1    20
2    30
4    40
Name: a, dtype: int64

In [226]: df.shape, df["a"].shape
Out[226]: ((3, 1), (4,))

which is a little unnerving. I expected that df and df["a"] would both be unchanged or both be changed. Not sure if it's too much trouble to fix and we should just say "don't move your arm like that", though.

@jorisvandenbossche reported the same thing in master; here I'm on 0.18.0.
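
For comparison, here is a minimal sketch of the same enlargement done through the frame instead of through the column. This is not the code from the legacy project, just the usual way to avoid the chained assignment; the resulting dtype may differ between pandas versions, so it is not checked here.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})

# Enlarge via the frame's own .loc, so the frame and its column
# are updated together.
df.loc[4, "a"] = 40

print(df.shape, df["a"].shape)  # (4, 1) (4,)
```

Because the assignment goes through df.loc, there is no intermediate Series that can drift away from the frame.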

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0


jreback · 2016-07-05T22:39:35Z

This is hitting the expansion (setting-with-enlargement) path. It should simply fail, though detecting that this is actually happening is a bit tricky.

Here is how:

> /Users/jreback/pandas/pandas/core/indexing.py(321)_setitem_with_indexer()
-> index = self.obj.index
(Pdb) p self.obj
0    10
1    20
2    30
Name: a, dtype: int64
(Pdb) l
316                 if missing:
317     
318                     # reindex the axis to the new value
319                     # and set inplace
320                     if self.ndim == 1:
321  ->                     index = self.obj.index
322                         new_index = index.insert(len(index), indexer)
323     
324                         # we have a coerced indexer, e.g. a float
325                         # that matches in an Int64Index, so
326                         # we will not create a duplicate index, rather
(Pdb) n
> /Users/jreback/pandas/pandas/core/indexing.py(322)_setitem_with_indexer()
-> new_index = index.insert(len(index), indexer)
(Pdb) p self.obj._cacher
('a', <weakref at 0x1124e62b8; to 'DataFrame' at 0x1123dde90>)

This is a clue that you are being cached by someone else (and in this part of the code you know you are expanding, since missing is True).

Here's the actual frame:

(Pdb) p self.obj._cacher[1]()
    a
0  10
1  20
2  30

I think you could then go back up to the frame and do the setitem there, which will update everything appropriately. Alternatively, you could raise at this point.
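
The detection idea above can be sketched with a toy model. This is not pandas code: ToyFrame, ToyColumn and set() are hypothetical names made up for illustration, and only the (name, weakref) shape of _cacher is taken from the pdb output. The sketch takes the "raise" option; delegating to the parent frame would be the other one.

```python
import weakref


class ToyColumn:
    """Toy stand-in for a Series (hypothetical, not a pandas class)."""

    def __init__(self, values, index, cacher=None):
        self.values = list(values)
        self.index = list(index)
        # Mirrors the (name, weakref-to-parent) tuple seen in the pdb session.
        self._cacher = cacher

    def set(self, label, value):
        if label in self.index:
            # Existing label: plain in-place assignment.
            self.values[self.index.index(label)] = value
            return
        # Missing label: this assignment would enlarge the column.
        if self._cacher is not None and self._cacher[1]() is not None:
            name = self._cacher[0]
            raise ValueError(
                f"cannot enlarge column {name!r} through a cached column; "
                "assign through the parent frame instead"
            )
        self.index.append(label)
        self.values.append(value)


class ToyFrame:
    """Toy stand-in for a DataFrame that caches its columns."""

    def __init__(self, data):
        n = len(next(iter(data.values())))
        self.index = list(range(n))
        self.columns = {
            name: ToyColumn(vals, self.index, (name, weakref.ref(self)))
            for name, vals in data.items()
        }

    def __getitem__(self, name):
        return self.columns[name]


frame = ToyFrame({"a": [10, 20, 30]})
frame["a"].set(1, 21)       # fine: label 1 already exists
try:
    frame["a"].set(4, 40)   # would enlarge a cached column
except ValueError as exc:
    print(exc)
```

A column that was never handed out by a frame (cacher is None), or whose parent has been garbage collected, still enlarges normally in this sketch.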

jbrockmendel · 2020-10-15T22:46:09Z

this is fixed on master, likely a ways back
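
A check along these lines could guard against a regression. It is only a sketch, not the test that was eventually added: frame_and_column_agree is a hypothetical helper, and the check asserts only the invariant asked for in the report (frame and column stay in sync), not which of the two outcomes pandas picks. On a version that still has the bug, the final assertion would fail.

```python
import warnings

import pandas as pd


def frame_and_column_agree(df: pd.DataFrame, col: str) -> bool:
    """Return True if the frame and the column it hands out share one index."""
    return df.index.equals(df[col].index)


df = pd.DataFrame({"a": [10, 20, 30]})
with warnings.catch_warnings():
    # Chained assignment may emit a warning, depending on the pandas version.
    warnings.simplefilter("ignore")
    df["a"].loc[4] = 40

assert frame_and_column_agree(df, "a")
```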

jreback added the Indexing and Difficulty Advanced labels Jul 5, 2016

jreback added this to the Next Major Release milestone Jul 5, 2016

jbrockmendel removed Effort Medium labels Oct 21, 2019

jbrockmendel added the Copy / view semantics label Sep 20, 2020

jbrockmendel added the Needs Tests label Oct 15, 2020

jorisvandenbossche added good first issue and removed Copy / view semantics labels Oct 16, 2020

mroeschke mentioned this issue in TST: Add tests for old issues #41482 (merged) May 15, 2021

mroeschke modified the milestones: Contributions Welcome, 1.3 May 15, 2021

jreback closed this as completed in #41482 May 17, 2021
