Updating a DataFrame by iteratively indexing into columns #7084

Jenders74 · 2014-05-09T00:50:09Z

I am initializing a DataFrame with 0 and then updating it by iteratively indexing into individual columns. The behavior of my code changed with pandas 0.13.0: the resulting DataFrame out[['A']] remains 0, but the Series out['A'] has the correct values:

>>> print out[['A']]
            A
2014-05-07  0
2014-05-08  0
2014-05-09  0

>>> print out['A']
2014-05-07    600
2014-05-08    600
2014-05-09    600

Is this a bug?

import pandas
#initialize a DataFrame with 0 values
out = pandas.DataFrame({'A': [0, 0, 0]})
out.index = pandas.date_range('5/7/2014', '5/9/2014')

#DataFrame to update out with
df = pandas.DataFrame({'C': ['A', 'A', 'A'], 'D': [100, 200, 300]})

#loop through df to update out
for ix, row in df.iterrows():
    six = pandas.Timestamp('5/7/2014')
    eix = pandas.Timestamp('5/9/2014')
    out[row['C']][six:eix] = out[row['C']][six:eix] + row['D']

print out
print out[['A']]
print out['A']

INSTALLED VERSIONS
------------------
Python: 2.7.6.final.0
OS: Windows
Release: 7
Processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.13.0
Cython: 0.20
Numpy: 1.7.1
Scipy: 0.12.0
statsmodels: 0.5.0
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2013.9
bottleneck: 0.7.0
PyTables: 3.1.0rc2
    numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.8.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: 4.3.2
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

cpcloud · 2014-05-09T02:42:29Z

This kind of looks like a regression. I say "kind of" because you should avoid this kind of indexing like the plague, but also because we like to keep backcompat, and breaking this is kind of against our de facto rules. I'll call this a bug.

cpcloud · 2014-05-09T02:43:41Z

For future reference, instead of

out[row['C']][six:eix] = out[row['C']][six:eix] + row['D']

do

out.loc[six:eix, row['C']] += row['D']
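
For anyone landing here later, the following is a minimal sketch of the suggestion above applied to the reproducer from the original report. It assumes a reasonably recent pandas and uses print() with a single argument so it runs on Python 2 and 3; the final comparison of out['A'] against out[['A']] is added here only to illustrate the symptom from the report and is not part of the suggested fix.

```python
import pandas

# Same setup as the original report: a frame of zeros on a date index.
out = pandas.DataFrame({'A': [0, 0, 0]},
                       index=pandas.date_range('5/7/2014', '5/9/2014'))

# DataFrame to update out with.
df = pandas.DataFrame({'C': ['A', 'A', 'A'], 'D': [100, 200, 300]})

six = pandas.Timestamp('5/7/2014')
eix = pandas.Timestamp('5/9/2014')

for ix, row in df.iterrows():
    # One indexing call on the frame itself: rows six..eix, column row['C'].
    out.loc[six:eix, row['C']] += row['D']

# Both access paths should now report the same values.
series_values = out['A'].tolist()
frame_values = out[['A']]['A'].tolist()
print(series_values)
print(frame_values)
```

Because the row slice and the column label go through a single .loc call, there is no intermediate object whose view-or-copy status matters.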

jreback · 2014-05-09T03:36:59Z

this is chained indexing and hence is not guaranteed to work
it happens to work sometimes
don't do this

not a bug
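
To make the "not guaranteed to work" point concrete: out[col][six:eix] = value is two operations, a column lookup followed by an assignment on whatever that lookup returned. The sketch below is illustrative only. It uses an explicit .copy() as a stand-in for the case where the intermediate happens to be a copy (an assumption made for the demonstration, since whether pandas returns a view or a copy there depends on the version and the frame), and contrasts it with a single .loc assignment.

```python
import pandas

out = pandas.DataFrame({'A': [0, 0, 0]},
                       index=pandas.date_range('5/7/2014', '5/9/2014'))
six = pandas.Timestamp('5/7/2014')
eix = pandas.Timestamp('5/9/2014')

# Step 1 of the chained form: fetch the column. The explicit copy here
# stands in for an intermediate that is not linked back to the frame.
intermediate = out['A'].copy()

# Step 2 of the chained form: assign into the intermediate object.
intermediate.loc[six:eix] = 100

# The write landed on the copy, so the frame is unchanged.
after_chained = out['A'].tolist()

# Single-call form: the assignment is made on the frame itself.
out.loc[six:eix, 'A'] = 100
after_loc = out['A'].tolist()

print(after_chained)
print(after_loc)
```

When the intermediate is a view instead, the chained write can reach the frame, which is why the pattern "happens to work sometimes".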

jorisvandenbossche · 2014-05-09T06:53:53Z

@jreback just curious, but how is it possible that out['A'] and out[['A']] give different results?

jtratner · 2014-05-09T06:59:09Z

Isn't this a caching bug?

Jenders74 · 2014-05-09T13:09:21Z

@cpcloud nice one! thanks a ton.

jreback · 2014-05-09T13:26:21Z

@Jenders74 have a read here, you should never use chain indexing

http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-view-versus-copy

that said this was a bug, thanks for reporting

closed by #7087

jreback closed this as completed May 9, 2014

jtratner reopened this May 9, 2014

jreback mentioned this issue May 9, 2014

BUG: cache coherence issue with chain indexing and setitem (GH7084) #7087

Merged

jreback added Bug labels May 9, 2014

jreback added this to the 0.14.0 milestone May 9, 2014

jreback closed this as completed in #7087 May 9, 2014
