Updating a DataFrame by iteratively indexing into a columns #7084

Closed
Jenders74 opened this issue May 9, 2014 · 7 comments · Fixed by #7087
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@Jenders74

I initialize a DataFrame with 0 and then update it by iteratively indexing into individual columns. The behavior of my code changed with pandas 0.13.0: the DataFrame out[['A']] remains 0, but the Series out['A'] has the correct values:

>>> print out[['A']]
            A
2014-05-07  0
2014-05-08  0
2014-05-09  0

>>> print out['A']
2014-05-07    600
2014-05-08    600
2014-05-09    600

Is this a bug?

import pandas
#initialize a DataFrame with 0 values
out = pandas.DataFrame({'A': [0, 0, 0]})
out.index = pandas.date_range('5/7/2014', '5/9/2014')

#DataFrame to update out with
df = pandas.DataFrame({'C': ['A', 'A', 'A'], 'D': [100, 200, 300]})

#loop through df to update out
for ix, row in df.iterrows():
    six = pandas.Timestamp('5/7/2014')
    eix = pandas.Timestamp('5/9/2014')
    out[row['C']][six:eix] = out[row['C']][six:eix] + row['D']

print out
print out[['A']]
print out['A']

INSTALLED VERSIONS
------------------
Python: 2.7.6.final.0
OS: Windows
Release: 7
Processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.13.0
Cython: 0.20
Numpy: 1.7.1
Scipy: 0.12.0
statsmodels: 0.5.0
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2013.9
bottleneck: 0.7.0
PyTables: 3.1.0rc2
    numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.8.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: 4.3.2
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed
@cpcloud
Member

cpcloud commented May 9, 2014

This kind of looks like a regression. I say "kind of" because you should avoid this kind of indexing like the plague, but we also like to keep backcompat, and breaking this is kind of against our de facto rules. I'll call this a bug.

@cpcloud
Member

cpcloud commented May 9, 2014

For future reference, instead of

out[row['C']][six:eix] = out[row['C']][six:eix] + row['D']

do

out.loc[six:eix, row['C']] += row['D']
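
As a minimal sketch of that suggestion applied to the reporter's loop, the version below rebuilds the same two frames and does each update with a single .loc call. It assumes a recent pandas on Python 3, so the print calls and the `pd` alias differ from the original report.

```python
import pandas as pd

# Same data as in the report: a zero-filled frame indexed by date.
out = pd.DataFrame(
    {"A": [0, 0, 0]},
    index=pd.date_range("2014-05-07", "2014-05-09"),
)

# Frame whose rows name the column to update ('C') and the amount to add ('D').
df = pd.DataFrame({"C": ["A", "A", "A"], "D": [100, 200, 300]})

six = pd.Timestamp("2014-05-07")
eix = pd.Timestamp("2014-05-09")

for _, row in df.iterrows():
    # One indexing call on `out` itself: row slice and column label together.
    out.loc[six:eix, row["C"]] += row["D"]

print(out["A"])    # Series access
print(out[["A"]])  # DataFrame access; both should now agree
```

Because the assignment goes through one indexer on `out`, there is no intermediate object that might be a copy, so both access paths read the same updated data.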

@jreback
Contributor

jreback commented May 9, 2014

this is chained indexing and hence is not guaranteed to work
it happens to work sometimes
don't do this

not a bug
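
To make the "chained indexing" point concrete, here is a rough sketch of why the chained form is two separate operations while .loc is one. It assumes a recent pandas on Python 3; whether the chained write reaches the parent frame depends on the pandas version and settings, so the sketch deliberately does not rely on it.

```python
import pandas as pd

out = pd.DataFrame(
    {"A": [0, 0, 0]},
    index=pd.date_range("2014-05-07", "2014-05-09"),
)

# Chained form, out['A'][six:eix] = value, is really two calls:
col = out["A"]  # step 1: __getitem__ returns an intermediate Series
# step 2 would be col[six:eix] = value, a __setitem__ on that intermediate.
# Whether `col` shares memory with `out` is not guaranteed, so the write
# may or may not show up in `out`.

# Single-call form: one __setitem__ on the .loc indexer of `out` itself.
out.loc["2014-05-07":"2014-05-09", "A"] = 600

print(out)
```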

@jreback jreback closed this as completed May 9, 2014
@jorisvandenbossche
Member

@jreback just curious, but how is it possible that out['A'] and out[['A']] give different results?

@jtratner
Contributor

jtratner commented May 9, 2014

Isn't this a caching bug?

@jtratner jtratner reopened this May 9, 2014
@Jenders74
Author

@cpcloud nice one! thanks a ton.

@jreback
Contributor

jreback commented May 9, 2014

@Jenders74 have a read here, you should never use chained indexing

http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-view-versus-copy

that said this was a bug, thanks for reporting

closed by #7087


5 participants