Modifying DataFrame first row produces wrong result #29917

Closed
t6s4 opened this issue Nov 28, 2019 · 3 comments

Labels
Duplicate Report, Indexing

Comments

@t6s4
t6s4 commented Nov 28, 2019

# Reproducing faulty behavior:
import pandas as pd

dfs = pd.DataFrame([[1, 'a1', 31], [2, 'a2', 32], [3, 'a3', 33]],
                   index=['a', 'b', 'c'], columns=['c1', 'c2', 'c3'])
# Assign a dict to the first row by integer position
dfs.iloc[0] = {'c1': 101, 'c2': 'A1', 'c3': 111}
assert dfs.iloc[0, 0] == 101     # FAILED
assert dfs.iloc[0, 1] == 'A1'    # FAILED
assert dfs.iloc[0, 2] == 111     # FAILED

Problem description

Assigning a dict to the first row writes the column labels into that row instead of the dict's values.

@jorisvandenbossche
Member

Thanks for the report!

According to our docs this should indeed work (https://dev.pandas.io/docs/user_guide/indexing.html#attribute-access, see the sentence "You can also assign a dict to a row of a DataFrame:" a bit below the linked heading). The example from there:

In [24]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

In [25]: x.iloc[1] = {'x': 9, 'y': 99}

In [26]: x
Out[26]: 
   x   y
0  1   3
1  9  99
2  3   5

In practice, however, this only seems to work if your DataFrame has uniform dtypes. E.g., if I change the second column to float in the above example, it no longer works:

In [63]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [0.3, 0.4, 0.5]})  

In [64]: x.iloc[1] = {'x': 9, 'y': 99}                         

In [65]: x   
Out[65]: 
   x    y
0  1  0.3
1  x    y
2  3  0.5
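
A minimal sketch of how one might check up front whether a frame falls into the mixed-dtype case described above. The helper name `has_mixed_dtypes` is hypothetical (it is not part of pandas), and the check only reports whether the columns share a single dtype; it does not by itself guarantee how dict assignment behaves on a given pandas version.

```python
import pandas as pd


def has_mixed_dtypes(df):
    """Hypothetical helper: True if the columns do not all share one dtype."""
    # df.dtypes is a Series of per-column dtypes; count the distinct ones.
    return df.dtypes.nunique() > 1


# The two frames from the comment above: all-int vs. int plus float.
uniform = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
mixed = pd.DataFrame({'x': [1, 2, 3], 'y': [0.3, 0.4, 0.5]})

print(has_mixed_dtypes(uniform))
print(has_mixed_dtypes(mixed))
```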

It seems to have been this way for a very long time (I tested 0.21 and 0.18, which show the same behavior), so it is certainly not a recent regression (if it ever worked).

BTW, if you want this to work reliably, you can simply wrap the dict in a Series:

In [66]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [0.3, 0.4, 0.5]}) 

In [67]: x.iloc[1] = pd.Series({'x': 9, 'y': 99}) 

In [68]: x 
Out[68]: 
   x     y
0  1   0.3
1  9  99.0
2  3   0.5
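
The same workaround could be applied to the frame from the original report, roughly as sketched below. The helper `set_row_from_dict` is a hypothetical name, not a pandas function. It assumes the dict has a value for every column, and it builds the Series in column order so that the result does not depend on whether pandas matches the Series to the row by label or by position.

```python
import pandas as pd


def set_row_from_dict(df, position, values):
    """Hypothetical helper: write `values` into the row at integer `position`."""
    # Build the Series in column order; raises KeyError if a column is missing
    # from `values`. Ordering by column keeps the outcome the same whether the
    # Series is matched to the row by label or by position.
    row = pd.Series([values[col] for col in df.columns], index=df.columns)
    df.iloc[position] = row


# Frame from the original report (mixed int and string columns)
dfs = pd.DataFrame([[1, 'a1', 31], [2, 'a2', 32], [3, 'a3', 33]],
                   index=['a', 'b', 'c'], columns=['c1', 'c2', 'c3'])
set_row_from_dict(dfs, 0, {'c1': 101, 'c2': 'A1', 'c3': 111})
print(dfs)
```

Only the first row is replaced; the other rows keep their original values.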

jorisvandenbossche added the Indexing and Needs Discussion labels Nov 28, 2019
@jorisvandenbossche
Member

See #16724 for a similar issue (and we came to the same conclusion about the dtypes then :)).
Closing this as a duplicate.

@jorisvandenbossche
Member

Duplicate of #16724

jorisvandenbossche marked this as a duplicate of #16724 Nov 28, 2019
jorisvandenbossche added the Duplicate Report label and removed the Needs Discussion label Nov 28, 2019
jorisvandenbossche added this to the No action milestone Nov 28, 2019