Unexpected result when setting a row by a dict #16724

Closed
Yevgnen opened this issue Jun 19, 2017 · 11 comments
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@Yevgnen
Yevgnen commented Jun 19, 2017

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd


persons = [
    {
        'name': ''.join([
            np.random.choice([chr(x) for x in range(97, 97 + 26)])
            for i in range(5)
        ]),
        'age': np.random.randint(0, 100),
        'sex': np.random.choice(['male', 'female']),
        'job': np.random.choice(['staff', 'cook', 'student']),
        'birthday': np.random.choice(pd.date_range('1990-01-01', '2010-01-01')),
        'hobby': np.random.choice(['cs', 'war3', 'dota'])
    }
    for i in range(10)
]

df = pd.DataFrame(persons)
df.set_index('birthday', inplace=True)
print(df)
df.iloc[0] = {
    'name': 'john',
    'age': int(10),
    'sex': 'male',
    'hobby': 'nohobby',
    'job': 'haha'
}
print(df)

Problem description

             age hobby      job   name     sex
birthday
2007-12-31  name   age      sex  hobby     job
2004-07-31    20  dota  student  uwxhn  female
2001-10-22    34  war3     cook  udknv  female
2002-10-13    91  dota     cook  bofcv  female
1992-05-25    54  war3     cook  tcqew    male
2009-09-02    95  war3    staff  jcolr  female
1998-12-15    61  war3  student  dibkw  female
2004-07-03     4  war3  student  mntqh    male
2000-06-08    88  war3    staff  jknxm  female
2006-10-19    82    cs  student  asrpz    male

I have no idea why the dict keys, rather than the values, are written into the row.

Expected Output

            age    hobby      job   name     sex
birthday
2007-12-31   10  nohobby     haha   john    male
2004-07-31   20     dota  student  uwxhn  female
2001-10-22   34     war3     cook  udknv  female
2002-10-13   91     dota     cook  bofcv  female
1992-05-25   54     war3     cook  tcqew    male
2009-09-02   95     war3    staff  jcolr  female
1998-12-15   61     war3  student  dibkw  female
2004-07-03    4     war3  student  mntqh    male
2000-06-08   88     war3    staff  jknxm  female
2006-10-19   82       cs  student  asrpz    male

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Hmm, all our setting code deals with iterables, and iterating over a dictionary yields its .keys(). The only caveat is that Series and DataFrames are aligned before setting.

I'll let Jeff weigh in, but I don't think we'll want to support this. If you want that behavior, you'll need to pass the dictionary to a pd.Series first.
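
A minimal sketch of the workaround suggested in this comment, using a small two-column frame rather than the reporter's data. The dict is wrapped in a pd.Series and explicitly reordered to match the columns, so the result should be the same whether the assignment aligns by label or sets by position. The names `df` and `new_row` are only illustrative.

```python
import pandas as pd

# Mixed dtypes: 'x' holds integers, 'y' holds strings.
df = pd.DataFrame({'x': [1, 2, 3], 'y': ['3', '4', '5']})

# Key order deliberately differs from the column order.
new_row = {'y': '99', 'x': 9}

# Wrap the dict in a Series and order it like the columns before setting,
# so the values (not the keys) are written into the row.
df.loc[1] = pd.Series(new_row).reindex(df.columns)

print(df)
```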

@TomAugspurger added the Indexing label Jun 19, 2017
@Yevgnen
Author

Yevgnen commented Jun 19, 2017

Thanks for your reply. If this will not be supported, I think the example in the documentation should be removed to reduce confusion. 😅

@TomAugspurger
Contributor

Your example probably should work then, my mistake. It seems like we don't properly set when there are multiple dtypes?

# two dtypes
In [44]: x = pd.DataFrame({'x': [1, 2, 3], 'y': ['3', '4', '5']})

In [46]: x.iloc[1] = {'x': 9, 'y': '99'}

In [47]: x  # set incorrectly with the keys
Out[47]:
   x  y
0  1  3
1  x  y
2  3  5

# single dtype
In [48]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

In [49]: x.iloc[1] = {'x': 9, 'y': 99}

In [50]: x  # sets correctly
Out[50]:
   x   y
0  1   3
1  9  99
2  3   5
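
The incorrect mixed-dtype result above is consistent with the dict being treated like any other iterable. A plain-Python sketch of that difference (no pandas internals involved, variable names are illustrative):

```python
row = {'x': 9, 'y': '99'}

# Iterating a dict, as generic list-like handling would, yields its keys.
as_iterable = list(row)

# The values have to be requested explicitly.
as_values = list(row.values())

print(as_iterable)  # ['x', 'y']
print(as_values)    # [9, '99']
```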

@Yevgnen
Author

Yevgnen commented Jun 19, 2017

Probably. :-)

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 19, 2017

The ability to do this is certainly deliberate (I mean assigning with a dict, not the fact that it uses the keys instead of values), although I am not sure I am too happy about that :-) (way too many corner cases that come up by allowing this, eg #10219)

I thought there had been some related discussion about this recently, but I can only find #16309.

(side note: it would maybe be worth opening a discussion about whether we want to deprecate this?)

@jreback
Contributor

jreback commented Jun 19, 2017

I don't think this is inconsistent with setting with a Series; in fact, this does work when setting with a Series. I suppose this should work. Special casing things is not generally a good thing.

@oguzhanogreden
Contributor

oguzhanogreden commented Nov 5, 2019

FYI, I'm planning to give this a go soon.

@oguzhanogreden
Contributor

oguzhanogreden commented Nov 17, 2019

This fails since a dict is considered list-like, and the reported case is handled under the assumption that the value is a list, here.

I find the follow_split_path==True path and the conditionals here quite hard to follow. If we can assume that _can_do_equal_len will return False for the case described here (and variations of it), the solution is simply distinguishing a dict from a list-like in the else case.
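
To illustrate the idea of distinguishing a dict from other list-likes, here is a rough user-level sketch. `normalize_row_value` is a hypothetical helper written for this example, not a pandas function and not the change that was made in pandas; it only shows the kind of special-casing being discussed.

```python
import pandas as pd
from pandas.api.types import is_list_like


def normalize_row_value(value, columns):
    """Hypothetical helper: turn a dict into a list ordered like `columns`.

    Any other value is returned unchanged. A missing key raises KeyError.
    """
    if isinstance(value, dict):
        return [value[col] for col in columns]
    return value


# A dict counts as list-like, which is why it falls into the list branch.
print(is_list_like({'x': 9}))

df = pd.DataFrame({'x': [1, 2, 3], 'y': ['3', '4', '5']})
df.iloc[1] = normalize_row_value({'y': '99', 'x': 9}, df.columns)
print(df)
```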

I found some discussions here but couldn't crack it yet.

@jorisvandenbossche
Member

Duplicate report: #29917

@mroeschke
Member

This looks to work on master. Could use a test

In [1]: In [48]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
   ...:
   ...: In [49]: x.iloc[1] = {'x': 9, 'y': 99}

In [2]: x
Out[2]:
   x   y
0  1   3
1  9  99
2  3   5

@mroeschke added the good first issue and Needs Tests labels and removed the Bug and Indexing labels Jun 12, 2021
@mroeschke
Member

Looks like there's a test for this issue: test_iloc_setitem_dictionary_value. Closing
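
For readers who want to check their own installation, a check along these lines could be used. This is only a sketch, not the body of test_iloc_setitem_dictionary_value, and it assumes a pandas version in which the fix has landed; on the affected versions (for example 0.20.1 from the original report) it would be expected to fail.

```python
import pandas as pd

# Mixed dtypes, as in the failing case reported earlier in the thread.
df = pd.DataFrame({'x': [1, 2, 3], 'y': ['3', '4', '5']})

# Set a whole row positionally from a dict.
df.iloc[1] = {'x': 9, 'y': '99'}

# The dict values, not the keys, should end up in the row.
expected = pd.DataFrame({'x': [1, 9, 3], 'y': ['3', '99', '5']})
pd.testing.assert_frame_equal(df, expected)
```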

7 participants