
DataFrame.iat will create new column if .iat is used to set None on int Series #23236


Closed
jimmywan opened this issue Oct 19, 2018 · 5 comments · Fixed by #24495
Labels: Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)
Milestone: 0.24.0

Comments

@jimmywan

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[0,1],'b':[4,5]})
>>> df
   a  b
0  0  4
1  1  5
>>> df.iat[0, 0] = None
>>> df
   a  b   0
0  0  4 NaN
1  1  5 NaN

Problem description

This is problematic for multiple reasons.

  • Inconsistency between iloc and iat.
  • Creating a brand-new column is almost certainly not the intended or expected behavior.
  • At the very least, I would expect it to bail out of the operation with a warning about incompatible types.
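The inconsistency in the first bullet can be expressed as a small regression check. This is a sketch, not pandas' own test suite: it assumes a pandas release that includes the fix (0.24.0 or later), and `make_frame` is a hypothetical helper name. It applies the same assignment through `.iat` and `.iloc` on fresh copies of the frame and compares the results; on the versions reported here (0.22.0 and 0.23.4), the `.iat` frame gains an extra column `0` and the checks fail.

```python
import pandas as pd


def make_frame():
    # Hypothetical helper: the same two-column integer frame as in the report.
    return pd.DataFrame({'a': [0, 1], 'b': [4, 5]})


# Apply the same positional assignment with each indexer on a fresh frame.
via_iat = make_frame()
via_iat.iat[0, 0] = None

via_iloc = make_frame()
via_iloc.iloc[0, 0] = None

# With the requested behavior, both indexers agree: no new column is created,
# and the targeted cell holds a missing value.
print(list(via_iat.columns), list(via_iloc.columns))
print(via_iat.equals(via_iloc))
```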

This is likely related to the non-intuitive behavior of Series, which has already been documented here:
#20643 (comment)

Expected Output

I would expect it to behave the same way .iloc does:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[0,1],'b':[4,5]})
>>> df
   a  b
0  0  4
1  1  5
>>> df.iloc[0, 0] = None
>>> df
     a  b
0  NaN  4
1  1.0  5
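The `1.0` in the expected output reflects an upcast. NumPy integer dtypes have no way to represent a missing value, so storing `None` makes pandas convert column `a` to float64 and record the cell as NaN, while column `b` keeps its integer dtype. A minimal sketch that checks the dtypes around the `.iloc` assignment, assuming a recent pandas (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [4, 5]})
dtype_before = df['a'].dtype  # an integer dtype (int64 on typical builds)

# Positional assignment of None into the integer column 'a'.
df.iloc[0, 0] = None

# Integer dtypes cannot hold a missing value, so 'a' is upcast to float64
# and the cell becomes NaN; column 'b' is untouched and stays integer.
dtype_after = df['a'].dtype
print(dtype_before, dtype_after, df['b'].dtype)
```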

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jimmywan
Author

I tried it with 0.23.4 and got the same output.

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@mroeschke added the Bug and Indexing labels on Oct 19, 2018
@mroeschke
Member

Thanks for the report. Investigations and PRs welcome!

@aditya0811

I think we can conclude that, because df.iat is capable of both creating and setting values, it creates a new column labelled 0, since 0 is not present in the column index (a, b).
We cannot access those values using iat if the column labels are strings.

@mroeschke
Member

iat indexes by integer position, not by label, so it shouldn't matter that 0 is not in the columns; it should modify the value in row 0, column a.

IIRC, iat and at don't perform as many data validation checks, so this may be a "fallback" assignment that then broadcasts.
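The position-versus-label distinction can be seen directly by reading the same cell both ways. A small sketch, assuming the two-column frame from the report: `.iat[0, 0]` resolves to row 0 of column `a`, while label-based `.at[0, 0]` looks for a column literally named `0` and raises `KeyError`. The names `positional_value` and `label_lookup_failed` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [4, 5]})

# Positional: row position 0, column position 0 -> first cell of column 'a'.
positional_value = df.iat[0, 0]

# Label-based: row label 0 exists, but no column is *labelled* 0.
label_lookup_failed = False
try:
    df.at[0, 0]
except KeyError:
    label_lookup_failed = True

print(positional_value, label_lookup_failed)
```

By the same rule, a positional assignment such as `df.iat[0, 0] = ...` has no reason to consult column labels at all, which is why a new column named `0` is surprising.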

@jimmywan
Author

I do not agree with @aditya0811, who wrote:

I think we can conclude that, because df.iat is capable of both creating and setting values, it creates a new column labelled 0, since 0 is not present in the column index (a, b).

As explained by @mroeschke here:

iat indexes by integer position, not by label, so it shouldn't matter that 0 is not in the columns; it should modify the value in row 0, column a.

RoeiRaz added a commit to RoeiRaz/pandas that referenced this issue Dec 30, 2018
- in response to pandas-devgh-23236
- changes the fallback of .iat to .iloc on type error
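The commit message describes the fix as making `.iat` fall back to `.iloc` on a type error, instead of ending up with the new column labelled `0` seen in this report. For code that must run on an affected release, a similar idea can be approximated in user code. The sketch below is hypothetical (`set_cell_by_position` is not a pandas API): it probes whether NumPy accepts the value in a scratch array of the column's dtype, uses `.iat` when it does, and otherwise uses `.iloc`, which is also positional and upcasts the column as needed.

```python
import numpy as np
import pandas as pd


def set_cell_by_position(df, row, col, value):
    """Hypothetical helper: set the cell at (row, col) by position only.

    Uses .iat when NumPy accepts `value` in the column's dtype and falls
    back to .iloc otherwise. Both paths are positional, so `col` is never
    treated as a column label.
    """
    dtype = df.dtypes.iloc[col]
    try:
        # Probe with a scratch array: None into int64 raises TypeError and
        # NaN into int64 raises ValueError. NumPy silently truncates floats
        # such as 1.5 into integer arrays, so this probe does not catch
        # every lossy assignment.
        probe = np.empty(1, dtype=dtype)
        probe[0] = value
    except (TypeError, ValueError):
        df.iloc[row, col] = value
    else:
        df.iat[row, col] = value


df = pd.DataFrame({'a': [0, 1], 'b': [4, 5]})
set_cell_by_position(df, 0, 0, None)  # None does not fit int64: .iloc path
set_cell_by_position(df, 1, 1, 7)     # 7 fits int64: .iat path
print(df)
```

Probing a scratch array keeps the check independent of pandas internals; the trade-off is that it only routes values NumPy rejects outright.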
@jreback added this to the 0.24.0 milestone on Dec 30, 2018

4 participants