Modifying dataframe with float values using .iloc and boolean indexing raises ValueError #20627

Closed
karih opened this issue Apr 7, 2018 · 4 comments · Fixed by #34622
Labels: good first issue, Needs Tests (unit tests needed to prevent regressions)

karih commented Apr 7, 2018

Updating part of a DataFrame through .iloc seems to behave inconsistently depending on whether the rows are selected with an integer list or a boolean array. The problem does not occur if all DataFrame values are integers, but it still occurs if np.nan is replaced by 7.0. Example:

Code Sample

import numpy as np
import pandas as pd

data = [
    [0, 2, 13], 
    [1, 6, 21], 
    [2, np.nan, 0],
    [3, 5, 3]
]
df = pd.DataFrame(data, columns=("idx", "val1", "val2")).set_index("idx")

print("updating with ints")
df.iloc[np.nonzero(df.index >= 2)[0],0] *= 2
print("updating with bool")
try:
    df.iloc[df.index >= 2,0] *= 2 # raises ValueError
except ValueError as e:
    print("ValueError", e)
try:
    df.iloc[df.index < 2,0] *= 2 # raises ValueError
except ValueError as e:
    print("ValueError", e)
print("iloc with bool")
print(df.iloc[df.index >= 2,0])
print("iloc with ints")
print(df.iloc[np.nonzero(df.index >= 2)[0],0])

Output is:

updating with ints
updating with bool
ValueError Must have equal len keys and value when setting with an iterable
ValueError Must have equal len keys and value when setting with an iterable
iloc with bool
idx
2     NaN
3    10.0
Name: val1, dtype: float64
iloc with ints
idx
2     NaN
3    10.0
Name: val1, dtype: float64

Problem Description

The expected behavior is that the two forms are equivalent when updating, particularly since reading through .iloc returns the same values for both. Instead, we receive ValueError: Must have equal len keys and value when setting with an iterable.
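On affected versions, one possible workaround follows directly from the report: convert the boolean mask to integer positions before assigning. The snippet below is a minimal sketch that rebuilds the example frame; the names `mask` and `positions` are illustrative, and `np.flatnonzero` is used as a shorthand for `np.nonzero(...)[0]`.

```python
import numpy as np
import pandas as pd

# Same data as in the report, with "idx" as the index.
df = pd.DataFrame(
    {"val1": [2.0, 6.0, np.nan, 5.0], "val2": [13, 21, 0, 3]},
    index=pd.Index([0, 1, 2, 3], name="idx"),
)

mask = df.index >= 2               # boolean array, one entry per row
positions = np.flatnonzero(mask)   # integer positions of the True entries

# Assigning through integer positions is the form that works in the report.
df.iloc[positions, 0] *= 2
print(df)
```

Reading with `df.iloc[mask, 0]` and with `df.iloc[positions, 0]` returns the same rows, so the conversion should not change which rows are updated.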

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.15-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 38.5.2
Cython: 0.25.2
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented Apr 9, 2018

what exactly is not right here? can you show a result= and expected= that you think should be the case here. ideally write a test that should pass

Author

karih commented Apr 9, 2018

The code raises a ValueError, which I think it shouldn't. At the very least, one would expect the same behavior from
df.iloc[np.nonzero(df.index >= 2)[0],0] *= 2
and
df.iloc[df.index >= 2,0] *= 2
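The requested result/expected comparison could be sketched roughly as follows. This is an illustration, not the test that was eventually merged: `make_frame` is a hypothetical helper, and the final assertion only passes on a pandas version that includes the fix (1.1 or later, according to this thread). On 0.22.0 the boolean-mask line raises the ValueError instead.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def make_frame():
    # Hypothetical helper: rebuilds the frame from the report each time.
    data = [[0, 2, 13], [1, 6, 21], [2, np.nan, 0], [3, 5, 3]]
    return pd.DataFrame(data, columns=("idx", "val1", "val2")).set_index("idx")


# result: update through a boolean mask (the form that raised on 0.22.0)
result = make_frame()
result.iloc[result.index >= 2, 0] *= 2

# expected: update through the equivalent integer positions
expected = make_frame()
expected.iloc[np.nonzero(expected.index >= 2)[0], 0] *= 2

tm.assert_frame_equal(result, expected)
```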

mroeschke added the Bug and Indexing labels on Jan 13, 2019
@simonjayhawkins
Member

this no longer raises ValueError: Must have equal len keys and value when setting with an iterable on master.

simonjayhawkins added the good first issue and Needs Tests labels and removed the Bug and Indexing labels on Mar 30, 2020
simonjayhawkins added this to the Contributions Welcome milestone on Mar 30, 2020
@simonjayhawkins
Member

this no longer raises ValueError: Must have equal len keys and value when setting with an iterable on master.

this was fixed in #31897 (to be released in 1.1, so could also add whatsnew)

3da053c is the first new commit
commit 3da053c
Author: jbrockmendel
Date: Mon Feb 17 16:15:45 2020 -0800

BUG: fix length_of_indexer with boolean mask (#31897)
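The thread does not show the patch itself, so the following is only an illustration of the idea suggested by the commit title, not the pandas implementation. A boolean mask has one entry per row, but it selects only as many rows as it has True entries. `selected_length` is a hypothetical helper written for this sketch.

```python
import numpy as np


def selected_length(indexer):
    """Hypothetical helper (not a pandas function): rows an indexer selects."""
    arr = np.asarray(indexer)
    if arr.dtype == bool:
        # A boolean mask selects one row per True entry, not one per element.
        return int(arr.sum())
    return len(arr)


mask = np.array([False, False, True, True])
positions = np.nonzero(mask)[0]

print(len(mask))                   # 4
print(selected_length(mask))       # 2
print(selected_length(positions))  # 2
```

If the key length were taken as `len(mask)` while the value being assigned has two elements, a length mismatch like the one reported would be a plausible outcome. That connection is an assumption, not something confirmed in this thread.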

OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 6, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 7, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 8, 2020
jreback changed the milestone from Contributions Welcome to 1.1 on Jun 9, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 10, 2020
mroeschke pushed a commit that referenced this issue Jun 13, 2020
* added a test for issue #20627

* added a test for issue #20627 (review taken into account)

* TST: boolean indexing using .iloc #20627

* TST #20627 added tests, and take review into account

* Updated test_iloc.py to pass CI testing