BUG: Cannot index into DataFrame with Nullable Integer as index #34497
Comments
Note that it's actually the presence of duplicates rather than more than one NA value that causes this to fail (it's the uniqueness in the second case that makes it work). This for example also raises:

    df = pd.DataFrame(
        [1, 2, 3],
        index=pd.Index([1, 1, pd.NA]),
    )
    df.loc[1]

It seems the two cases follow separate code paths in pandas/_libs/index.pyx.
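The split between unique and non-unique lookups can be seen from the public API, without touching the Cython internals. The following is a minimal sketch, not taken from the thread: it uses plain integer labels only (no pd.NA, so it does not depend on the bug) and shows that `Index.get_loc` returns a different kind of result depending on whether the index is unique and monotonic. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Plain integer labels only; pd.NA is deliberately left out of this sketch.
unique_idx = pd.Index([1, 2, 3])
dup_sorted_idx = pd.Index([1, 1, 2])    # duplicates, monotonic
dup_unsorted_idx = pd.Index([1, 2, 1])  # duplicates, not monotonic

loc_unique = unique_idx.get_loc(1)          # a single integer position
loc_sorted = dup_sorted_idx.get_loc(1)      # a slice over the matching run
loc_unsorted = dup_unsorted_idx.get_loc(1)  # a boolean mask over all positions

for name, idx, loc in [
    ("unique", unique_idx, loc_unique),
    ("duplicates, sorted", dup_sorted_idx, loc_sorted),
    ("duplicates, unsorted", dup_unsorted_idx, loc_unsorted),
]:
    print(f"{name}: is_unique={idx.is_unique}, get_loc(1) -> {type(loc).__name__}")
```

Checking `index.is_unique` first is a quick way to tell which kind of lookup a given `df.loc[key]` call will go through.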
That's actually not accurate; in the current version of pandas, there is no issue with duplicate entries in the index list. Without the presence of pd.NA, this actually works:

    df = pd.DataFrame(
        [1, 2, 3],
        index=pd.Index([1, 1, 2]),
    )
    df.loc[1]

producing both rows labeled 1.

I believe it has to do with the non-equality nature of different NaNs, and how masking with NA is handled, which is discussed in #31503.
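For readers unfamiliar with the comparison semantics being referred to, here is a small sketch of how `np.nan` and `pd.NA` behave under `==`. It only illustrates the scalar behaviour. How the index engine uses these comparisons internally is not shown and is not claimed here.

```python
import numpy as np
import pandas as pd

# NaN is never equal to itself under IEEE 754 comparison rules.
nan_equal = np.nan == np.nan

# pd.NA propagates through comparisons: the result is pd.NA, not a bool.
na_cmp = pd.NA == pd.NA

# Coercing pd.NA to a bool is refused, so any code that does
# `if label == key:` on a pd.NA label cannot proceed.
try:
    bool(pd.NA)
    na_truthiness = "no error"
except TypeError:
    na_truthiness = "TypeError"

print(nan_equal, na_cmp, na_truthiness)
print(pd.isna(np.nan), pd.isna(pd.NA))  # both are recognised as missing
```

Code that needs to match missing labels therefore has to go through `pd.isna` (or an identity check) rather than `==`.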
@elinor-lev what Daniel meant is that the presence of a duplicate (whether of the NA or of another number) is what triggers it, since without duplicates but with pd.NA it works (though it is of course still the presence of pd.NA that triggers the bug).
Indeed, whether or not there is any duplicate determines which code path is taken. There are still several problems with having […]. @elinor-lev, thanks for the report!
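The combinations discussed so far can be checked side by side with a small, version-tolerant sketch. It is not the original reproduction from the report: the helper name `try_loc` and the case labels are made up for illustration, and the outcome for the pd.NA cases depends on the installed pandas version (the thread reports failures on 1.0.3; other versions may behave differently), so the script records what happens rather than assuming it.

```python
import pandas as pd


def try_loc(index_values, key=1):
    """Hypothetical helper: return ("ok", result) or ("raised", error name)."""
    df = pd.DataFrame([1, 2, 3], index=pd.Index(index_values))
    try:
        return "ok", df.loc[key]
    except Exception as exc:  # record the failure instead of stopping
        return "raised", type(exc).__name__


# Illustrative cases covering unique/duplicate labels with and without pd.NA.
cases = {
    "unique, with NA": [1, 2, pd.NA],
    "duplicate label, with NA": [1, 1, pd.NA],
    "duplicate NA": [1, pd.NA, pd.NA],
    "duplicate label, no NA": [1, 1, 2],
}

outcomes = {name: try_loc(values)[0] for name, values in cases.items()}

print("pandas", pd.__version__)
for name, status in outcomes.items():
    print(f"{name}: {status}")
```

Running this on the version in question shows directly which of the four combinations fail there.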
Yes, I think that is correct. The error happens in this function: Lines 124 to 129 in 48c00ea
Where
Code Sample
Problem description
When there is more than one missing value in the dataframe index, indexing into the dataframe results in
If only one NaN index exists, this works:
Expected Output
I expect df2.loc[1] to return

Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-53-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.2.1
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.1.2
numexpr : 2.6.4
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.4.2
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0