BUG: None value in MultiIndex can't be found #59003
Comments
Thanks for raising the issue. The problem is with how None values are handled in … In the example code, …
Whereas if the None value is inserted correctly, for example using …, it is handled in …, where the index location of an NA key is unified as -1. I have submitted PR #59069, which will hopefully resolve this issue.
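The -1 convention also shows up in how a correctly constructed MultiIndex stores a missing label. The following is a minimal illustration, assuming pandas 2.x and made-up labels; it is not the code path that PR #59069 changes. The NA entry is not kept as a level value; it is encoded as -1 in that level's codes.

```python
import pandas as pd

# Build the MultiIndex up front, with a None in the middle level.
mi = pd.MultiIndex.from_tuples(
    [("one", None, "yes"), ("two", "b", "no")],
    names=["a", "b", "c"],
)

# The missing label is not stored among the level's values...
print(mi.levels[1].tolist())  # ['b']
# ...it is encoded as -1 in that level's codes instead.
print(mi.codes[1].tolist())  # [-1, 0]
# Reading the level back fills the -1 position with a missing value.
print(mi.get_level_values("b").isna().tolist())  # [True, False]
```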
@chaoyihu I came across a similar problem, but I'm not sure if I should open a new issue: creating pd.MultiIndex.from_tuples([(1, None), (2,3)], names=["Idx1", "Idx2"]) yields …
Note the conversion from …
@KilianKW Thanks for your question!
I would recommend not opening a new issue, since similar issues such as #56366 already exist. Lookup involving NA was labeled as part of the discussion in the ice cream agreement, which afaik was an in-person agreement about distinguishing between NA and NaN reached by the developers some time last year. Cross-referencing a long discussion on this topic, #32265, and the latest mention of that issue, from about two weeks ago, which links to an updated discussion: #59122 (comment).
Yes, PR #59069 intended that both … However, I am not sure whether that will remain the intended behavior from a design perspective; I think that depends on how the development team eventually wants it to be.
I think in general missing values in indices should be avoided, except maybe as a temporary index in an intermediate step. One possible workaround I saw, as mentioned in #56366, is to replace NA with another value. If you could provide more context, maybe I can try to find a better solution for you.
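The replace-NA workaround can be sketched as follows. This assumes that a string placeholder such as "<missing>" (a hypothetical choice) cannot collide with any real label. The index is round-tripped through a plain DataFrame, the missing entries are filled, and the MultiIndex is rebuilt.

```python
import pandas as pd

SENTINEL = "<missing>"  # hypothetical placeholder; pick one that cannot clash with real labels

mi = pd.MultiIndex.from_tuples(
    [("one", None, "yes"), ("two", "b", "no")],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"val": [1, 2]}, index=mi)

# Turn the index into a DataFrame, fill NA in level "b" only, and rebuild it.
levels = df.index.to_frame(index=False)
levels["b"] = levels["b"].fillna(SENTINEL)
df.index = pd.MultiIndex.from_frame(levels)

# Lookups now use an ordinary, non-missing label.
print(df.loc[("one", SENTINEL, "yes"), "val"])
```

Filling only the affected level keeps the other levels untouched. The trade-off is that downstream code has to treat the sentinel as "missing" explicitly.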
@chaoyihu Thanks for your response! This clarifies things quite a bit. My use case for having …: I have an experiment conducted under different parameters, which yields a … each time. In some experiments there is no upper limit, so … Is there a way to directly enforce the MultiIndex …
@KilianKW I see, so you are trying to find a proper value to represent the edge case when the integer param …
I would probably use mixed types in …
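Here is a minimal sketch of what mixing types in one level can look like. It assumes the "no upper limit" case is labeled with a string such as "unbounded" (a hypothetical label, not necessarily the one suggested here), and the run/limit values are invented. Because the level holds both integers and a string, pandas keeps it as object dtype, and the integer labels stay integers.

```python
import pandas as pd

# Hypothetical experiments: (run id, upper limit); run 2 has no upper limit.
mi = pd.MultiIndex.from_tuples(
    [(1, 10), (2, "unbounded")],
    names=["run", "upper"],
)
df = pd.DataFrame({"result": [0.5, 0.7]}, index=mi)

print(mi.levels[1].dtype)  # object: ints and a str share the level
print(df.loc[(1, 10), "result"])
print(df.loc[(2, "unbounded"), "result"])
```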
Or, in case you would like to keep the integer dtype of …
output: …
@chaoyihu Thanks a lot for your suggestions! Using a mixed index type sounds like an interesting option. I see that the constructor of …
Sorry, you are right, I didn't notice that … I think the presence of floating-point values in that level (in this case …) … Interestingly, the type inference logic does not reside in the … So the takeaway here is that the second solution I proposed in my previous reply was wrong.
I think this is a bug: code logic for the … is missing. I may try to work out a fix for this; if you are interested, I can keep you updated. @mroeschke The …
@chaoyihu Thanks for your effort. I'd appreciate being updated on this.
There appears to be a similar issue asking this question, #54523, but it probably needs more discussion first about what direction to take.
@KilianKW The issue mentioned by mroeschke gives a workaround, which is to initialize …
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The DataFrame has an item at the MultiIndex key ["one", None, "yes"], but attempting to access it fails. This appears to happen only when I use None/NaN values for the middle level of the key I'm adding, and only if I add the value using the df[...] = syntax.
Expected Behavior
It should return a value when I query df["one"][None]. If I construct the DataFrame with this data in it from the start, or add it with pd.concat, it works.
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-513.24.1.el8_9.x86_64
Version : #1 SMP Thu Mar 14 14:20:09 EDT 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 1.23.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.0.0
pip : 23.1
Cython : None
pytest : 7.4.4
hypothesis : None
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.2.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.0
gcsfs : None
matplotlib : 3.9.0
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.12
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None