DataFrame._init_dict handles columns with nan incorrectly if columns passed separately #16894
Comments
You can actually replicate this using:
>>> import numpy as np
>>> s = {np.nan : 2}
>>> np.nan in s
True
>>>
>>> s = {np.float64(np.nan) : 2}
>>> np.nan in s
False
Honestly, I blame
The obvious candidate workaround is to write
>>> np.nan in {np.nan}
True
>>> np.float64(np.nan) in {np.float64(np.nan)}
False
>>> float('nan') in {float('nan')}
False
>>> np.nan is np.nan
True
>>> np.float64(np.nan) is np.float64(np.nan)
False
This bug came up in #16883 (comment). But there are legitimate cases for a (catch-all) nan in an index (e.g. #3729).
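The session above comes down to how Python containers test membership: an element matches if it is the same object, or failing that, if it compares equal. nan never compares equal to itself, so only the identity check can succeed. np.nan is a single module-level object, while np.float64(np.nan) and float('nan') build a new object on every call. A minimal pure-Python sketch of that rule (`contains` is a hypothetical helper written for illustration, not a Python or NumPy API):

```python
def contains(container, key):
    """Emulate the membership rule: same object first, then equality."""
    return any(item is key or item == key for item in container)

nan_a = float("nan")
nan_b = float("nan")   # a second, distinct NaN object

shared = nan_a in {nan_a}     # same object on both sides: found via identity
distinct = nan_b in {nan_a}   # different objects, and nan != nan: not found

# The emulation agrees with the built-in behaviour in both cases.
emulated = (contains([nan_a], nan_a), contains([nan_a], nan_b))
```

This is why reusing one nan object appears to work while constructing a fresh nan for each lookup does not.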
@kernc: Hmm... I suspect we already support this kind of indexing somewhere in the code-base, as the code below works:
>>> df = DataFrame({np.float64(np.nan): [1, 2]})
>>> df[np.nan]
2
>>> type(df.columns[0])
<class 'numpy.float64'>
I can't search the code-base from my phone, but I suspect if you walk through the
Yes,
Correct, but I'm saying that the logic for
Okay, might it be possible to use similar logic when constructing from
in
so must be missing this step somewhere.
@jreback: I'm confused by what you were just demonstrating.
This problem is already solved for several cases (but not the one illustrated above) and SDF. So the existing solutions should just propagate.
Ah, gotcha, you were referring to what I was saying above.
Looks to work on master. Could use a test:
Code Sample, a copy-pastable example if possible
Problem description
When a DataFrame is initialized from a dict and columns are passed separately, a nan key isn't recognized and its data isn't retrieved from the dict correctly. The problem is in loops like:
If columns aren't passed separately, initialization works as expected. Consistency would be nice.
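To make the failure mode concrete without relying on pandas internals, here is a pure-Python sketch of a loop that gathers data for each requested column. `gather_naive` and `gather_nan_aware` are hypothetical names, not pandas functions, and this is not the actual constructor code. The only assumption is that the requested column label and the dict key are distinct nan objects.

```python
import math

def _is_nan(x):
    """True for float NaN values (np.float64 is a float subclass)."""
    return isinstance(x, float) and math.isnan(x)

def gather_naive(data, columns):
    """Look up each requested column directly; misses become None."""
    return [data.get(col) for col in columns]

def gather_nan_aware(data, columns):
    """Same loop, but any NaN column label matches the dict's NaN key."""
    # Locate the dict's NaN key once, if it has one.
    nan_keys = [k for k in data if _is_nan(k)]
    out = []
    for col in columns:
        if _is_nan(col) and nan_keys:
            # Index with the stored key object so the identity check succeeds.
            out.append(data[nan_keys[0]])
        else:
            out.append(data.get(col))
    return out

data = {"a": [1, 2], float("nan"): [3, 4]}
columns = ["a", float("nan")]   # a different NaN object than the dict key

naive = gather_naive(data, columns)      # the nan column is silently missed
aware = gather_nan_aware(data, columns)  # the nan column is recovered
```

The naive loop returns None for the nan column, which mirrors the reported symptom; the nan-aware variant is one possible way to make both construction paths behave consistently.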
Expected Output
Output of pd.show_versions()