BUG: bug in construction using a tuple indexer with embedded None #12948

Closed
kyleabeauchamp opened this issue Apr 21, 2016 · 5 comments · Fixed by #31161
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@kyleabeauchamp

kyleabeauchamp commented Apr 21, 2016

In the following example, I was surprised that the None in the index somehow causes the value -1.0 to be stored as NaN in the values. Feel free to close if this is obviously user error :)

Code Sample, a copy-pastable example if possible

import pandas as pd

d = {(1, 1, None):-1.0, (1, 1, 1):-2.0, (1, 1, 2):-3.0}
x = pd.Series(d)
x

#### Naive Expected Output
1  1  None    -1.0
      1     -2.0
      2     -3.0
dtype: float64

#### Actual observed output: 

1  1  NaN    NaN
      1     -2.0
      2     -3.0
dtype: float64

#### Output of ``pd.show_versions()``
pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.0.3
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.1
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
@jreback
Contributor

jreback commented Apr 21, 2016

None is always converted to the missing-value indicator np.nan (except in some very limited circumstances), because None is a Python object and cannot be efficiently encoded internally.

see: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#values-considered-missing
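
A minimal sketch of the conversion described above, assuming a reasonably recent pandas release. The variable names are illustrative only. It builds a MultiIndex from tuples and checks which entries of the last level are treated as missing:

```python
import pandas as pd

# Two index tuples; the first has None in the last position.
idx = pd.MultiIndex.from_tuples([(1, 1, None), (1, 1, 1)])

# Pull out the values of the third level and ask which ones are missing.
last_level = idx.get_level_values(2)
print(last_level.isna())
```

The None in the first tuple is reported as missing, which matches the NaN shown in the index of the outputs in this thread.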

@jreback jreback closed this as completed Apr 21, 2016
@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Usage Question labels Apr 21, 2016
@kyleabeauchamp
Author

I actually know that part already for storing the index. What was confusing to me was that the presence of a missing value in the index tuple also converted the corresponding value from -1.0 to nan.

Is there somewhere that documents this behavior? E.g. "a missing value in a MultiIndex tuple forces any Series value associated with this index tuple to be missing (nan)".

@kyleabeauchamp
Author

d = {}
d[(1, 1, None)] = -1.0
x = pd.Series(d)


In [23]: x
Out[23]: 
1  1  NaN   NaN
dtype: float64

@jreback
Contributor

jreback commented Apr 21, 2016

The problem is that you are creating it in a very odd way. If you use (1, 1, np.nan), or construct it in a more conventional way, it will work.

In [19]: pd.Series(-1, index=pd.MultiIndex.from_tuples([(1, 1, None)]))
Out[19]: 
1  1  NaN   -1
dtype: int64

I suppose this is a bug. You can have a look through the code if you'd like.
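
As a rough sketch of the workaround suggested above, applied to the dict from the original report: the index is built explicitly with `pd.MultiIndex.from_tuples` instead of letting the Series constructor infer it from the dict keys. The variable names are illustrative, and this assumes the dict's insertion order is the desired row order:

```python
import pandas as pd

d = {(1, 1, None): -1.0, (1, 1, 1): -2.0, (1, 1, 2): -3.0}

# Build the MultiIndex from the keys first, then pass the values separately,
# so the values are never looked up by a key that contains None.
index = pd.MultiIndex.from_tuples(list(d.keys()))
x = pd.Series(list(d.values()), index=index)
print(x)
```

Because the values are passed positionally, the entry whose key contains None keeps its value of -1.0, while the None itself still shows up as NaN in the index.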

@jreback jreback reopened this Apr 21, 2016
@jreback jreback added this to the Next Major Release milestone Apr 21, 2016
@jreback jreback changed the title from "Confusing behavior for tuple index with None" to "BUG: bug in construction using a tuple indexer with embedded None" Apr 21, 2016
@mroeschke
Member

This appears to work on master. It could use a test.

In [3]: d = {}
   ...: d[(1, 1, None)] = -1.0
   ...: x = pd.Series(d)

In [4]: x
Out[4]:
1  1  NaN   -1.0
dtype: float64

In [5]: pd.__version__
Out[5]: '0.26.0.dev0+593.g9d45934af'

@mroeschke mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug, Difficulty Intermediate, and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Oct 21, 2019

3 participants