Dataframe constructor fails when given dict with None value #14381

gitj · 2016-10-09T01:01:11Z

A small, complete example of the issue

import pandas as pd
pd.DataFrame(dict(a=None), index=[0])

In [3]: pd.DataFrame(dict(a=None),index=[0])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-20b65f605ca3> in <module>()
----> 1 pd.DataFrame(dict(a=None),index=[0])

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    264                                  dtype=dtype, copy=copy)
    265         elif isinstance(data, dict):
--> 266             mgr = self._init_dict(data, index, columns, dtype=dtype)
    267         elif isinstance(data, ma.MaskedArray):
    268             import numpy.ma.mrecords as mrecords

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    400             arrays = [data[k] for k in keys]
    401 
--> 402         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    403 
    404     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5382 
   5383     # don't force copy because getting jammed in an ndarray anyway
-> 5384     arrays = _homogenize(arrays, index, dtype)
   5385 
   5386     # from BlockManager perspective

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _homogenize(data, index, dtype)
   5693                 v = lib.fast_multiget(v, oindex.values, default=NA)
   5694             v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5695                                 raise_cast_failure=False)
   5696 
   5697         homogenized.append(v)

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2917 
   2918     # scalar like
-> 2919     if subarr.ndim == 0:
   2920         if isinstance(data, list):  # pragma: no cover
   2921             subarr = np.array(data, dtype=object)

AttributeError: 'NoneType' object has no attribute 'ndim'

Expected Output

This previously worked with a sensible output in 0.18.1:

In [2]: pd.DataFrame(dict(a=None),index=[0])
Out[2]:
      a
0  None

Output of `pd.show_versions()`

Working version:

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Broken version:

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

jreback · 2016-10-09T17:12:22Z

So this works correctly in the following cases.

In [12]: pd.DataFrame(columns=['a'], index=[0])
Out[12]: 
     a
0  NaN

In [13]: pd.DataFrame(dict(a=np.nan), index=[0])
Out[13]: 
    a
0 NaN

The behavior in 0.18.1 is actually wrong. Since no dtype is specified, this should coerce to the same result as the np.nan case.

Pull requests to fix this are welcome.
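To make the comparison above concrete, here is a small sketch of the two working cases with their column dtypes printed. It only uses the two constructor calls already shown in this comment; the variable names `empty` and `nan_scalar` are illustrative, and the dtype output may vary between pandas releases.

```python
import numpy as np
import pandas as pd

# Case 1: no data at all, only labels. pandas fills the cell with NaN.
empty = pd.DataFrame(columns=["a"], index=[0])

# Case 2: an explicit NaN scalar broadcast over the index.
nan_scalar = pd.DataFrame(dict(a=np.nan), index=[0])

# Both cells count as missing; the dtypes show how each column is stored.
print(empty.dtypes)
print(nan_scalar.dtypes)
print(empty.isna().all().all(), nan_scalar.isna().all().all())
```

Checking the dtype matters here because the question in this issue is whether a bare None should end up as a float NaN or stay as a Python object.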

shawnheide · 2016-10-11T15:18:07Z

Hey @brandonmburroughs, I saw that you're working on this too and beat me to the PR. No worries, I wasn't as far along. Just wanted to let you know that the same problem shows up with the Series constructor too, i.e. Series([None]) fails to coerce to NaN.

I looked at fixing it a little further down the stack in series.py, but didn't check with any tests yet. Feel free to see my commit above that referenced this.

gitj · 2016-10-11T16:54:58Z

I was going to work on a PR but looks like you guys are on top of it. Thanks!

brandonmburroughs · 2016-10-11T17:06:26Z

@shawnheide I actually noticed this problem after I created my PR. I opened an issue (#14393) about it, and there is ongoing discussion there about how to handle it, since the two cases are different. Depending on how the API design is settled, your fix may be better suited to handle all cases.

jorisvandenbossche · 2016-10-26T10:50:40Z

@jreback Given your comment in #14393 (comment), I would personally say that the above case should not coerce to NaN, but keep the None. Thoughts?
(in any case that is the conservative road for now, as that was the behaviour in 0.18.1)

But in that case, @brandonmburroughs, your PR should be updated.

jreback · 2016-10-26T10:53:19Z

Yeah, I'm open to keeping the pre-0.19.0 behavior (in other words, the column remains object dtype).

jorisvandenbossche · 2016-10-26T10:54:03Z

To illustrate, in pandas 0.18:

In [7]: pd.DataFrame(dict(a=[None]), index= [0])
Out[7]: 
      a
0  None

In [8]: pd.DataFrame(dict(a=None), index= [0])
Out[8]: 
      a
0  None

So for 0.19.1, I would choose to go back to 0.18.1 behaviour, so not coercing to NaN (keep as None).
We can discuss if we want to change for later releases.
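The behaviour described above can be written as a short regression-style check. This is a sketch, not the test from the linked PR; it assumes a pandas release that includes the fix (0.19.1 or later) and that `pd.testing.assert_frame_equal` is available. The names `scalar_none` and `list_none` are illustrative.

```python
import pandas as pd

# A scalar None broadcast over the index, as in the original report.
scalar_none = pd.DataFrame(dict(a=None), index=[0])

# The same value wrapped in a list.
list_none = pd.DataFrame(dict(a=[None]), index=[0])

# Both constructions should keep the Python None in an object column
# rather than coercing it to NaN.
assert scalar_none["a"].iloc[0] is None
assert list_none["a"].iloc[0] is None
assert scalar_none["a"].dtype == object

# The two frames should be identical.
pd.testing.assert_frame_equal(scalar_none, list_none)
```

On pandas 0.19.0 the first constructor call would raise the AttributeError from the original report instead of reaching the assertions.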

gitj added a commit to ColumbiaCMB/kid_readout that referenced this issue Oct 9, 2016

pin pandas at 0.18.1 because of pandas-dev/pandas#14381

aaff3ab

jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Difficulty Intermediate labels Oct 9, 2016

jreback added this to the Next Major Release milestone Oct 9, 2016

jorisvandenbossche modified the milestones: 0.19.1, Next Major Release Oct 10, 2016

brandonmburroughs mentioned this issue Oct 11, 2016

BUG: Dataframe constructor when given dict with None value #14392

Merged

shawnheide added a commit to shawnheide/pandas that referenced this issue Oct 11, 2016

BUG: Dataframe constructor fails when given dict with None value pand…

6eddbab

…as-dev#14381

brandonmburroughs mentioned this issue Oct 11, 2016

Dataframe constructor does not coerce data=[None] to np.nan #14393

Closed

jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) label Oct 11, 2016

chris-b1 mentioned this issue Oct 13, 2016

Cannot initialize a data frame with None #14414

Closed

jorisvandenbossche closed this as completed in #14392 Oct 31, 2016
