Dataframe constructor does not coerce data=[None] to np.nan #14393


Closed
brandonmburroughs opened this issue Oct 11, 2016 · 5 comments
Labels
API Design Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@brandonmburroughs
Contributor

brandonmburroughs commented Oct 11, 2016

A small, complete example of the issue

As called out in #14381, if dtype is not specified, values of None should be coerced to np.nan. However, when a list of only None is passed to data, the None remains.

>>> import pandas as pd
>>> pd.DataFrame([None])
      0
0  None
>>> pd.DataFrame([None, None])
      0
0  None
1  None

Expected Output

>>> pd.DataFrame([None])
      0
0  NaN
>>> pd.DataFrame([None, None])
      0
0  NaN
1  NaN
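
For readers who need NaN rather than None in the meantime, one option is to state the dtype explicitly instead of relying on inference. The following is a minimal sketch, assuming a reasonably recent pandas; the variable names are illustrative only, and the dtype printed for the inferred frame may differ between versions.

```python
import pandas as pd

# Inferred dtype: pandas keeps whatever it infers for a list of only None.
inferred = pd.DataFrame([None, None])
print(inferred.dtypes)

# Explicit dtype: requesting float asks pandas to store the missing values as NaN.
explicit = pd.DataFrame([None, None], dtype=float)
print(explicit.dtypes)

# Converting after construction is another route to a float column.
converted = inferred.astype(float)
print(converted.dtypes)
```

Passing dtype=float up front avoids depending on inference at all, which may be preferable when the input can be entirely missing.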

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.4
pip: 8.1.2
setuptools: 5.8
Cython: 0.21
numpy: 1.11.2
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.5.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: 2.32.1
pandas_datareader: None

@jreback
Contributor

jreback commented Oct 11, 2016

IIRC, this was actually special-cased to allow this to be object dtype, with the passed None's remaining (Series([None]) is another such case).

We don't coerce provided None's in a constructor for object dtype, e.g.

In [3]: Series(['a', None])
Out[3]: 
0       a
1    None
dtype: object

So I agree that we should be consistent, but it's difficult to know when to leave an actual None.

I last changed this here, though I recall this being the case for quite some time. We like to preserve an inferred dtype, in this case object.

So I am not sure this is incorrect. This is an explicit user action; we treat None and missing as slightly different.
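
To make the None-versus-missing distinction concrete, here is a small sketch. It assumes object dtype is requested explicitly (so the result does not depend on version-specific string inference), and the names s and t are illustrative only.

```python
import numpy as np
import pandas as pd

# With object dtype, a None passed by the user is stored as the None object.
s = pd.Series(['a', None], dtype=object)
print(s.iloc[1] is None)

# An explicit NaN in the same position is stored as a float object instead.
t = pd.Series(['a', np.nan], dtype=object)
print(type(t.iloc[1]))

# isna() reports both as missing, even though the stored objects differ.
print(s.isna().tolist(), t.isna().tolist())
```

So code that only asks "is this missing?" via isna() sees no difference, while code that inspects the stored value does.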

Note that the other issue, #14381, is actually different, as it's an entirely missing value (e.g. a scalar or an array-like), so the dtype is set to the default (meaning float).

I suppose one could argue that these are the same.

@shoyer @jorisvandenbossche @wesm

@brandonmburroughs
Contributor Author

Okay, I see what you're saying. Your explanation makes sense and I could see it going either way. Thanks for the insight!

@jorisvandenbossche
Member

I would also keep the current behaviour of keeping object dtype for now.

@jorisvandenbossche
Member

Note that the other issue, #14381, is actually different, as it's an entirely missing value (e.g. a scalar or an array-like), so the dtype is set to the default (meaning float).

I would not make the distinction in this case. The scalar value just denotes a constant value for the full column, so None or [None, None, None, ..] should give the same result IMO.
That is also what 0.18 does:

In [7]: pd.DataFrame(dict(a=[None]), index=[0])
Out[7]: 
      a
0  None

In [8]: pd.DataFrame(dict(a=None), index=[0])
Out[8]: 
      a
0  None
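
A quick way to check whether an installed version treats the scalar and list forms alike is sketched below. The variable names are illustrative, and no particular dtype is assumed, since the result has varied between releases.

```python
import pandas as pd

# Column built from a one-element list containing None.
from_list = pd.DataFrame(dict(a=[None]), index=[0])

# Column built from a scalar None broadcast over the index.
from_scalar = pd.DataFrame(dict(a=None), index=[0])

# Compare the resulting dtypes; they match if the two forms are treated alike.
print(from_list.dtypes["a"], from_scalar.dtypes["a"])

# Either way, both columns should be reported as missing.
print(from_list.isna().equals(from_scalar.isna()))
```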

jreback added this to the No action milestone Oct 26, 2016
@jreback
Contributor

jreback commented Oct 26, 2016

ok, so closing this as no-change.

@jorisvandenbossche can you put that example on the spawning issue, #14381?

jreback closed this as completed Oct 26, 2016

3 participants