Printing None and NaN values in Pandas dataframe produces confusing results #12045

lvphj · 2016-01-15T12:17:04Z

Printing a dataframe where a column contains None values produces confusing results. Large dataframes are automatically truncated when printed, so only the top and bottom rows are shown. If all the values of a column on one side of the split are None, they are displayed as NaN instead. This can be demonstrated with the following code.

import numpy as np
import pandas as pd
tempDF = pd.DataFrame({'id':    np.arange(1,11),
                       'text':  ['some words'] + [None]*9})
print('Full database\n-------------\n',tempDF,'\n')
with pd.option_context('display.max_rows', 8, 'display.max_columns', 3):
    print('Split database\n--------------\n',tempDF,'\n')

This produces the following output:

Full database
-------------

    id        text
0   1  some words
1   2        None
2   3        None
3   4        None
4   5        None
5   6        None
6   7        None
7   8        None
8   9        None
9  10        None 

Split database
--------------
     id        text
0    1  some words
1    2        None
2    3        None
3    4        None
..  ..         ...
6    7         NaN
7    8         NaN
8    9         NaN
9   10         NaN

[10 rows x 2 columns]

Above the split, the variable 'text' has one cell which contains a genuine string ('some words'). All the None values on that side of the split are correctly displayed as 'None'. However, in the bottom part of the split, all the cells contain None values but are confusingly displayed as NaN.

Expected behaviour: All None values should be displayed as 'None' rather than 'NaN'.

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.17.1
nose: None
pip: 1.5.6
setuptools: 3.6
Cython: None
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: 2.8


jorisvandenbossche · 2016-01-15T12:40:32Z

Thanks for the report!
As noted in a comment on SO, if the full dataframe consists of None values, you also get this:

In [11]: with pd.option_context('display.max_rows', 4): print tempDF[3:]
    id text
3    4  NaN
4    5  NaN
..  ..  ...
8    9  NaN
9   10  NaN

[7 rows x 2 columns]

But of course, Nones get converted to NaNs silently in a lot of pandas operations.
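To illustrate that last point, here is a minimal sketch (the variable names are made up for this example) of one such silent conversion: a None placed next to numbers ends up as a float NaN, while an explicitly object-typed column keeps the None object. `pd.isna` treats both as missing, which is part of why the two are easy to confuse.

```python
import numpy as np
import pandas as pd

# Mixing None with numbers: pandas stores the data as float64,
# so the None is replaced by NaN.
numeric = pd.Series([1.0, None])

# Forcing object dtype keeps the original None object in place.
text = pd.Series(['some words', None], dtype=object)

print(numeric.dtype, repr(numeric.iloc[1]))
print(text.dtype, repr(text.iloc[1]))

# Both values are reported as missing, whatever their representation.
print(pd.isna(numeric).tolist(), pd.isna(text).tolist())
```

The explicit `dtype=object` is an assumption of the sketch: it pins the column to Python objects so the result does not depend on how a given pandas version infers string columns.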

lvphj · 2016-01-15T13:16:03Z

And if a cell in the bottom half of the split contains a real string, all the values in the top half are NaN:

     id        text
0    1         NaN
1    2         NaN
2    3         NaN
3    4         NaN
..  ..         ...
6    7        None
7    8  some words
8    9        None
9   10        None

I suppose converting Nones to NaNs is reasonable – providing it is done consistently.

jreback · 2016-01-15T13:31:38Z

this is the same issue as #11594 (though different dtypes).

the fix is actually pretty simple. here is the code; it needs an .astype after the .concat, to cast back to the original dtype of that column.

Also should be fixed for the Series formatter
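As a rough illustration of the idea described above (not the actual pandas formatter code), the truncated view can be thought of as the head and tail of the frame concatenated, followed by a cast back to the dtypes of the original columns. The helper name `head_tail_preserving_dtypes` is hypothetical, and the sketch assumes a recent pandas where `DataFrame.astype` accepts a column-to-dtype mapping.

```python
import numpy as np
import pandas as pd


def head_tail_preserving_dtypes(df, n):
    """Hypothetical helper: first and last n rows, cast back to df's dtypes."""
    # Remember each column's dtype before any concatenation happens.
    original_dtypes = df.dtypes.to_dict()
    # Build the truncated view from the two visible parts.
    truncated = pd.concat([df.head(n), df.tail(n)])
    # Cast back, so a dtype inferred from the visible subset cannot leak out.
    return truncated.astype(original_dtypes)


temp_df = pd.DataFrame({
    'id': np.arange(1, 11),
    'text': pd.Series(['some words'] + [None] * 9, dtype=object),
})

truncated = head_tail_preserving_dtypes(temp_df, 4)
print(truncated.dtypes)
```

On a current pandas the cast is likely a no-op, since concatenating two object columns keeps them as object. The sketch only checks that the dtypes survive; it does not verify whether the cast alone would bring back None values on the affected 0.17.x versions.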


jorisvandenbossche added Bug Output-Formatting __repr__ of pandas objects, to_string labels Jan 15, 2016

jorisvandenbossche mentioned this issue Jan 15, 2016

BUG: displayed dtype of series inferred from shown subset instead of series #11594

Closed


jreback added Difficulty Novice labels Jan 15, 2016

jreback modified the milestones: 0.18.0, Next Major Release Jan 15, 2016

jreback modified the milestones: 0.18.1, Next Major Release Feb 23, 2016

jreback added a commit to jreback/pandas that referenced this issue Mar 23, 2016

BUG: concatenation with a coercable dtype was too aggressive

aa9e2cc

closes pandas-dev#12411 closes pandas-dev#12045 closes pandas-dev#11594 closes pandas-dev#10571

jreback mentioned this issue Mar 23, 2016

BUG: concatenation with a coercable dtype was too aggressive #12702

Closed

jreback closed this as completed in 85f8cf7 Mar 23, 2016

jreback mentioned this issue Feb 27, 2017

BUG: concat with empty frame upcasts float32 -> float64 #15525

Closed
