Printing None and NaN values in Pandas dataframe produces confusing results #12045

Closed
lvphj opened this issue Jan 15, 2016 · 3 comments
Labels
Bug, Output-Formatting (__repr__ of pandas objects, to_string)

lvphj commented Jan 15, 2016

Printing a dataframe in which a column contains None values produces confusing results. Large dataframes are automatically truncated when printed to the screen. If all the values of a column on one side of the truncation are None, they are displayed as NaN instead. This can be demonstrated with the following code.

import numpy as np
import pandas as pd

# One real string followed by nine None values.
tempDF = pd.DataFrame({'id':    np.arange(1,11),
                       'text':  ['some words'] + [None]*9})
print('Full database\n-------------\n',tempDF,'\n')
# Cap the displayed rows so the frame is printed in two halves.
with pd.option_context('display.max_rows', 8, 'display.max_columns', 3):
    print('Split database\n--------------\n',tempDF,'\n')

This produces the following output:

Full database
-------------

    id        text
0   1  some words
1   2        None
2   3        None
3   4        None
4   5        None
5   6        None
6   7        None
7   8        None
8   9        None
9  10        None 

Split database
--------------
     id        text
0    1  some words
1    2        None
2    3        None
3    4        None
..  ..         ...
6    7         NaN
7    8         NaN
8    9         NaN
9   10         NaN

[10 rows x 2 columns] 

Above the split, the column 'text' has one cell that holds a genuine string ('some words'). All the None values on that side of the split are correctly displayed as 'None'. However, below the split, all the cells contain None values but are confusingly displayed as 'NaN'.

Expected behaviour: All None values should be displayed as 'None' rather than 'NaN'.
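A minimal sketch of one way to confirm that only the printed representation is affected and the stored values are untouched. It assumes the column is built with an explicit `dtype=object` (not part of the original report) so that None is stored as-is regardless of pandas version; `temp_df`, `rendered`, `tail_values` and `all_none` are illustrative names.

```python
import pandas as pd

# Assumption: dtype=object is forced so that None is kept as None on any pandas version.
temp_df = pd.DataFrame({
    'id': range(1, 11),
    'text': pd.Series(['some words'] + [None] * 9, dtype=object),
})

# Rendering under a row limit truncates the display only.
with pd.option_context('display.max_rows', 8):
    rendered = repr(temp_df)

# Inspect the stored values in the bottom part of the frame directly.
tail_values = temp_df['text'].iloc[6:].tolist()
all_none = all(value is None for value in tail_values)
print(all_none)
```

Whatever the truncated display shows, the check reads the underlying cells rather than the formatted text.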

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.17.1
nose: None
pip: 1.5.6
setuptools: 3.6
Cython: None
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: 2.8
jorisvandenbossche added the Bug and Output-Formatting (__repr__ of pandas objects, to_string) labels on Jan 15, 2016
@jorisvandenbossche
Member

Thanks for the report!
As noted in a comment on SO, if the whole column consists of None values, you also get this:

In [11]: with pd.option_context('display.max_rows', 4): print(tempDF[3:])
    id text
3    4  NaN
4    5  NaN
..  ..  ...
8    9  NaN
9   10  NaN

[7 rows x 2 columns]

But of course, Nones get converted to NaNs silently in a lot of pandas operations.
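One well-known instance of this silent conversion, as a small sketch: building a numeric Series from a list that contains None yields a float column with NaN in place of None. The names `numbers` and `second` are illustrative only.

```python
import pandas as pd

# A None mixed into numeric data is stored as NaN in a float64 column.
numbers = pd.Series([1, None])
print(numbers.dtype)

second = numbers.iloc[1]
# NaN is the only float that is not equal to itself.
print(second is None, second != second)
```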

Author

lvphj commented Jan 15, 2016

And if a cell in the bottom half of the split contains a real string, all the values in the top half are NaN:

     id        text
0    1         NaN
1    2         NaN
2    3         NaN
3    4         NaN
..  ..         ...
6    7        None
7    8  some words
8    9        None
9   10        None

I suppose converting Nones to NaNs is reasonable – providing it is done consistently.
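For code that has to cope with either marker, a hedged sketch: `pd.isna` treats None and NaN alike, so checks written against it do not depend on which one a column happens to hold. This assumes a pandas version that provides `pd.isna` (older releases such as the one in this report expose the same check as `pd.isnull`); `mixed` and `missing_mask` are illustrative names.

```python
import numpy as np
import pandas as pd

# Both missing-value markers in one object column, next to a real string.
mixed = pd.Series([None, np.nan, 'some words'], dtype=object)

# isna() flags None and NaN identically.
missing_mask = mixed.isna().tolist()
print(missing_mask)
```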

Contributor

jreback commented Jan 15, 2016

This is the same issue as #11594 (though with different dtypes).

The fix is actually pretty simple. Here is the code; it needs an .astype back to the original dtype of that column after the .concat.

This should also be fixed for the Series formatter.
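A rough sketch of the shape of the change described above, not the actual pandas patch: join the head and tail of a frame, then cast each column back to the dtype it had before the join. `truncate_rows` is a hypothetical helper, and the example frame is made up for illustration. Whether this cast alone is enough to bring None back in an object column is not confirmed here.

```python
import pandas as pd

def truncate_rows(frame, max_rows):
    """Hypothetical helper: keep the first and last max_rows // 2 rows."""
    if len(frame) <= max_rows:
        return frame
    half = max_rows // 2
    # Remember each column's dtype before the two pieces are joined.
    original_dtypes = frame.dtypes.to_dict()
    joined = pd.concat([frame.iloc[:half], frame.iloc[-half:]])
    # Cast back so the join cannot leave a column with a different dtype.
    return joined.astype(original_dtypes)

frame = pd.DataFrame({'id': range(1, 11), 'flag': [True] * 10})
shortened = truncate_rows(frame, 8)
print(shortened.dtypes)
```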
