The output of pd.isnull/pd.isna on lists depends on the inferred dtype of the numpy conversion.
In cases where the array is inferred to be of a string type, numpy converts np.NaN to the string "nan", which pd.isnull() no longer recognizes as a null value. Any of the following would address the underlying problem, which is numpy auto-inferring a string dtype for mixed lists containing strings and float('nan') values:
- explicitly converting to object arrays instead of string arrays, as is done in pd.Series construction
- converting to a pd.Series object instead of a numpy array (leverages the above)
- applying pd.isna() in a list comprehension for list objects, e.g.

    def isna(a):  # a is a list
        return np.array([pd.isna(el) for el in a])
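To make the coercion concrete, here is a minimal sketch of the numpy behaviour described above, next to the Series-based and element-wise workarounds. The variable names and the `isna_elementwise` helper are illustrative only, not part of pandas, and the sketch assumes Python 3 with reasonably recent numpy and pandas.

```python
import numpy as np
import pandas as pd

values = [float("nan"), "world"]

# numpy infers a string dtype for the mixed list, so the float NaN
# is stored as the text "nan" and is no longer a missing value.
as_strings = np.asarray(values)

# Workaround: build a Series first; the NaN is kept as a missing value,
# so isna can detect it.
mask_from_series = pd.isna(pd.Series(values)).to_numpy()


def isna_elementwise(a):
    """Illustrative helper: check each element of a list on its own."""
    return np.array([pd.isna(el) for el in a], dtype=bool)


mask_elementwise = isna_elementwise(values)
```

Both masks flag only the first element, whereas the string array produced by `np.asarray` no longer contains anything that looks like a float NaN.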
Expected Output
array([True, False], dtype=bool)
or
TypeError
if it is undesirable to support lists with mixed float/string types with pd.isnull()
As you mention, this is due to np.asarray(...) coercing everything to a string once there is a string in it. I would say this is a bug (or design issue) in numpy, but since it is a well-known one, we should work around it, as we do in other places.
From a quick look, this might be used instead of asarray:
In [58]: pd.core.dtypes.cast.maybe_convert_platform([np.nan, 'world'])
Out[58]: array([nan, 'world'], dtype=object)
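Since maybe_convert_platform lives in pandas internals, user code that needs the same effect today could request an object array explicitly. The following is only a sketch under that assumption: `isna_listlike` is a hypothetical name, and this is not the fix that was proposed for pandas itself.

```python
import numpy as np
import pandas as pd


def isna_listlike(values):
    """Hypothetical helper: null mask for a list without string coercion."""
    if isinstance(values, (list, tuple)):
        # dtype=object stops numpy from inferring a string dtype,
        # so float NaN and None stay as they are.
        values = np.asarray(values, dtype=object)
    return pd.isna(values)


mask = isna_listlike([np.nan, "world"])
```

Passing dtype=object trades numpy's dtype inference for correctness here, which is the same idea as the object-array conversion suggested in the issue description.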
I want to update so I'm not thought of as a raise-and-dump kind of person: awaiting approval to contribute from work, but have a fix ready if/when that happens.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1048-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: 3.2.2
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None