BUG: use_inf_as_na options raises value error for read_csv #35493

Closed
2 of 3 tasks
mhaselsteiner opened this issue Jul 31, 2020 · 2 comments · Fixed by #35510
Labels
Bug; IO CSV (read_csv, to_csv); Regression (functionality that used to work in a prior pandas version)

mhaselsteiner commented Jul 31, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample to Reproduce Error

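import numpy as np
import pandas as pd

# Treat +/-inf as missing values globally (the option that triggers the error)
pd.set_option('use_inf_as_na', True)
# Write a small frame containing a NaN, then read it back
pd.DataFrame({'test_data': [1, 3, 4, np.nan]}).to_csv('test_data.csv', na_rep='NaN')
pd.read_csv('test_data.csv', sep=',', na_values='NaN')  # raises ValueError on pandas 1.1.0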

Causes ValueError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1201, in read
    df = DataFrame(col_dict, columns=columns, index=index)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/frame.py", line 467, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 250, in init_dict
    missing = arrays.isna()
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/series.py", line 4795, in isna
    return super().isna()
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/generic.py", line 7109, in isna
    return isna(self).__finalize__(self, method="isna")
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/dtypes/missing.py", line 124, in isna
    return _isna(obj)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/dtypes/missing.py", line 157, in _isna
    return _isna_ndarraylike(obj, inf_as_na=inf_as_na)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/dtypes/missing.py", line 218, in _isna_ndarraylike
    result = _isna_string_dtype(values, dtype, inf_as_na=inf_as_na)
  File "/home/lena/anaconda3/envs/pandas_pip2/lib/python3.8/site-packages/pandas/core/dtypes/missing.py", line 246, in _isna_string_dtype
    vec = libmissing.isnaobj_old(values.ravel())
  File "pandas/_libs/missing.pyx", line 160, in pandas._libs.missing.isnaobj_old
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
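
The final error message is NumPy's standard complaint when an array with more than one element is used where a single boolean is expected. A minimal sketch of that failure mode, using only NumPy, is shown here. The helper `scalar_style_inf_check` is hypothetical and is not pandas code; whether `isnaobj_old` fails for exactly this reason is an assumption based on the traceback, not something confirmed in this thread.

```python
import numpy as np


def scalar_style_inf_check(val):
    """Hypothetical helper: compare a value to +/-inf as if it were a scalar."""
    # `or` needs a single truth value for its left operand, so this only
    # works when `val == np.inf` evaluates to one boolean.
    return bool(val == np.inf or val == -np.inf)


# Scalars behave as expected.
print(scalar_style_inf_check(np.inf))
print(scalar_style_inf_check(1.0))

# A whole column passed as one "value" does not: the comparison yields an
# element-wise boolean array, and `or` cannot reduce it to one boolean.
column = np.array([1.0, 3.0, 4.0, np.nan])
try:
    scalar_style_inf_check(column)
except ValueError as exc:
    print(type(exc).__name__, exc)
```

If the object-dtype Series built in `init_dict` holds whole column arrays as its elements, an element-wise scalar check like the one above would hit the same ambiguity.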

Problem description

NaNs are no longer recognized as expected when the pandas option use_inf_as_na is set to True: read_csv raises a ValueError instead of returning the parsed frame. This first occurred after upgrading to pandas 1.1.0.
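
Until a fixed release is available, one possible way to sidestep the error is to leave the global option unset and convert infinities to NaN explicitly after reading. This is only a sketch of a workaround that was not proposed in this thread; the sample CSV text, including the extra `inf` row, is made up for illustration.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Illustrative CSV: the data from the report plus one infinite value.
csv_text = ",test_data\n0,1.0\n1,3.0\n2,4.0\n3,NaN\n4,inf\n"

# Read without setting use_inf_as_na, so the failing code path is not taken.
df = pd.read_csv(StringIO(csv_text), index_col=0, na_values="NaN")

# Replace +/-inf with NaN explicitly instead of relying on the global option.
cleaned = df.replace([np.inf, -np.inf], np.nan)

print(cleaned["test_data"].isna().tolist())
```

The trade-off is that the conversion has to be repeated for every frame, whereas the option applies globally.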

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2
setuptools : 49.2.0.post20200712
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

mhaselsteiner added the Bug and Needs Triage labels Jul 31, 2020
@simonjayhawkins
Member

Thanks @mhaselsteiner for the report.

Regarding "Occurred first after upgrading to pandas 1.1.0": I can confirm this works in 1.0.5, so I am marking it as a regression.

>>> pd.__version__
'1.0.5'
>>>
>>> pd.set_option('use_inf_as_na', True)
>>> df = pd.DataFrame({'test_data':[1,3,4,np.nan]})
>>> data = df.to_csv(na_rep='NaN')
>>> print(data)
,test_data
0,1.0
1,3.0
2,4.0
3,NaN

>>>
>>> from io import StringIO
>>> pd.read_csv(StringIO(data),sep=',' ,na_values='NaN')
   Unnamed: 0  test_data
0           0        1.0
1           1        3.0
2           2        4.0
3           3        NaN
>>>

simonjayhawkins added the IO CSV and Regression labels and removed the Needs Triage label Jul 31, 2020
simonjayhawkins added this to the 1.1.1 milestone Jul 31, 2020
@simonjayhawkins
Member

This issue starts to occur with #33656. cc @dsaxton

678a9ac is the first bad commit
commit 678a9ac
Author: Daniel Saxton
Date: Sun May 10 12:12:45 2020 -0500

BUG: Fix StringArray use_inf_as_na bug (#33656)
