error_bad_lines is ignored if names argument is used in read_csv function #22144
Comments
Thanks for the report - investigation and PRs are always welcome! |
Will try to have a look at this one! |
This error is giving me headaches ^^
So for this case, the choices are:
The error resides for |
@louis-red thanks for the summary as it is very much appreciated! I'm not sure I follow what you are referring to with the "implicit index" though - any chance you can link to the code base where you were seeing the error? |
Of course @WillAyd, sorry for the delay.
We have unexpected results here when specifying Line 2901 in eb0ac54
Summary of the outputs for each input:
import io
import pandas as pd

names = ['ID', 'X1', 'X2', 'X3']
# With malformed line on line 3
s = """0,1,2,3
1,2,3,4
4,3,2,1,5"""
pd.read_csv(io.StringIO(s), names=names, error_bad_lines=False, header=None,
engine='python')
# Output : Good
   ID  X1  X2  X3
0   0   1   2   3
1   1   2   3   4
Skipping line 3: Expected 4 fields in line 3, saw 5
# With malformed line on line 2
s = """0,1,2,3
1,2,3,4,5
4,3,2,1"""
pd.read_csv(io.StringIO(s), names=names, error_bad_lines=False, header=None,
engine='python')
# Output : Good or Bad ?
   ID  X1  X2   X3
0   1   2   3  NaN
1   2   3   4  5.0
4   3   2   1  NaN
# With index_col = False and malformed line on line 2 (or 3, it doesn't matter)
s = """0,1,2,3
1,2,3,4,5
4,3,2,1"""
pd.read_csv(io.StringIO(s), names=names, error_bad_lines=False, header=None,
engine='python',
index_col=False)
# Output : Bad
   ID  X1  X2  X3
0   0   1   2   3
1   1   2   3   4
2   4   3   2   1 |
@louis-red thanks I think I follow. You can always try removing that conditional to see what breaks; perhaps it is even errant in the first place. A PR would certainly be welcome if you could get it to pass tests locally |
@WillAyd for issues like this that are only tangentially related to an open GitHub issue, what is the best practice for referencing them in a PR? Should I open a new issue clearly outlining the particular problem, or is that not necessary? |
Sorry I thought you were addressing the issue with the |
Code Sample, a copy-pastable example if possible
Problem description
The bad lines option (error_bad_lines=False) is ignored when the names argument is used.
When names is omitted, everything works fine with pandas 0.23.3 (see issue #20573), but when names is used a ValueError is raised (ValueError: invalid literal for int() with base 10: 'amigo').
Expected Output
b'Skipping line 3: expected 4 fields, saw 5\n'
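To make the expected behaviour concrete, here is a minimal standard-library sketch of what "skip bad lines" could mean when names fixes the number of columns. It is not how pandas implements read_csv; the helper name read_rows_skipping_bad is hypothetical, and the sample data is taken from the comments in this thread rather than from the original code sample.

```python
import csv
import io


def read_rows_skipping_bad(text, names):
    """Return (rows, warnings); lines with more fields than names are dropped.

    Hypothetical helper for illustration only, not pandas' implementation.
    """
    rows, warnings = [], []
    expected = len(names)
    for line_no, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(fields) > expected:
            # Too many fields: report the line and skip it instead of raising.
            warnings.append(
                f"Skipping line {line_no}: expected {expected} fields, saw {len(fields)}"
            )
            continue
        # Lines with fewer fields than names are kept; missing values become None.
        padded = fields + [None] * (expected - len(fields))
        rows.append(dict(zip(names, padded)))
    return rows, warnings


names = ["ID", "X1", "X2", "X3"]
sample = "0,1,2,3\n1,2,3,4\n4,3,2,1,5"  # malformed line 3 (5 fields)
rows, warnings = read_rows_skipping_bad(sample, names)
print(warnings)
```

With this sketch the position of the malformed line does not matter: the same check applies whether the extra field appears on line 2 or line 3, which is the consistency the report asks for.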
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.3
pytest: 3.4.2
pip: 9.0.2
setuptools: 38.5.2
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None