Read CSV error_bad_lines does not error for too many values in first data row #12519
Comments
Looks like it. Thanks! Pull requests are welcome to fix this.
I'd like to contribute.
http://pandas.pydata.org/pandas-docs/stable/contributing.html are the contributing docs; you can submit a PR for code comments.
Addresses pandas-dev#12519 by raising an exception when the 'filepath_or_buffer' passed to 'read_csv' contains input lines with differing numbers of fields.
These phenomena also exist with the Python parser, but I will say the second example is handled in the internal documentation (see here). In light of that, I wouldn't consider the behaviour (the
We raise a ParserWarning in this case since 1.3, see #21768
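For context on the modern behaviour mentioned above: since pandas 1.3 the `on_bad_lines` argument to `read_csv` controls how lines with too many fields are handled. A minimal sketch, using hypothetical file contents (not the originals from this issue):

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: the third line has one field too many.
data = "A,B,C\n1,2,3\n4,5,6,7\n8,9,10\n"

# "skip" drops the offending line, "warn" skips it with a ParserWarning,
# and "error" raises a ParserError (the old default behaviour).
df = pd.read_csv(StringIO(data), on_bad_lines="skip")
print(df.shape)  # the bad line is dropped, leaving 2 rows x 3 columns
```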
Hi, I would like to report an unexpected behaviour connected with the option error_bad_lines (I reference it here just to make this bug easier to find if someone were to report the same).
Given the following two CSV files:
simple.csv
simple2.csv
The following code breaks, as expected:
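The original code and file contents did not survive the page capture; the following is a hedged reconstruction of the expected-failure case, with assumed file contents, where the over-long line appears after the first data row:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for simple.csv: the error is NOT in the first
# data row, so the parser notices the field-count mismatch and raises.
data = "A,B,C\n1,2,3\n4,5,6,7\n"

try:
    pd.read_csv(StringIO(data))
except pd.errors.ParserError as exc:
    print("raised:", exc)  # e.g. "Expected 3 fields in line 3, saw 4"
```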
The fact that the following code does not break is up for discussion (though it is inconsistent that the outcome depends on whether the error is in the first row or in further ones):
Here the first column is read as the index, so it comes down to another potential issue already reported (regarding erroring on too few values), and I am not going into it here.
So the result is as follows:
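The rendered result did not survive the capture; a hedged reconstruction of what is described, with assumed file contents: when each data row carries one extra field, `read_csv` silently treats the first column as the index instead of erroring.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for simple2.csv: every data row has one field
# more than the header names.  Instead of erroring, read_csv infers
# that the first column is the index.
data = "A,B,C\n1,2,3,4\n5,6,7,8\n"

df = pd.read_csv(StringIO(data))
print(df.index.tolist())    # [1, 5] -- first column became the index
print(df.columns.tolist())  # ['A', 'B', 'C']
```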
However, even if I specify explicitly that there is no index in the CSV:
It still works, yielding the result:
And this I believe is definitely a bug. I discovered it by accident: in the CSV I was about to read, a comma was also used as the decimal separator in the first column, and the totally corrupted CSV was nevertheless read and parsed as a DataFrame.
Output of pd.show_versions():
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.17.1
nose: 1.3.7
pip: 8.0.3
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: 2.8