Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
I originally posted this as a comment on issue #5500, but that issue was closed the following day and no one ever responded to my suggestion, so I am reposting it here:
Code Sample, a copy-pastable example if possible
pd.read_csv("pandas_bug.tsv", sep="\t", index_col=None, header=None, encoding='utf-8', skip_blank_lines=True, quotechar='"')
Problem description
The pandas_bug.tsv looks like this
3 abc 5.6
4 "abc" 4.3
5 "error 3.3
This line of code raises an error:
ParserError: Error tokenizing data. C error: EOF inside string starting at line 2
In another case, with a larger file, this error does not occur and pandas silently skips some lines. However, when the same TSV file is converted to JSON via file I/O, the pandas read_json function handles this gracefully and adds an escape character in front of the string, e.g. "error
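For anyone who wants to reproduce this without the file, here is a minimal sketch. It assumes the three rows above are tab-separated and builds them in memory with `io.StringIO` instead of reading `pandas_bug.tsv`; the variable names are mine.

```python
import io

import pandas as pd

# Assumed contents of pandas_bug.tsv: three tab-separated rows,
# the last of which opens a quote that is never closed.
data = '3\tabc\t5.6\n4\t"abc"\t4.3\n5\t"error\t3.3\n'

try:
    pd.read_csv(io.StringIO(data), sep="\t", header=None, quotechar='"')
    outcome = "parsed"
except pd.errors.ParserError as exc:
    # The default C parser reaches end of file while still inside the quoted field.
    outcome = "ParserError"
    print(exc)

print(outcome)
```

The exact wording of the message may differ between pandas versions, so the sketch only records which exception type was raised.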
Expected Output
   0       1    2
0  3     abc  5.6
1  4     abc  4.3
2  5  "error  3.3
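A possible workaround, sketched under the same assumption that the file is tab-separated: turn quote handling off with `quoting=csv.QUOTE_NONE` so the stray quote is read as an ordinary character, then strip quotes only from values where they are balanced. `strip_balanced_quotes` is a hypothetical helper of mine, not part of pandas.

```python
import csv
import io

import pandas as pd

# Assumed contents of pandas_bug.tsv (tab-separated, unbalanced quote in row 3).
data = '3\tabc\t5.6\n4\t"abc"\t4.3\n5\t"error\t3.3\n'

# QUOTE_NONE makes the parser treat '"' as a normal character,
# so the unterminated quote no longer swallows the rest of the file.
df = pd.read_csv(io.StringIO(data), sep="\t", header=None, quoting=csv.QUOTE_NONE)


def strip_balanced_quotes(value):
    """Hypothetical helper: drop surrounding quotes only when both are present."""
    if isinstance(value, str) and len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


# Only the text column needs cleaning in this example.
df[1] = df[1].map(strip_balanced_quotes)
print(df)
```

This is not equivalent to real quote handling: a quoted field that contains a tab or newline would be split, so it only suits files where quotes never protect the separator.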
Output of pd.show_versions()
pandas: 0.22.0
pytest: 3.7.0
pip: 10.0.1
setuptools: 36.2.7
Cython: 0.28.2
numpy: 1.14.5
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.4
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None