read_csv treats \x00 as EOL instead of null value #14012
Comments
I don't see an error on
>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>> data="""var1,var2,var3
1,2,0
2,\x00,0
3,4,0
4,5,0
"""
>>> df = read_csv(StringIO(data))
>>> df
   var1  var2  var3
0   1.0   2.0   0.0
1   2.0   NaN   NaN
2   NaN   0.0   NaN
3   3.0   4.0   0.0
4   4.0   5.0   0.0
It should be:
var1 var2 var3
On Aug 16, 2016 10:57 PM, "gfyoung" wrote:
@spillz : Sorry, I was meaning to write more to clarify my comment. In the meantime, could you add that (the expected output) to your original issue?
Fixes a bug in the C parser in which the NULL character ('\x00') was being interpreted as a true line terminator, escape character, or comment character, because the parser used it internally to indicate that the user had not specified these values. As a result, data containing this value was parsed incorrectly. It should be parsed as NULL. Closes gh-14012.
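The sentinel problem described in the fix can be illustrated with a toy tokenizer. This is only a sketch in pure Python, not pandas' C tokenizer: the function `split_rows` and its `lineterminator` argument are hypothetical names made up for the illustration. Passing `"\x00"` mimics a design where NUL means "no custom terminator was given", while passing `None` keeps "unset" separate from any real character.

```python
from typing import List, Optional


def split_rows(text: str, lineterminator: Optional[str]) -> List[List[str]]:
    """Toy CSV splitter (hypothetical, for illustration only).

    A row ends at "\n" or at `lineterminator` when one is given.
    """
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    for ch in text:
        # With lineterminator="\x00" used as an "unset" sentinel, a literal
        # NUL in the data also matches here and ends the row early.
        is_eol = ch == "\n" or (lineterminator is not None and ch == lineterminator)
        if ch == ",":
            row.append("".join(field))
            field = []
        elif is_eol:
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
        else:
            field.append(ch)
    if field or row:
        # Flush a final row that has no trailing newline.
        row.append("".join(field))
        rows.append(row)
    return rows


data = "var1,var2,var3\n1,2,0\n2,\x00,0\n3,4,0\n4,5,0\n"

sentinel_rows = split_rows(data, lineterminator="\x00")  # NUL doubles as "unset"
explicit_rows = split_rows(data, lineterminator=None)    # "unset" is a separate value

print(len(sentinel_rows) - 1, "data rows when NUL is the 'unset' sentinel")
print(len(explicit_rows) - 1, "data rows when None marks 'unset'")
```

With the sentinel convention the row `2,\x00,0` is split in two, which matches the five-row table reported in this issue. With `None` the NUL simply stays inside its field, and deciding how to interpret that field is left to later stages.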
Not sure if this is a bug, but it took me a long time to figure out what was going on in a much bigger datafile than the sample one below.
Code Sample, a copy-pastable example if possible
import pandas
import StringIO
data='''var1,var2,var3
1,2,0
2,\x00,0
3,4,0
4,5,0
'''
print pandas.read_csv(StringIO.StringIO(data))
Expected Output
A table with 4 rows instead of 5, or an error.
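For the "or an error" option, one possible workaround is to scan the text for NUL characters before handing it to any parser. The sketch below uses only the standard library; `find_nul_chars` and `require_no_nul` are hypothetical helper names, not part of pandas.

```python
from typing import List, Tuple


def find_nul_chars(text: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) positions of every NUL character."""
    hits: List[Tuple[int, int]] = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\x00":
                hits.append((line_no, col))
    return hits


def require_no_nul(text: str) -> str:
    """Raise ValueError if the text contains NUL characters, else return it."""
    hits = find_nul_chars(text)
    if hits:
        raise ValueError("NUL character(s) at (line, column): %r" % hits)
    return text


data = "var1,var2,var3\n1,2,0\n2,\x00,0\n3,4,0\n4,5,0\n"

try:
    require_no_nul(data)
except ValueError as exc:
    # The sample data has one NUL, on line 3 at column 3.
    print("refusing to parse:", exc)
```

Text that passes the check can then be wrapped in `io.StringIO` and given to `read_csv` as usual. In a large file, the reported positions also make the offending rows much easier to find.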
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None