BUG: to_csv requires escapechar unnecessarily when data contains null byte \x00 (Python 3.10+ only) #47871
Comments
Thanks @daviewales for the report. I'm getting but I get the same output as you on main. So maybe not related to the PowerShell/Bash difference. Further investigation required (for instance, my 1.4.3 env is Python 3.10 and my dev env (main) is Python 3.8). I should probably do a bisect to confirm that the difference I'm seeing is not down to a code change. |
Hmm... Interesting. I just tried on my Mac, and I'm getting the
INSTALLED VERSIONS
commit : e8093ba
pandas : 1.4.3 |
On my dev machine (I used my laptop above), which is an Ubuntu server, I'm seeing the same as before, but my envs are set up in a similar manner, so I expect the same differences between the envs exist.
|
Looks like it could be the Python version. I'm seeing "need to escape, but no escapechar set" with Python 3.10 but not on Python 3.9 or 3.8 using the same version of pandas, either 1.3.5 or 1.4.3.
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ python --version
Python 3.8.13
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ python bisect/47871.py
1.4.3
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a 2200 220a A.".".
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.4.3
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ python bisect/47871.py
1.4.3
need to escape, but no escapechar set
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a A.
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ python --version
Python 3.10.5
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.4.3-py3.9
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ python bisect/47871.py
1.4.3
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a 2200 220a A.".".
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ python --version
Python 3.9.13
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$
(pandas-dev) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.3.5
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ python bisect/47871.py
1.3.5
need to escape, but no escapechar set
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a A.
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ python --version
Python 3.10.1
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.3.5-py3.9
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ python bisect/47871.py
1.3.5
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a 2200 220a A.".".
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ python --version
Python 3.9.13
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ |
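The error text in the transcript above is the message raised by the writer in Python's standard `csv` module, so the version dependence can be probed without pandas. A minimal sketch, assuming pandas hands its rows to `csv.writer` with the default minimal quoting; the helper name `write_null_row` is illustrative and not from this thread:

```python
import csv
import io
import sys


def write_null_row(**writer_kwargs):
    """Write a header row and a row holding a single null byte.

    Returns (True, text) on success or (False, error message) on failure.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, **writer_kwargs)
    try:
        writer.writerow(["A"])
        writer.writerow(["\x00"])
    except csv.Error as exc:
        return False, str(exc)
    return True, buffer.getvalue()


if __name__ == "__main__":
    print(sys.version.split()[0])
    # The outcome of this call depends on the interpreter version,
    # per the reports in this thread.
    print(write_null_row())
    # Same rows, this time with an escape character configured.
    print(write_null_row(escapechar="\\"))
```

Running the script under each interpreter of interest shows whether the failure follows the Python version rather than the pandas version.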
I had the same issue as well, and as @simonjayhawkins said, the issue goes away when I change Python 3.10 to 3.9 (conda downgraded pandas from 1.4.4 to 1.2.4 after changing the Python version). |
I had the same issue. I tried a lot of ways to fix it:
None of the above worked! What worked was changing Python version from 3.10 to 3.9. |
Decided to try the reproducible examples with the current versions of Python 3.11.1 and pandas 1.5.3. With those versions on Fedora 37, I am getting the exact same output using both reproducible examples:
However if I revert back to Python 3.10.9 (still with pandas 1.5.3), I do get the dreaded So is this the expected output? If so it appears that whatever caused this issue may have been fixed in Python 3.11.1 (or possibly 3.11.0 which I didn't test). I also did not test previous (or future) versions of pandas, just 1.5.3. |
+1 facing the same issue |
On Python 3.10 and 3.11, an upstream bug in pandas causes a failure when serializing a dataframe to CSV when there's a null byte in the dataframe. This pull request leaves the default behaviour alone, but gives users the option to modify `to_csv` behaviour, including working around that issue with the `escapechar` parameter. See pandas-dev/pandas#47871
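The pass-through described above might look like the following. This is only a sketch: `export_csv` and its signature are hypothetical and not taken from that pull request; the only pandas API relied on is `DataFrame.to_csv` with its `index` and `escapechar` parameters.

```python
import pandas as pd


def export_csv(df, **to_csv_kwargs):
    """Serialize df to CSV text, forwarding extra options to DataFrame.to_csv.

    With no extra options this is a plain df.to_csv(index=False),
    so the default behaviour is left alone.
    """
    options = {"index": False, **to_csv_kwargs}
    return df.to_csv(**options)


df = pd.DataFrame({"A": ["\x00"]})

# Callers affected by the null-byte error can opt in to an escape character.
text = export_csv(df, escapechar="\\")
print(repr(text))
```

Keeping the escape character opt-in means existing callers see no change in their output.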
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
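A minimal sketch of a reproduction, assuming a one-column frame whose only value is a null byte. The column name `A` is inferred from the `xxd` dumps in the comments (`A`, a newline, then the null byte), and `try_to_csv` is an illustrative helper rather than code from the report:

```python
import csv

import pandas as pd


def try_to_csv(df, **kwargs):
    """Return the CSV text, or the error message if the csv writer raises."""
    try:
        return df.to_csv(index=False, **kwargs)
    except csv.Error as exc:
        return f"error: {exc}"


df = pd.DataFrame({"A": ["\x00"]})

# The outcome of this call depends on the Python version, per the comments.
print(repr(try_to_csv(df)))
# With an escape character set, the reports say the write succeeds.
print(repr(try_to_csv(df, escapechar="\\")))
```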
Issue Description
NOTE: I'm running this in PowerShell, but using `xxd` from Windows Subsystem for Linux.
If a dataframe contains a null byte `\x00` as a value, `to_csv` requires `escapechar` to be set when run from PowerShell, but not when run from Ubuntu/Bash. However, the `escapechar` is not actually used, and does not appear in the output file.
Expected Behavior
I expect that if `escapechar` is not used, I shouldn't need to use `escapechar='\\'`.
I also expect that the flags for `to_csv` should be the same for both PowerShell and Ubuntu/Bash.
i.e. I expect that the following should just work in both PowerShell and Bash, without needing to specify `escapechar`:
Installed Versions
Pandas in Powershell:
INSTALLED VERSIONS
commit : e8093ba
python : 3.10.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252
pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : 0.8.1
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
Pandas in Ubuntu/Bash
INSTALLED VERSIONS
commit : e8093ba
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.4.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None