pd.DataFrame.to_csv('filename.zip') doesn't extract with a '.csv' extension #26023
Comments
I'll take a look at this!
Has this thing been fixed already?
There's an open PR: #26024.
Having the same issue.
I can confirm. The problem only occurs when using the 'zip' option.
Are you guys kidding? Has this been fixed? In pandas 1.2.1 it's the same dumb behavior again: arcname is not recognized as a valid dict option, and to_csv still saves a .zip inside the .zip, whereas it is expected to save a .csv inside the .zip. Can someone explain the correct syntax for achieving this, please? Ah, sorry, found it. Now it can be given as
Once my PR #40387 is accepted, you can use 'myfile.csv.zip' as the zip file name: the archive name will be inferred as "myfile.csv". It won't add ".csv" to the end automatically, though.
Code Sample, a copy-pastable example if possible
Problem description
When pd.DataFrame.to_csv creates compressed zip files, the name of the csv file inside the archive is always the same as the name of the zip archive file itself. This is problematic because the archive has a .zip extension, but we want the csv file to have a .csv extension when it is extracted.
Other compression methods meant for a single file, like 'bz2', 'gzip', and 'xz', do not have this problem because a file named 'file.csv.gz', for instance, will automatically become 'file.csv' when decompressed.
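The behaviour described above can be checked with a short sketch along these lines. It assumes pandas is installed; the file name 'filename.zip' comes from the issue title, while the helper zip_member_names and the sample data are illustrative only. The member name it prints depends on the pandas version in use, so no output is shown; under 0.24.1, as reported here, the member carries the archive's own .zip name.

```python
import os
import tempfile
import zipfile

import pandas as pd


def zip_member_names(path):
    """Return the names of the files stored inside the zip archive at path."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Illustrative data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmpdir:
    zip_path = os.path.join(tmpdir, "filename.zip")
    # compression="zip" is passed explicitly so the sketch does not depend
    # on extension-based inference.
    df.to_csv(zip_path, compression="zip", index=False)
    member_names = zip_member_names(zip_path)

# The single member's name is what a user sees after extracting the archive.
print(member_names)
```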
This would be a relatively easy fix by adding an arcname=None parameter to to_csv, passing it through pandas.io.formats.csvs.CSVFormatter to pandas.io.formats.csvs._get_handle, and using that instead of ZipFile.filename if provided.
Expected Output
See comments in Code Sample above for expected output.
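As a sketch of the expected result: pandas 1.0 and later document that compression may be given as a dict, with 'method' selecting the compression mode and 'archive_name' naming the file inside a zip archive. Assuming such a version is installed, something like the following should produce a .csv member inside the .zip. The file names and sample data are illustrative, and this is the documented option rather than the arcname parameter proposed above.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Illustrative data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmpdir:
    zip_path = os.path.join(tmpdir, "filename.zip")
    # Assumes pandas >= 1.0, where compression accepts a dict and
    # 'archive_name' sets the member name for zip output.
    df.to_csv(
        zip_path,
        index=False,
        compression={"method": "zip", "archive_name": "filename.csv"},
    )
    with zipfile.ZipFile(zip_path) as archive:
        member_names = archive.namelist()
        csv_text = archive.read("filename.csv").decode("utf-8")

print(member_names)
print(csv_text)
```

On older versions, where the dict form is not available, one fallback is to build the CSV text with df.to_csv() and write it into the archive with zipfile.ZipFile.writestr, which takes the member name as its first argument.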
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-17-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.0.0
Cython: 0.28.5
numpy: 1.16.2
scipy: 1.2.1
pyarrow: 0.11.1
xarray: None
IPython: 7.1.1
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: 2.4.11
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None