to_csv compression dict option 'archive_name' should accept os.PathLike #31934

flutefreak7 · 2020-02-12T18:30:39Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
from pathlib import Path

random_data = np.random.standard_normal(size=(1000, 3))
data = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
csv_file = Path('random_data.zip')
data.to_csv(csv_file.with_suffix('.zip'), index=False, compression={'method': 'zip', 'archive_name': csv_file})

Problem description

The Python std lib has made a tremendous effort to support os.PathLike objects, especially pathlib.Path, across the standard library. ZipFile was somewhat recently patched to add PathLike support for externally facing paths, but it appears they did not add this support to ZipInfo which is used to interface with the files within a zip archive.
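To illustrate the asymmetry, here is a minimal stdlib-only sketch. It assumes a Python version where ZipFile accepts a PathLike archive path, and it converts the member name with os.fspath() before handing it to ZipInfo. The file names are hypothetical.

```python
import os
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # The archive path can be a pathlib.Path: ZipFile accepts PathLike here.
    archive_path = Path(tmp) / "example.zip"
    # The member name is also held as a Path (hypothetical name).
    member = Path("example.csv")

    with zipfile.ZipFile(archive_path, "w") as zf:
        # ZipInfo expects a str, so convert explicitly with os.fspath().
        info = zipfile.ZipInfo(os.fspath(member))
        zf.writestr(info, "A,B,C\n1,2,3\n")

    # Read the member names back to confirm what was stored.
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()

print(names)
```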

DataFrame's to_csv() method was recently updated to handle passing additional arguments when using zip compression. If compression is specified as a dict and archive_name is passed as a key, the value currently must be a str because ZipInfo requires a string. Since pandas has exposed this to the user it would be nice for PathLike objects like pathlib.Path to get converted to a str before being passed to ZipInfo.
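The conversion being asked for could look roughly like the sketch below. The helper name normalize_compression is hypothetical and is not part of pandas; it only shows coercing archive_name with os.fspath() before the dict would reach ZipInfo.

```python
import os
from pathlib import Path


def normalize_compression(compression):
    """Return a copy of a compression dict with 'archive_name' coerced via os.fspath().

    Hypothetical helper for illustration only; not a pandas API.
    """
    if isinstance(compression, dict) and "archive_name" in compression:
        # Copy so the caller's dict is not mutated.
        compression = dict(compression)
        # os.fspath() returns str unchanged and converts pathlib.Path to str.
        compression["archive_name"] = os.fspath(compression["archive_name"])
    return compression


opts = normalize_compression(
    {"method": "zip", "archive_name": Path("random_data.csv")}
)
print(type(opts["archive_name"]).__name__, opts["archive_name"])
```

Until something like this exists in pandas, the same effect can be had on the caller's side by wrapping the archive_name value in str() or os.fspath().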

If someone wants to raise this upstream on the Python issue tracker, that's also an acceptable outcome. Perhaps this was just an oversight on the previous issue where they neglected to consider the arguments to ZipInfo as externally facing.

Expected Output

Zip file is saved with no exceptions
Zip file contains a *.csv file (passing archive_name is only necessary because pandas defaults to creating a zip file containing an identically named zip file; a *.csv member should be the default behavior.)

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.6.2
Cython : 0.29.14
pytest : 4.2.0
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.3.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.1
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.2.0
pyxlsb : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.6
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0


ljluestc · 2024-10-19T22:22:02Z


import zipfile
from pathlib import Path

import numpy as np
import pandas as pd

# Generate random data
random_data = np.random.standard_normal(size=(1000, 3))
data = pd.DataFrame(random_data, columns=['A', 'B', 'C'])

# Define the path for the zip file
csv_file = Path('random_data.zip')

# Convert Path object to string for the archive_name
data.to_csv(
    csv_file.with_suffix('.zip'),
    index=False,
    compression={'method': 'zip', 'archive_name': str(csv_file.with_suffix('.csv'))}
)

# Verify the output
print(f"Zip file '{csv_file}' created successfully with contents:")

# List the contents of the created zip file
with zipfile.ZipFile(csv_file, 'r') as zip_ref:
    print(zip_ref.namelist())

jbrockmendel added the IO CSV read_csv, to_csv label Feb 25, 2020

mroeschke added the Enhancement label Jul 28, 2021
