to_csv compression dict option 'archive_name' should accept os.PathLike #31934

Open
flutefreak7 opened this issue Feb 12, 2020 · 1 comment
Labels
Enhancement IO CSV read_csv, to_csv

Comments

@flutefreak7
Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
from pathlib import Path

random_data = np.random.standard_normal(size=(1000, 3))
data = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
csv_file = Path('random_data.csv')

# archive_name is a pathlib.Path here; per the description below,
# it currently must be a str
data.to_csv(
    csv_file.with_suffix('.zip'),
    index=False,
    compression={'method': 'zip', 'archive_name': csv_file},
)

Problem description

The Python std lib has made a tremendous effort to support os.PathLike objects, especially pathlib.Path, across the standard library. ZipFile was somewhat recently patched to add PathLike support for externally facing paths, but it appears they did not add this support to ZipInfo which is used to interface with the files within a zip archive.

DataFrame's to_csv() method was recently updated to handle passing additional arguments when using zip compression. If compression is specified as a dict and archive_name is passed as a key, the value currently must be a str because ZipInfo requires a string. Since pandas has exposed this to the user it would be nice for PathLike objects like pathlib.Path to get converted to a str before being passed to ZipInfo.
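The conversion requested above could look roughly like the following. This is only a sketch: normalize_compression is a hypothetical helper name, not an existing pandas function, and it assumes the compression argument is either a plain value or a dict that may carry an archive_name key.

```python
import os
from pathlib import Path


def normalize_compression(compression):
    """Return `compression` with a path-like 'archive_name' converted to str.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Non-dict values such as 'zip' or None are passed through unchanged.
    if not isinstance(compression, dict):
        return compression
    normalized = dict(compression)  # shallow copy, so the caller's dict is untouched
    archive_name = normalized.get("archive_name")
    if archive_name is not None:
        # os.fspath returns str unchanged and calls __fspath__ on path-like objects.
        normalized["archive_name"] = os.fspath(archive_name)
    return normalized


compression = normalize_compression(
    {"method": "zip", "archive_name": Path("random_data.csv")}
)
print(compression)  # {'method': 'zip', 'archive_name': 'random_data.csv'}
```

Using os.fspath rather than str() keeps the conversion limited to real path-like objects, so an unrelated object passed by mistake raises a TypeError instead of being silently stringified.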

If someone wants to raise this upstream on the Python issue tracker, that's also an acceptable outcome. Perhaps this was just an oversight on the previous issue where they neglected to consider the arguments to ZipInfo as externally facing.

Expected Output

  • Zip file is saved with no exceptions
  • Zip file contains a *.csv file. (Passing archive_name is only necessary because pandas defaults to creating a zip file containing an identically named zip file; a *.csv member should be the default behavior.)
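The expected archive layout can be illustrated with the standard library alone. The sketch below does not use pandas; write_single_member_zip is a hypothetical helper, and the CSV text is made-up sample data. It shows the archive path being given as a pathlib.Path while the member name is converted to str before zipfile sees it.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def write_single_member_zip(zip_path, member_name, text):
    """Write `text` into `zip_path` as a single member named `member_name`."""
    # The archive path may be path-like; the member name is converted
    # to str explicitly before it is handed to zipfile.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(os.fspath(member_name), text)


with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "random_data.zip"
    write_single_member_zip(zip_path, Path("random_data.csv"), "A,B,C\n1,2,3\n")
    # Read the member names back while the temporary directory still exists.
    with zipfile.ZipFile(zip_path) as zf:
        member_names = zf.namelist()

print(member_names)  # ['random_data.csv']
```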

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.6.2
Cython : 0.29.14
pytest : 4.2.0
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.3.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.1
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.2.0
pyxlsb : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.6
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0

@jbrockmendel added the IO CSV read_csv, to_csv label on Feb 25, 2020
@ljluestc
import zipfile
from pathlib import Path

import numpy as np
import pandas as pd

# Generate random data
random_data = np.random.standard_normal(size=(1000, 3))
data = pd.DataFrame(random_data, columns=['A', 'B', 'C'])

# Define the path for the zip file and the name of the CSV inside it
zip_file = Path('random_data.zip')
csv_name = zip_file.with_suffix('.csv').name  # .name is already a str

# Workaround: pass archive_name as a str rather than a Path
data.to_csv(
    zip_file,
    index=False,
    compression={'method': 'zip', 'archive_name': csv_name},
)

# Verify the output by listing the contents of the created zip file
with zipfile.ZipFile(zip_file, 'r') as zip_ref:
    print(f"Zip file '{zip_file}' created successfully with contents:")
    print(zip_ref.namelist())
