The Python standard library has made a tremendous effort to support os.PathLike objects, especially pathlib.Path. ZipFile was somewhat recently patched to add PathLike support for externally facing paths, but it appears this support was not added to ZipInfo, which is used to interface with the files within a zip archive.
DataFrame's to_csv() method was recently updated to handle passing additional arguments when using zip compression. If compression is specified as a dict and archive_name is passed as a key, the value currently must be a str because ZipInfo requires a string. Since pandas has exposed this to the user, it would be nice for PathLike objects like pathlib.Path to be converted to a str before being passed to ZipInfo.
If someone wants to raise this upstream on the Python issue tracker, that's also an acceptable outcome. Perhaps this was just an oversight on the previous issue where they neglected to consider the arguments to ZipInfo as externally facing.
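To illustrate the kind of conversion being requested, here is a minimal sketch. The helper name coerce_archive_name is hypothetical and is not part of pandas or the standard library; it only shows how os.fspath could turn a pathlib.Path into the str that ZipInfo expects. The in-memory zip and the tiny CSV payload are assumptions made purely for demonstration.

```python
import io
import os
import zipfile
from pathlib import Path


def coerce_archive_name(archive_name):
    """Hypothetical helper: return archive_name as str, accepting os.PathLike."""
    if archive_name is None:
        return None
    # os.fspath returns a str unchanged and calls __fspath__ on PathLike objects.
    return os.fspath(archive_name)


# Derive the entry name from a Path, then coerce it before it reaches ZipInfo.
entry_name = coerce_archive_name(Path("random_data.zip").with_suffix(".csv"))

# Write a tiny CSV payload into an in-memory zip under that entry name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(zipfile.ZipInfo(entry_name), "A,B,C\n1,2,3\n")

# Read the archive back and list the names it contains.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
print(names)
```

Something along these lines could presumably be applied to the archive_name value inside to_csv() before it is handed to ZipInfo, though where exactly that belongs in pandas is a design decision for the maintainers.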
Expected Output
Zip file is saved with no exceptions
Zip file contains a *.csv file (passing archive_name is only necessary because pandas defaults to creating a zip file containing an identically named zip file; a *.csv entry should be the default behavior.)
Code Sample, a copy-pastable example if possible
import zipfile
from pathlib import Path

import numpy as np
import pandas as pd

# Generate random data
random_data = np.random.standard_normal(size=(1000, 3))
data = pd.DataFrame(random_data, columns=['A', 'B', 'C'])

# Define the path for the zip file
csv_file = Path('random_data.zip')

# Convert Path object to string for the archive_name
data.to_csv(
    csv_file.with_suffix('.zip'),
    index=False,
    compression={'method': 'zip', 'archive_name': str(csv_file.with_suffix('.csv'))}
)

# Verify the output
print(f"Zip file '{csv_file}' created successfully with contents:")

# List the contents of the created zip file
with zipfile.ZipFile(csv_file, 'r') as zip_ref:
    print(zip_ref.namelist())
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.6.2
Cython : 0.29.14
pytest : 4.2.0
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.3.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.1
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.2.0
pyxlsb : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.6
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0