BUG: Cannot write as xlsx to GCS #33987

Closed
gfelot opened this issue May 5, 2020 · 1 comment · Fixed by #37639
Labels
Bug · IO Excel (read_excel, to_excel) · IO Network (Local or Cloud (AWS, GCS, etc.) IO Issues)

Comments

gfelot commented May 5, 2020

Code Sample, a copy-pastable example

output_path = f"gs://my_bucket/incidents/prediction/{ds}_incidents_result"
[...]
export_app.to_parquet(f"{output_path}.parquet")
export_app.to_excel(f"{output_path}.xlsx")

Problem description

I want to write the DataFrame as both a .parquet and a .xlsx file to a Google Cloud Storage bucket.
I launched the job in a K8s pod and got this error message:

textPayload: "[Errno 2] No such file or directory: 'gs://my_bucket/incidents/prediction/2020-04-29_incidents_result.xlsx'

When I replace to_excel with to_csv, everything works as expected.

Can to_excel handle a gs://... path? That is the only problem I can see here.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.138+
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.6.1
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : 1.3.0
xlsxwriter : None
numba : None

@gfelot added the Bug and Needs Triage labels May 5, 2020
@jbrockmendel added the IO Excel and IO Network labels and removed the Needs Triage label Jun 6, 2020
Member

twoertwein commented Oct 30, 2020

Since to_excel does not currently support writing to file objects, it cannot support gs://. The two engines that actually write the Excel file (openpyxl and xlsxwriter) also seem to want a file path rather than a file object.

Pandas would probably need to create a temporary file (I'm not sure whether pandas wants that), let the two engines write to it, read the content, and then write it to a file object to support gs://.

It is probably easier if the user calls to_excel with a local filename, reads the content, and then sends it to GCloud Storage.
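A minimal sketch of that workaround, assuming gcsfs is installed and authenticated. The helper names `write_then_upload` and `export_excel_to_gcs` are hypothetical and not part of pandas; the temporary-file step is kept generic so it can be tried without any cloud access.

```python
import os
import shutil
import tempfile


def write_then_upload(write_local, open_remote, suffix=".xlsx"):
    """Write to a temporary local file, then copy its bytes to a remote target.

    write_local: callable that takes a local path and writes the file there.
    open_remote: callable that returns a writable binary file object.
    Both parameters are illustrative assumptions, not a pandas API.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the writer reopens the path by name
    try:
        write_local(tmp_path)
        with open(tmp_path, "rb") as src, open_remote() as dst:
            shutil.copyfileobj(src, dst)
    finally:
        os.remove(tmp_path)  # always clean up the local copy


def export_excel_to_gcs(df, gcs_path):
    """Hypothetical wrapper: df is a pandas DataFrame, gcs_path a gs:// URL."""
    import gcsfs  # deferred import; assumed to be installed and authenticated

    fs = gcsfs.GCSFileSystem()
    write_then_upload(
        lambda path: df.to_excel(path),   # the engine sees a plain local path
        lambda: fs.open(gcs_path, "wb"),  # the upload goes through gcsfs
    )
```

With this in place, `export_excel_to_gcs(export_app, f"{output_path}.xlsx")` would replace the failing to_excel call from the report.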

Edit: the Excel backends do seem to support file handles, so it should be possible to add support for Google Cloud Storage.
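If the backends accept file handles, the temporary file can be skipped by writing into an in-memory buffer through pd.ExcelWriter. The following is a sketch under that assumption, not what pandas does internally; `dataframe_to_xlsx_bytes` and `upload_xlsx` are hypothetical names, and `open_remote` stands for any callable that returns a writable binary handle (for example one backed by gcsfs).

```python
import io


def dataframe_to_xlsx_bytes(df, engine="openpyxl"):
    """Serialize a pandas DataFrame to .xlsx bytes without touching disk."""
    import pandas as pd  # deferred import; pandas and the engine must be installed

    buffer = io.BytesIO()
    # A file handle is passed instead of a path. The engine is named
    # explicitly because there is no file extension to infer it from.
    with pd.ExcelWriter(buffer, engine=engine) as writer:
        df.to_excel(writer, index=False)
    return buffer.getvalue()


def upload_xlsx(df, open_remote):
    """Write the workbook bytes to whatever handle open_remote() returns."""
    with open_remote() as dst:
        dst.write(dataframe_to_xlsx_bytes(df))
```

The whole workbook is held in memory before upload, which is fine for modest exports; very large frames may be better served by the temporary-file route.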
