to_csv regression in 0.23.1 #21471
Comments
Please provide a reproducible example: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports |
@WillAyd Francois's example is reproducible for me on Windows 7 using master. The output file test.txt.gz is empty instead of containing data. If I let pandas do the compression it appears to work fine:
|
Hi,

import sys
import pandas as pd
df = pd.DataFrame([0,1])
df.to_csv(sys.stdout)

This code writes the dataframe to a file named |
I also have a problem with "to_csv", specifically on 0.23.1. It looks like the function "_get_handle()" returns "f" as a file descriptor number (int) instead of a buffer:

# GH 17778 handles zip compression for byte strings separately.
buf = f.getvalue()
if path_or_buf:
    f, handles = _get_handle(path_or_buf, self.mode,
                             encoding=encoding,
                             compression=self.compression)
    f.write(buf)
    f.close()

Error text:

  File "/Users/wr/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 1745, in to_csv
    formatter.save()
  File "/Users/wr/anaconda3/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 168, in save
    f.write(buf)
AttributeError: 'int' object has no attribute 'write' |
@WillAyd, I did some quick research. It seems that all "file-like" objects which cannot be converted to string file paths are affected. The gzip wrapper, stdout, and file descriptors all fail for the same reason. Example with a file descriptor:

import pandas
import os
with os.fdopen(3, 'w') as f:
    print(f)
    pandas.DataFrame([0, 1]).to_csv(f)

Output:
I guess the integer comes from the "name" attribute of TextIOWrapper. For STDOUT it will be |
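To illustrate the guess above about the "name" attribute, here is a minimal standard-library sketch; it does not call pandas at all. The helper `handle_name` is a hypothetical name introduced only for this example. It shows that a handle opened from a file descriptor reports the integer descriptor as its name, while an in-memory buffer has no name.

```python
import io
import os

def handle_name(handle):
    """Return the handle's ``name`` attribute, or None if it has none."""
    return getattr(handle, "name", None)

# A pipe gives us a fresh descriptor without touching stdin/stdout/stderr.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "w") as f:
    fd_name = handle_name(f)  # the integer descriptor, not a path
os.close(read_fd)

buffer_name = handle_name(io.StringIO())  # StringIO has no name attribute

print(type(fd_name).__name__, buffer_name)  # prints: int None
```

Any code that treats the "name" of a handle as a path to reopen would therefore end up with an integer for handles like this one, which is consistent with the AttributeError reported earlier in the thread.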
Writing to TemporaryFile fails as well. The file remains empty:
|
Hi, here are some additional examples of the change in behaviour. A common use case is to write a file header once and then write many dataframes' data to that file. Our implementation looks like this:
This works in 0.23.0 but in 0.23.1 it produces a file that looks like this:
What happened here is that pandas has opened a second handle to the same file path in write mode, and our Flushing alone would not help because now pandas will overwrite our data:
produces:
One workaround is both flushing manually AND giving pandas a write mode:
IMO this is not expected behaviour: if we give pandas an open file handle, we don't expect pandas to find out what the original path was, and open it again on a second file handle. This is the bit of code where re-opening is decided: https://github.com/pandas-dev/pandas/blob/master/pandas/io/formats/csvs.py#L139 . This gives the "" behaviour pointed out by @saidie . Data is written to a StringIO first, finally the file is opened again by path and the data in the StringIO is written to it. |
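The buffer-then-reopen pattern described in this comment can be sketched with the standard library alone. This is not pandas code: `write_via_reopen` and `demo` are hypothetical stand-ins written for this example, and the header and CSV text are made-up sample data. The sketch buffers the output in a StringIO, reopens the file by the handle's name, and writes there instead of using the handle it was given.

```python
import io
import os
import tempfile

def write_via_reopen(handle, text, mode="w"):
    """Hypothetical stand-in, not pandas code: buffer the text, then
    reopen the file by name and write there, ignoring `handle` itself."""
    buf = io.StringIO()
    buf.write(text)
    with open(handle.name, mode) as second:  # "w" truncates the file
        second.write(buf.getvalue())

def demo(mode):
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    with open(path, "w") as f:
        f.write("# header\n")
        f.flush()  # the header is on disk before the second open
        write_via_reopen(f, "a,b\n1,2\n", mode=mode)
    with open(path) as f:
        return f.read()

truncated = demo("w")  # second handle truncates: the header is lost
appended = demo("a")   # flush plus append mode keeps the header

print(repr(truncated))  # 'a,b\n1,2\n'
print(repr(appended))   # '# header\na,b\n1,2\n'
```

With mode "w" the flushed header is wiped out by the second open, which matches the observation that flushing alone does not help. Flushing and reopening in append mode keeps both parts, which mirrors the workaround mentioned above.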
Thanks all for the reports! |
Hello, I raised a PR to remedy this issue; testing and review are welcome. The patch should fix the cases reported by @francois-a and @saidie, as well as the other reproducible ones. For now, a workaround is to pass a file path or a StringIO. |
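A minimal sketch of the StringIO workaround, assuming the goal is a gzip-compressed CSV like the test.txt.gz mentioned earlier in the thread: render the CSV into an in-memory buffer, then write the text to the gzip handle yourself. The temporary directory and variable names are assumptions made for this example.

```python
import gzip
import io
import os
import tempfile

import pandas as pd

df = pd.DataFrame([0, 1])

# Let pandas write into an in-memory buffer instead of the gzip handle.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()

# Write the rendered text ourselves; newline="" avoids a second round of
# newline translation on platforms where os.linesep is not "\n".
path = os.path.join(tempfile.mkdtemp(), "test.txt.gz")
with gzip.open(path, "wt", newline="") as gz:
    gz.write(csv_text)

# Read it back to check that the compressed file holds the same text.
with gzip.open(path, "rt", newline="") as gz:
    round_trip = gz.read()

print(round_trip == csv_text)  # True
```

Passing the handle to pandas is avoided entirely here, so the result should not depend on how a given pandas version treats file-like objects.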
Closed via #21478 |
Writing to gzip no longer works with 0.23.1:
produces corrupted output. This works fine in 0.23.0.
Presumably this is related to #21241 and #21118.