pandas 1.0.0 read_csv() is broken use open( buffering=0) option. #31575

Closed
paihu opened this issue Feb 2, 2020 · 4 comments · Fixed by #31596
Labels: Bug, IO CSV (read_csv, to_csv), Regression (functionality that used to work in a prior pandas version)

@paihu
Contributor

paihu commented Feb 2, 2020

Code Sample

import os
import tempfile

import pandas

# Write a two-line Shift-JIS CSV (header "てすと", one row "bar") to a temp file.
fname = ""
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as f:
    f.write("てすと\nbar")
    fname = f.name
print(fname)

try:
    # Case 1: text handle opened with the matching encoding.
    with open(fname, mode="r", encoding="shift-jis") as f:
        result = pandas.read_csv(f)
        print("read shift-jis")
        print(result)

    # Case 2: text handle again; read_csv is given a different encoding.
    with open(fname, mode="r", encoding="shift-jis") as f:
        result = pandas.read_csv(f, encoding="utf-8")
        print("open shift-jis file and read_csv with encoding: utf-8")
        print(result)

    # Case 3: buffered binary handle; read_csv has to decode.
    with open(fname, mode="rb") as f:
        result = pandas.read_csv(f, encoding="shift-jis")
        print("open binary with buffered and read_csv with encoding: shift-jis")
        print(result)

    # Case 4: unbuffered binary handle (buffering=0), the case reported as broken.
    with open(fname, mode="rb", buffering=0) as f:
        result = pandas.read_csv(f, encoding="shift-jis")
        print("open binary without burrered and read_csv with encoding: shift-jis")
        print(result)
except Exception as e:
    print(e)

os.unlink(fname)

Problem description

With pandas 1.0.0, this sample does not work. With pandas 0.25.3, it works fine.

When the file is opened with buffering=0, f is a RawIOBase instance (io.FileIO). In this case, the encoding option seems to be ignored.

https://github.com/pandas-dev/pandas/pull/30771/files#diff-0335ae9037e4eb4747749a9f94cffd32R641

https://github.com/pandas-dev/pandas/pull/30771/files#diff-777d7549579ddc0c6e67596ad87e0d27R1879
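
To make the difference between the four cases easier to see, here is a minimal stdlib-only sketch that checks which io base class each kind of handle derives from. The `handle_kind` helper and the `kinds` dictionary are illustrative names only and are not part of pandas or of the original sample.

```python
import io
import os
import tempfile


def handle_kind(handle):
    """Return the name of the io base class the handle derives from (hypothetical helper)."""
    for base in (io.TextIOBase, io.BufferedIOBase, io.RawIOBase):
        if isinstance(handle, base):
            return base.__name__
    return "unknown"


# Write a small Shift-JIS file, mirroring the sample above.
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as tmp:
    tmp.write("てすと\nbar")
    fname = tmp.name

kinds = {}
try:
    # Text mode: the handle decodes bytes to str itself.
    with open(fname, mode="r", encoding="shift-jis") as f:
        kinds["text"] = (type(f).__name__, handle_kind(f))
    # Binary mode with the default buffering.
    with open(fname, mode="rb") as f:
        kinds["buffered"] = (type(f).__name__, handle_kind(f))
    # Binary mode with buffering disabled: a raw stream.
    with open(fname, mode="rb", buffering=0) as f:
        kinds["raw"] = (type(f).__name__, handle_kind(f))
finally:
    os.unlink(fname)

for label, (cls, base) in kinds.items():
    print(f"{label}: {cls} ({base})")
```

Only the buffering=0 handle is a raw stream rather than a buffered one, so a check written against io.BufferedIOBase alone would not match it.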

Expected Output

/tmp/tmpxxxxxxxxxxx
read shift-jis
   てすと
0  bar
open shift-jis file and read_csv with encoding: utf-8
   てすと
0  bar
open binary with buffered and read_csv with encoding: shift-jis
   てすと
0  bar
open binary without burrered and read_csv with encoding: shift-jis
   てすと
0  bar

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-18362-Microsoft
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.0.0
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

charlesdong1991 added the IO CSV (read_csv, to_csv) and Bug labels on Feb 2, 2020
@charlesdong1991
Member

@gfyoung tracing this back, it seems to be related to #30771

Any thoughts here?

@gfyoung
Member

gfyoung commented Feb 2, 2020

https://github.com/pandas-dev/pandas/search?utf8=%E2%9C%93&q=BufferedIOBase&type=

Good catch! I think we need to expand those isinstance checks to include TextIOWrapper as well. There are only three places where this is done.
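
As a rough illustration of what widening such an isinstance check could look like, the sketch below compares a check that only recognises io.BufferedIOBase with one that also accepts io.RawIOBase, the class named in the report. Everything here is an assumption for illustration: `is_binary_narrow`, `is_binary_wide`, and `RawBytes` are hypothetical names, none of this is taken from the pandas source, and the thread does not show which classes the actual fix added.

```python
import io


def is_binary_narrow(handle):
    # Hypothetical check that only recognises buffered binary streams.
    return isinstance(handle, io.BufferedIOBase)


def is_binary_wide(handle):
    # Hypothetical widened check that also recognises raw (unbuffered) streams.
    return isinstance(handle, (io.RawIOBase, io.BufferedIOBase))


class RawBytes(io.RawIOBase):
    """Minimal in-memory raw stream, used only so the example needs no file."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as many bytes as fit into the caller's buffer; 0 signals EOF.
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


data = "てすと\nbar".encode("shift-jis")
raw = RawBytes(data)
buffered = io.BytesIO(data)
text = io.TextIOWrapper(io.BytesIO(data), encoding="shift-jis")

results = {
    "raw": (is_binary_narrow(raw), is_binary_wide(raw)),
    "buffered": (is_binary_narrow(buffered), is_binary_wide(buffered)),
    "text": (is_binary_narrow(text), is_binary_wide(text)),
}
print(results)

# Once a raw handle is recognised as binary, it can be buffered and decoded
# with the requested encoding instead of being read with a default one.
decoded = io.TextIOWrapper(io.BufferedReader(raw), encoding="shift-jis").read()
print(decoded)
```

The narrow check misses the raw stream, which matches the symptom described in the report: the handle is not treated as binary, so the requested encoding is never applied.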

gfyoung added the Regression (functionality that used to work in a prior pandas version) label on Feb 2, 2020
@charlesdong1991
Member

Thanks @gfyoung

Do you want to work on this, @paihu? @gfyoung has provided a clear tip for it ^^

@paihu
Contributor Author

paihu commented Feb 3, 2020

Yes, I want to work on this.
