COMPAT: Properly encode filenames in read_csv #24758

Merged
merged 1 commit into from
Jan 14, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1790,6 +1790,7 @@ I/O
 - Bug in :meth:`DataFrame.to_dict` when the resulting dict contains non-Python scalars in the case of numeric data (:issue:`23753`)
 - :func:`DataFrame.to_string()`, :func:`DataFrame.to_html()`, :func:`DataFrame.to_latex()` will correctly format output when a string is passed as the ``float_format`` argument (:issue:`21625`, :issue:`22270`)
 - Bug in :func:`read_csv` that caused it to raise ``OverflowError`` when trying to use 'inf' as ``na_value`` with integer index column (:issue:`17128`)
+- Bug in :func:`read_csv` that caused the C engine on Python 3.6+ on Windows to improperly read CSV filenames with accented or special characters (:issue:`15086`)
 - Bug in :func:`read_fwf` in which the compression type of a file was not being properly inferred (:issue:`22199`)
 - Bug in :func:`pandas.io.json.json_normalize` that caused it to raise ``TypeError`` when two consecutive elements of ``record_path`` are dicts (:issue:`22706`)
 - Bug in :meth:`DataFrame.to_stata`, :class:`pandas.io.stata.StataWriter` and :class:`pandas.io.stata.StataWriter117` where a exception would leave a partially written and invalid dta file (:issue:`23573`)
8 changes: 7 additions & 1 deletion pandas/_libs/parsers.pyx
@@ -677,7 +677,13 @@ cdef class TextReader:

         if isinstance(source, basestring):
             if not isinstance(source, bytes):
-                source = source.encode(sys.getfilesystemencoding() or 'utf-8')
+                if compat.PY36 and compat.is_platform_windows():
+                    # see gh-15086.
+                    encoding = "mbcs"
+                else:
+                    encoding = sys.getfilesystemencoding() or "utf-8"
+
+                source = source.encode(encoding)

             if self.memory_map:
                 ptr = new_mmap(source)
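
Review note: the hunk above swaps the codec used to turn a `str` path into bytes before it is handed to the C reader. The PR text does not say why `"mbcs"` is needed; one plausible reading is that Python 3.6 changed the Windows filesystem encoding to UTF-8 (PEP 529), while the C-level file open still expects bytes in the ANSI code page. A minimal pure-Python sketch of the selection logic follows. The helper names `pick_filename_encoding` and `encode_source` are hypothetical, and the two boolean arguments stand in for `compat.is_platform_windows()` and `compat.PY36`.

```python
import sys


def pick_filename_encoding(is_windows: bool, is_py36_plus: bool) -> str:
    """Return the codec name for encoding a str path (hypothetical helper)."""
    if is_py36_plus and is_windows:
        # Mirrors the new branch in the diff: use the ANSI code page codec.
        return "mbcs"
    # Mirrors the previous behaviour, kept for every other case.
    return sys.getfilesystemencoding() or "utf-8"


def encode_source(source, encoding: str) -> bytes:
    """Encode a str path; bytes pass through untouched, as in the diff."""
    if isinstance(source, bytes):
        return source
    return source.encode(encoding)


if __name__ == "__main__":
    # Approximation of the pandas compat checks, for illustration only.
    enc = pick_filename_encoding(
        sys.platform == "win32", sys.version_info >= (3, 6)
    )
    print(enc, encode_source("s\u00e9-es-v\u00e9.csv", enc))
```

Note that the `"mbcs"` codec exists only on Windows, so the sketch returns the name without trying to encode with it on other platforms.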
12 changes: 12 additions & 0 deletions pandas/tests/io/parser/test_common.py
@@ -1904,6 +1904,18 @@ def test_suppress_error_output(all_parsers, capsys):
     assert captured.err == ""


+def test_filename_with_special_chars(all_parsers):
+    # see gh-15086.
+    parser = all_parsers
+    df = DataFrame({"a": [1, 2, 3]})
+
+    with tm.ensure_clean("sé-es-vé.csv") as path:
+        df.to_csv(path, index=False)
+
+        result = parser.read_csv(path)
+        tm.assert_frame_equal(result, df)
+
+
 def test_read_table_deprecated(all_parsers):
     # see gh-21948
     parser = all_parsers
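
Review note: the new test writes a frame to a file whose name contains accented characters and reads it back with each parser. The same round-trip shape can be tried without pandas or its test helpers. The sketch below is an assumption-laden stand-in, not the PR's test: it uses the standard-library `csv` module in place of `read_csv`, a temporary directory in place of `tm.ensure_clean`, and a hypothetical helper name `roundtrip_rows`.

```python
import csv
import os
import tempfile


def roundtrip_rows(filename, rows):
    """Write rows to a CSV named `filename` in a temp dir and read them back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, filename)
        # Write with an explicit encoding so file contents are not the variable
        # under test; only the file *name* carries non-ASCII characters.
        with open(path, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(rows)
        with open(path, newline="", encoding="utf-8") as fh:
            return list(csv.reader(fh))


if __name__ == "__main__":
    rows = [["a"], ["1"], ["2"], ["3"]]
    # Same file name as the test in the diff, written with escapes.
    assert roundtrip_rows("s\u00e9-es-v\u00e9.csv", rows) == rows
```

This only checks that the interpreter and filesystem accept the name; it does not exercise the pandas C engine, which is what the PR actually fixes.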