COMPAT: Properly encode filenames in read_csv #24758
f8307fc to cfad6ec
lgtm. ping on green. |
Codecov Report
@@ Coverage Diff @@
## master #24758 +/- ##
=======================================
Coverage 92.38% 92.38%
=======================================
Files 166 166
Lines 52363 52363
=======================================
Hits 48376 48376
Misses 3987 3987
Continue to review full report at Codecov.
|
Python 3.6+ changes the default filesystem encoding on Windows to UTF-8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're using Python 3.6+ and on Windows, after which we force the encoding to "mbcs". Closes #15086.
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=6891 @jreback : The lint failure is unrelated to my changes. However, I'll rebase and push again to be safe. |
cfad6ec to ab663b4
@jreback : Rebased, and all is green. PTAL. |
thanks @gfyoung |
Thanks @gfyoung. Do you know if the CI would have failed on that path without your fix? IIUC, the failure was somewhat system-dependent, though I don't recall the exact details. |
@TomAugspurger : Yes, it would have. It's a Windows Python 3.6+ issue. |
Python 3.6+ changes the default encoding to `utf8` (PEP 529), which conflicts with the encoding of Windows (`mbcs`). This conflict rears its head only when using the C engine of `read_csv`, because of the low-level functionality that we implement ourselves. This fix checks if we're using Python 3.6+ and on Windows, after which we force the encoding to `mbcs`.

Closes #15086.
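
For readers skimming the thread, the check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `filename_encoding` and `encode_filename` are hypothetical helpers, and the fallback to `sys.getfilesystemencoding()` on other platforms is an assumption. The `platform` and `version_info` parameters exist only so the branch can be exercised on any machine.

```python
import sys


def filename_encoding(platform=None, version_info=None):
    """Hypothetical helper: choose the codec used to turn a filename into bytes.

    On Windows with Python 3.6+, return "mbcs"; otherwise fall back to the
    interpreter's filesystem encoding (assumed fallback, not from the PR).
    """
    platform = sys.platform if platform is None else platform
    version_info = sys.version_info if version_info is None else version_info

    is_windows = platform.startswith("win")
    is_py36_plus = tuple(version_info[:2]) >= (3, 6)

    if is_windows and is_py36_plus:
        # PEP 529 makes the filesystem encoding UTF-8 here, so force MBCS.
        return "mbcs"
    return sys.getfilesystemencoding() or "utf-8"


def encode_filename(path, encoding=None):
    """Hypothetical helper: encode a str path before handing it to C code."""
    encoding = encoding or filename_encoding()
    return path.encode(encoding)
```

Note that the `mbcs` codec is only available on Windows, so `encode_filename` with the forced encoding should only be reached on that platform.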