BUG: read_csv throws UnicodeDecodeError with unicode aliases #13571
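For context, the "unicode aliases" in the title are alternate spellings of the same codec, such as `utf_8` or `UTF_8` for `utf-8`. The sketch below uses only Python's standard `codecs` module to show that these spellings resolve to one canonical name. It says nothing about how pandas resolves encodings internally or what this PR changed; the helper name `canonical_name` is hypothetical.

```python
import codecs

# Alias spellings taken from the test in this PR.
aliases = [
    "utf-8", "utf_8", "UTF_8", "UTF-8",
    "utf-16", "utf_16", "UTF_16", "UTF-16",
]


def canonical_name(encoding):
    """Return the canonical codec name Python registers for an alias (hypothetical helper)."""
    return codecs.lookup(encoding).name


# Map every alias to the canonical name reported by the codec registry.
canonical = {alias: canonical_name(alias) for alias in aliases}

for alias, name in canonical.items():
    print(alias, "->", name)
```

Normalizing an encoding argument this way before comparing it against a fixed string is one way to make such comparisons insensitive to the alias spelling.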
@@ -1469,3 +1469,34 @@ def test_memory_map(self):

        out = self.read_csv(mmap_file, memory_map=True)
        tm.assert_frame_equal(out, expected)

    def test_read_csv_utf_aliases():
Review comment: self
        # see gh issue 13549
        engines = ['c', 'python', None]
        path = 'test.csv'
Review comment: use the context manager
remove
        expected = DataFrame({"A": [0, 1], "B": [2, 3]})
Review comment: can we include multibytes columns / values?
        expected.to_csv(path, encoding='utf-8', index=False)
Review comment: how about merging 2 tests like:
Review comment: ok
        test_encodings = ['utf-8', 'utf_8', 'UTF_8', 'UTF-8']

        for encoding in test_encodings:
            for engine in engines:
                out = pd.io.parsers.read_csv(
Review comment: use
Review comment: Thanks, wasn't sure if that would work
                    path,
                    engine=engine,
                    encoding=encoding)
                tm.assert_frame_equal(out, expected)

        os.remove("test.csv")

        expected.to_csv(path, encoding='utf-16', index=False)
        test_encodings = ['utf-16', 'utf_16', 'UTF_16', 'UTF-16']

        for encoding in test_encodings:
            for engine in engines:
                out = pd.io.parsers.read_csv(
                    path,
                    engine=engine,
                    encoding=encoding)
                tm.assert_frame_equal(out, expected)

        os.remove("test.csv")
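The review comments on this test ask for a context manager, multibyte data, and a single merged loop. Below is a minimal sketch of how those suggestions might combine. It is not the code that was merged: the function name `check_encoding_aliases`, the sample column `Ω`, and the use of the standard library's `tempfile.TemporaryDirectory` (in place of whatever pandas test helper the reviewer had in mind) are all assumptions, and `engine=None` is left out for simplicity.

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal


def check_encoding_aliases():
    """Round-trip a small frame through every alias/engine pair (hypothetical helper)."""
    # Multibyte column name and values, as one reviewer suggested.
    expected = pd.DataFrame({"A": [0, 1], "Ω": ["é", "日本"]})

    # Encoding used for writing -> alias spellings used for reading.
    cases = {
        "utf-8": ["utf-8", "utf_8", "UTF_8", "UTF-8"],
        "utf-16": ["utf-16", "utf_16", "UTF_16", "UTF-16"],
    }

    checked = 0
    for write_encoding, aliases in cases.items():
        # The context manager deletes the file even if an assertion fails,
        # so no explicit os.remove() call is needed.
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "test.csv")
            expected.to_csv(path, encoding=write_encoding, index=False)
            for alias in aliases:
                for engine in ("c", "python"):
                    out = pd.read_csv(path, engine=engine, encoding=alias)
                    assert_frame_equal(out, expected)
                    checked += 1
    return checked


if __name__ == "__main__":
    print(check_encoding_aliases(), "alias/engine combinations checked")
```

Driving both encodings from one dictionary removes the duplicated loop, and scoping the file to a temporary directory avoids leaving `test.csv` behind when a comparison fails.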
Review comment: put in 0.18.2
Review comment: double backticks around UnicodeDecodeError
Review comment: pd.read_csv()
Review comment: Looks like 0.18.2 was moved to 0.19.0 this weekend