df.to_csv('abc.csv', errors='surrogatepass')  # saving works fine.

# Try to load:

# Attempt 1:
pd.read_csv('abc.csv')
# Fails. UnicodeEncodeError: 'utf-8' codec can't encode characters in position 30682-30685: surrogates not allowed

# Attempt 2:
pd.read_csv('abc.csv', errors='surrogatepass')
# Fails. No `errors` parameter.

# Attempt 3:
with open('abc.csv', errors='surrogatepass') as _file:
    df = pd.read_csv(_file)
# Fails. UnicodeEncodeError: 'utf-8' codec can't encode characters in position 30682-30685: surrogates not allowed
Describe the solution you'd like
Recently, we added errors as a function parameter to to_csv in this merged PR. Can we do the same for read_csv? This solution would make Attempt 2 work.
(Not sure why Attempt 3 doesn't work, since read_csv accepts a file handle object.)
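One way to narrow down Attempt 3 is to check whether the file handle itself can decode the data. The sketch below uses only the standard library (no pandas); the temporary file and the sample rows are made up for illustration. It writes a lone surrogate with errors='surrogatepass' and reads it back through a handle opened the same way. If this succeeds, the handle is decoding correctly, which is consistent with the UnicodeEncodeError being raised later, when the decoded text is encoded again. Where exactly that happens inside read_csv is an assumption here, not something verified against the pandas source.

```python
import csv
import os
import tempfile

# Hypothetical sample data: the second cell holds a lone surrogate,
# which strict UTF-8 refuses to encode.
rows = [["id", "text"], ["1", "bad \ud83d char"]]

path = os.path.join(tempfile.mkdtemp(), "abc.csv")

# Writing needs errors='surrogatepass', mirroring
# df.to_csv('abc.csv', errors='surrogatepass').
with open(path, "w", encoding="utf-8", errors="surrogatepass", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading through a handle opened with the same error handler decodes fine.
with open(path, encoding="utf-8", errors="surrogatepass", newline="") as f:
    loaded = list(csv.reader(f))

# The decoded text still contains the lone surrogate, so encoding it
# again with the default strict handler raises UnicodeEncodeError.
try:
    loaded[1][1].encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

Any consumer that takes the decoded strings and encodes them to UTF-8 without an error handler would hit the same exception, regardless of how the handle was opened.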
@davidleejy do you have an example that needs errors='surrogatepass' for decode?
Your example x="\ud83d\ude4f".encode('utf-16', 'surrogatepass').decode('utf-16') needs errors='surrogatepass' during encode which is not what read_csv does internally.
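For reference, a case where decoding does need errors='surrogatepass' can be built with UTF-8 rather than UTF-16. The sketch below assumes that abc.csv contains surrogates written as UTF-8 bytes by the surrogatepass handler; that has not been checked against the reporter's actual file.

```python
# Two lone surrogate code points, as in the example above.
s = "\ud83d\ude4f"

# UTF-8 with surrogatepass writes each surrogate as its own 3-byte sequence.
raw = s.encode("utf-8", "surrogatepass")

# Strict decoding rejects those bytes ...
try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# ... while surrogatepass on decode restores the original string.
roundtrip = raw.decode("utf-8", "surrogatepass")
```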
API breaking implications
Should not break.
Describe alternatives you've considered
See (futile) Attempt 3 above.
Additional context
Section "Error handlers" in https://docs.python.org/3/library/codecs.html says:
Example of encoding & decoding:
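A minimal sketch of such an example, built around the `x = ...` expression quoted in the comment thread. The name `halves` and the strict-encoding check are added here for illustration and are not part of the original report.

```python
# A string holding the two halves of a surrogate pair as separate code points.
halves = "\ud83d\ude4f"

# Strict encoding refuses lone surrogates.
try:
    halves.encode("utf-16")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogatepass the code units are written as-is; decoding the
# resulting UTF-16 then sees a valid pair and yields one character.
x = halves.encode("utf-16", "surrogatepass").decode("utf-16")
```

Note that only the encode step needs the error handler in this UTF-16 round trip; the decode step succeeds with the default strict handler.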