UnicodeDecodeError when using mysql client 8.0 to connect to mysql server 5.7 #504
Comments
Hm. There are some issue reports caused by not sending
On the other hand, some databases don't support I will change to send
On reflection, I do not think this was an intermittent issue. (I thought it was intermittent because I was changing the client installation to bisect different versions of the library, and I wasn’t careful enough.) I think this was a problem of using a MySQL 8.0 client library to connect to a MySQL 5.7 server. The MySQL 8.0 documentation describes the issue in MySQL 8.0 Reference Manual / Character Sets, Collations, Unicode / Connection Character Sets and Collations: Connect-Time Error Handling. I will update the title to describe my new understanding.
MySQL 8.0 changed the default collation for utf8 charset from Since MySQL 5.7 doesn't know
Describe the bug
The underlying issue is described in MySQL 8.0 Reference Manual / Character Sets, Collations, Unicode / Connection Character Sets and Collations: Connect-Time Error Handling. When you use the MySQL 8.0 client library to connect to a MySQL 5.7 server with the utf8mb4 encoding, the server falls back to latin1 and sends result sets that are not valid UTF-8, which causes a UnicodeDecodeError when the client tries to decode them. The symptoms are similar to those of a server started with --skip-character-set-client-handshake.
As the Oracle document describes, when MySQLdb is linked with the mysql-client 8.0 library and attempts to connect to a mysql 5.7 server with "charset": "utf8mb4", the mysql client library sends a HandshakeResponse41 packet requesting charset+collation number 255: utf8mb4_0900_ai_ci, which is the new default collation for the utf8mb4 charset in MySQL 8.0. However, this new collation does not exist on a MySQL 5.7 server, so the server silently falls back to the character_set_server (latin1) and collation_server (latin1_swedish_ci). In MySQL 5.7, the default collation for utf8mb4 was charset+collation number 45: utf8mb4_general_ci.
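The fallback described above can be sketched as a toy model. This is only an illustration, not the server's actual code: the tables hold just the three collation IDs relevant here (8, 45, 255), the names `negotiate` and `client_decodes` are hypothetical, and Python's `latin-1` codec stands in for MySQL's latin1.

```python
# Toy model of the connect-time collation fallback. Assumption: only three
# collation IDs are modelled, not the full server tables.
COLLATIONS_8_0 = {
    8: ("latin1", "latin1_swedish_ci"),
    45: ("utf8mb4", "utf8mb4_general_ci"),
    255: ("utf8mb4", "utf8mb4_0900_ai_ci"),
}
# A 5.7 server does not know collation 255.
COLLATIONS_5_7 = {cid: v for cid, v in COLLATIONS_8_0.items() if cid != 255}

# character_set_server / collation_server as described in the issue.
SERVER_DEFAULT = ("latin1", "latin1_swedish_ci")

# Stand-in Python codecs for the MySQL charset names (an approximation).
PYTHON_CODEC = {"latin1": "latin-1", "utf8mb4": "utf-8"}


def negotiate(requested_id, known_collations, server_default=SERVER_DEFAULT):
    """Return the (charset, collation) the server ends up using.

    An unknown collation ID silently yields the server default.
    """
    return known_collations.get(requested_id, server_default)


def client_decodes(text, requested_id, known_collations):
    """Encode text as the server would send it, then decode as UTF-8.

    Limitation: text outside latin-1 raises UnicodeEncodeError here,
    whereas a real server would substitute characters instead.
    """
    charset, _ = negotiate(requested_id, known_collations)
    wire_bytes = text.encode(PYTHON_CODEC[charset])
    # The client still believes the connection is utf8mb4.
    return wire_bytes.decode("utf-8")
```

Under this model, requesting ID 45 from a 5.7 server round-trips "café" correctly, while requesting ID 255 makes the server send latin1 bytes that the client cannot decode as UTF-8.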
We currently call the C function mysql_character_set_name, but that function is a client-side lookup and does not verify the character set in case the server silently ignored the charset+collation from the handshake.
The connection’s collation is apparently only used for comparing literal strings, not for comparing columns (which have their own collation), so all this trouble is for a pretty uncommon use case (character_set_collation).
This bug occurs if you use the OSX Homebrew [email protected] package, but surprisingly it does not occur when you use [email protected] (brew install [email protected]; PATH=/usr/local/opt/[email protected]/bin:$PATH pip install -e /path/to/mysqlclient). This is because mysql-client was compiled with -DDEFAULT_COLLATION=utf8mb4_general_ci, whereas the mysql formula does not change the DEFAULT_COLLATION. With the default collation altered, mysql-client sends 45: utf8mb4_general_ci to the server in the HandshakeResponse41 packet, which mysql 5.7 recognizes.
Googling, I saw that this seems to have occurred to other people too:
To Reproduce
Server
Code
On Mac OSX, you can use the mysql package:
Error
You can also query the server to see that character_set_results is set to latin1 rather than utf8mb4:
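One way to run that check from Python is sketched below. It assumes `conn` is any DB-API connection to the MySQL server (for example one returned by `MySQLdb.connect`); the helper name `session_charsets` is hypothetical and not part of mysqlclient.

```python
# Sketch: ask the server which character sets the session actually uses,
# instead of trusting the client library's own idea of the charset.
SESSION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "collation_connection",
)


def session_charsets(conn):
    """Return the server-side session charset variables as a dict.

    `conn` is assumed to be a DB-API connection to a MySQL server.
    """
    sql = "SELECT " + ", ".join("@@" + name for name in SESSION_VARS)
    cur = conn.cursor()
    try:
        cur.execute(sql)
        row = cur.fetchone()
    finally:
        cur.close()
    return dict(zip(SESSION_VARS, row))


# Example use (requires a reachable server, so it is left as a comment):
#   import MySQLdb
#   conn = MySQLdb.connect(host="127.0.0.1", user="root", charset="utf8mb4")
#   print(session_charsets(conn)["character_set_results"])
```

If the fallback has happened, the returned `character_set_results` would be latin1 even though the connection was opened with utf8mb4.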
Other implementations
How do other implementations handle the handshake?
@@character_set_results (loadServerVariables)
SET NAMES statement if the charset/collation do not match what is required (configurePostHandshake)
Workaround
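At the SQL level, the workaround in this section amounts to issuing SET NAMES after connecting, regardless of what the client library reports. The sketch below is one possible shape of that, not the change made in mysqlclient: `force_set_names` is a hypothetical helper, and `conn` is assumed to be a DB-API connection.

```python
import re

# Charset and collation names are spliced into the statement, so restrict
# them to identifier characters.
_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def force_set_names(conn, charset="utf8mb4", collation=None):
    """Run SET NAMES unconditionally and return the statement that was sent.

    Hypothetical helper: it does not consult the client-side charset at all.
    """
    for name in (charset, collation):
        if name is not None and not _NAME_RE.match(name):
            raise ValueError("invalid charset/collation name: %r" % (name,))
    sql = "SET NAMES " + charset
    if collation is not None:
        sql += " COLLATE " + collation
    cur = conn.cursor()
    try:
        cur.execute(sql)
    finally:
        cur.close()
    return sql
```

Passing an explicit collation such as utf8mb4_general_ci keeps the statement valid on a 5.7 server, which does not know utf8mb4_0900_ai_ci.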
If we are using the MySQL 8.0 client library to connect to a MySQL 5.7 server, we need to perform an additional SET NAMES to set the charset to utf8mb4.
Currently, Connection.set_character_set (which is called during Connection.__init__) only executes SET NAMES if it thinks that the parameters changed after connect. But because we can’t trust the client-side mysql_character_set_name function to return the server’s value of @@character_set_results, we should just SET NAMES unconditionally.
Environment
MySQL Server
MySQL Client
OS (e.g. Windows 10, Ubuntu 20.04): OS X
Python: Homebrew Python 3.9.7 (but it also occurs in 2.7.10 with a compatible mysqlclient 1.4.6)
Connector/C: Homebrew mysql-client 8.0.26