MR #346 breaks sphinxsearch connection #381
Comments
Fixed with manual string converters:
This helps partially (sphinxsearch specific)
It's a totally wrong solution. Don't use the converter at all.
Actually, you broke your library with this commit: tumb1er/django_sphinxsearch@9519c5f. mysqlclient automatically decodes strings, but you override the converter, so mysqlclient no longer decodes them. A bug in mysqlclient (#343) hid your bug, and fixing #343 revealed it. I hate the converter. It is a source of tons of bugs and is not well tested. Overriding the converter causes surprising bugs like yours.
Yes, I agree. That was an attempt to fix ASCII decoding errors that appeared at some point, nobody knows when or where. The most probable causes are the sphinxsearch update from 2.3.x to 3.1.x and mysqlclient bugfixes. I tried to enable
Russian characters are not decoded correctly. Do you have any ideas how to determine the cause of this error (sphinxsearch vs mysqlclient usage)?
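One way to narrow this down, sketched under the assumption that the raw bytes of a column can be obtained first (for example by fetching with `use_unicode=False`): decode the same bytes with a few candidate encodings and compare the results. The helper name `try_decodings` and the candidate list are illustrative only; they are not part of mysqlclient or sphinxsearch.

```python
def try_decodings(raw, candidates=("ascii", "utf-8", "cp1251", "latin-1")):
    """Return {encoding: decoded text, or None if the bytes are invalid in it}."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # these bytes cannot be this encoding
    return results


# Stand-in for a fetched column value: the UTF-8 bytes of a Russian string.
raw = "за вдв".encode("utf-8")
for enc, decoded in try_decodings(raw).items():
    print(enc, repr(decoded))
```

Note that `latin-1` accepts any byte sequence, so a successful decode there proves nothing by itself; the useful signal is which candidate yields readable text and which ones fail outright.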
Looks like it's decoding on python side:
What is the difference between using converters and use_unicode for strings?
Have you specified charset? Until MySQL 8.0, the default connection charset was latin1.
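Why the connection charset matters can be shown without a server. The sketch below assumes the server sends UTF-8 bytes while the client decodes them as latin1; the sample string is only an illustration.

```python
text = "за вдв"
wire_bytes = text.encode("utf-8")        # what a UTF-8 server would send

as_latin1 = wire_bytes.decode("latin-1")  # client assumes latin1: mojibake
as_utf8 = wire_bytes.decode("utf-8")      # client assumes utf8: correct text

print(repr(as_latin1))
print(repr(as_utf8))

# latin1 maps every byte to exactly one character, so the damage is reversible:
repaired = as_latin1.encode("latin-1").decode("utf-8")
print(repaired == text)
```

Decoding as latin1 never raises, which is why a wrong charset tends to show up as garbled text rather than as an exception.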
It seems, defaults for sphinxsearch are different. Setting charset to "utf8mb4" or "utf8" does not fix decoding error.
This is fun. Russian letters are OK (it's the default Windows charset for the Russian language), but now encoding is broken :-/
Interesting.
It seems you are confused by Django wrappers.
Plain MySQLdb also works in a confusing manner:

from MySQLdb import connect
from MySQLdb import converters, constants
conv = converters.conversions.copy()
conv[constants.FIELD_TYPE.STRING] = lambda x: x.decode('utf-8')
conn = connect(host='127.0.0.1', port=9307,
               use_unicode=True,
               # conv=converters.conversions,
               charset='utf8'
               )
c = conn.cursor()
c.execute("select id, attr_string from sphinx___testapp_testmodel;")
print(c.fetchall())
# ((1, 'за вдв'),)
conn = connect(host='127.0.0.1', port=9307, use_unicode=False, conv=conv)
c = conn.cursor()
c.execute("select id, attr_string from sphinx___testapp_testmodel;")
print(c.fetchall())
# ((1, 'за вдв'),)

I'm also confused by this line in

if use_unicode:
    for t in (FIELD_TYPE.STRING, FIELD_TYPE.VAR_STRING, FIELD_TYPE.VARCHAR, FIELD_TYPE.TINY_BLOB,
              FIELD_TYPE.MEDIUM_BLOB, FIELD_TYPE.LONG_BLOB, FIELD_TYPE.BLOB):
        self.converter[t] = _bytes_or_str

Which encoding uses
I expect the data in your table is already broken.
Tried already, no errors.
This is strange and wrong.
I confirmed the issue. And #382 will fix it. mysqlclient used #382 configure the charset before calling
I released 1.4.4.
MR #346 makes all string values returned from a sphinxsearch connection over the MySQL protocol come back as bytes.
Steps to reproduce:
It seems that every single string column is now broken.
I'm not sure that this is a bug in mysqlclient, but I need help getting strings back.
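A quick way to confirm the symptom is to check the Python types in the fetched rows. This is a minimal sketch: `bytes_columns` is a hypothetical helper, and the two sample row sets are made up to mimic the shape of `cursor.fetchall()` output rather than taken from a real connection.

```python
def bytes_columns(rows):
    """Return the sorted indexes of columns holding bytes in at least one row."""
    return sorted({
        index
        for row in rows
        for index, value in enumerate(row)
        if isinstance(value, (bytes, bytearray))
    })


# Made-up rows in the shape of cursor.fetchall() results.
rows_decoded = ((1, "за вдв"),)
rows_raw = ((1, "за вдв".encode("utf-8")),)

print(bytes_columns(rows_decoded))  # no bytes columns
print(bytes_columns(rows_raw))      # column 1 holds bytes
```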