Throwing on an invalid charset seems a little much for a minor release #807

tgriesser · 2014-05-06T15:35:18Z

Ticket #789 added an error thrown when an invalid charset is specified.

Is this a feature to throw? Would doing a a console.error with a message or something along those lines make more sense?

I had for awhile assumed that utf8 was a valid choice for a charset since it's what I would have used in mysql, the docs even have > charset utf8; in multiple examples.

So I'd advertised it in the docs for knex.js as the default for connection settings example, and I'm sure there are a lot of people who still have that as a config option, for example here in the ghost blogging platform.

I'm just afraid this might cause more breakage than it's worth for a minor release, particularly now that the ^ is the default for npm install --save, and since as far as I can tell from mysql docs, utf8 makes sense as a valid value for a charset.

The text was updated successfully, but these errors were encountered:

dougwilson · 2014-05-06T15:52:36Z

Would doing a a console.error with a message or something along those lines make more sense?

We really won't want to spew out random to people's stderr/stdout pipes.

Is this a feature to throw?

The "feature" is that people had no idea what the valid values were, so it is very dangerous to specify an unknown value, as it would do something you didn't intend silently. Since throwing is a "feature" it was introduced in 2.2, which is according to semver conventions.

I had for awhile assumed that utf8 was a valid choice for a charset since it's what I would have used in mysql

That may be, but the docs for this module showed the valid was "UTF8_GENERAL_CI" and required to be in caps since v2.0.0-alpha2 and clarified it needed to be in caps (which is now no longer the case) since v2.0.0-alpha9 (this is all before v2 was even released). It was documented correctly, but the code was incorrectly ignoring bad values.

the docs even have > charset utf8; in multiple examples.

Those docs are for SET calls in MySQL queries, not for the protocol. The MySQL docs for the actual protocol charset is here: http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html

This client calls it "charset" because that it what it is called on the protocol level (http://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake), but it is technically the collation that the protocol is using.

particularly now that the ^ is the default for npm install --save

Perhaps, but not throwing was a bug in the prior releases; all bug fixes are technically breaking, which does suck.

dougwilson · 2014-05-06T15:56:27Z

Forgot to address this line:

as far as I can tell from mysql docs, utf8 makes sense as a valid value for a charset.

Remember that "charset" is what the protocol calls it, but what is negotiated is the collation, as in there is no binary value that corresponds to "utf8"; you can see the full list here: https://github.com/felixge/node-mysql/blob/v2.2.0/lib/protocol/constants/charsets.js

tgriesser · 2014-05-06T16:03:46Z

We really won't want to spew out random to people's stderr/stdout pipes.

Fair

The "feature" is that people had no idea what the valid values were, so it is very dangerous to specify an unknown value, as it would do something you didn't intend silently. Since throwing is a "feature" it was introduced in 2.2, which is according to semver conventions.

I guess feature depends a lot on one's interpretation. I see it as a breaking change

Remember that "charset" is what the protocol calls it, but what is negotiated is the collation

Right, so if really what you're requesting there is the collation name, not the charset name, then it's the API that's unclear/incorrect.

Would it be possible to add a lookup table between the actual character set names and the collation names, so as to allow a more flexible charset api? Or to instead change charset to collation in the config, as that's really what you're asking for/providing.

tgriesser · 2014-05-06T16:06:07Z

If so, I'd gladly work on it and open a pull request

tgriesser · 2014-05-06T16:20:11Z

So also, looking at http://dev.mysql.com/doc/internals/en/charsets.html it really looks like the charset value should correspond to things like utf8 latin1, it's just always been that the collation was mislabeled as charset in node-mysql.

So anyway, still think this should be addressed differently than throwing for those putting a valid charset value in a config slot that asks for a charset, not a collation.

dougwilson · 2014-05-06T16:27:36Z

I guess feature depends a lot on one's interpretation. I see it as a breaking change

That's fine. It won't be going away, though. 2.2+ will throw on unknown charset settings. Commit 09e3343 is also a bug fix, but will suddenly make stuff error when before it silently did nothing. Adding a throw to fix a bug is a standard thing. Node.js core has added throws when the type to an API in patch releases many times.

An API that, given an unknown value, changes that value to a default value is very clearly broken. Like I said, this is not going to change. You're welcome to spec >= 2 < 2.2 if you like :)

Would it be possible to add a lookup table between the actual character set names and the collation names, so as to allow a more flexible charset api?

Not particularly. If you give "utf8" how would it choose between the following values?

UTF8_GENERAL_CI
UTF8_BIN
UTF8_UNICODE_CI
UTF8_ICELANDIC_CI
UTF8_LATVIAN_CI
UTF8_ROMANIAN_CI
UTF8_SLOVENIAN_CI
UTF8_POLISH_CI
UTF8_ESTONIAN_CI
UTF8_SPANISH_CI
UTF8_SWEDISH_CI
UTF8_TURKISH_CI
UTF8_CZECH_CI
UTF8_DANISH_CI
UTF8_LITHUANIAN_CI
UTF8_SLOVAK_CI
UTF8_SPANISH2_CI
UTF8_ROMAN_CI
UTF8_PERSIAN_CI
UTF8_ESPERANTO_CI
UTF8_HUNGARIAN_CI
UTF8_SINHALA_CI
UTF8_GERMAN2_CI
UTF8_CROATIAN_MYSQL561_CI
UTF8_UNICODE_520_CI
UTF8_VIETNAMESE_CI
UTF8_GENERAL_MYSQL500_CI
UTF8_GENERAL50_CI

If so, I'd gladly work on it and open a pull request

See above.

So anyway, still think this should be addressed differently than throwing for those putting a valid charset value in a config slot that asks for a charset, not a collation.

Yes, the naming is confusing, I agree, but http://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake calls it "charset" in the protocol, rather than collation for whatever reason. I don't want to have both a charset and a collation configuration value as aliases, but I have slated to renaming the config to collation in v3.

TL;DR I'm glad this fix has brought to your attention the invalid protocol charset option you were supplying.

dougwilson · 2014-05-06T16:35:27Z

@tgriesser feel free to email me (in my GitHub profile) if you are interested in discussing a solution over the phone.

tgriesser · 2014-05-06T16:39:29Z

Following up with an email.

* log missing auth plugin name * refactor auth handling * auth: fix AllowNativePasswords * auth: remove plugin name print * packets: attempt to fix writePublicKeyAuthPacket * packets: do not NUL-terminate auth switch packets * move handleAuthResult to auth * add old_password auth tests * auth: add empty old_password test * auth: add cleartext auth tests * auth: add native auth tests * auth: add caching_sha2 tests * rename init and auth packets to documented names * auth: fix plugin name for switched auth methods * buffer: optimize default branches * auth: add tests for switch to caching sha2 * auth: add tests for switch to cleartext password * auth: add tests for switch to native password * auth: sync NUL termination with official connectors * packets: handle missing NUL bytes in AuthSwitchRequests Updates mysqljs#795

dougwilson added the question label May 6, 2014

dougwilson closed this as completed May 6, 2014

ErisDS mentioned this issue Oct 15, 2015

MySQL connection not set to UTF-8? TryGhost/Ghost#5945

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Throwing on an invalid charset seems a little much for a minor release #807

Throwing on an invalid charset seems a little much for a minor release #807

tgriesser commented May 6, 2014

dougwilson commented May 6, 2014

dougwilson commented May 6, 2014

tgriesser commented May 6, 2014

tgriesser commented May 6, 2014

tgriesser commented May 6, 2014

dougwilson commented May 6, 2014

dougwilson commented May 6, 2014

tgriesser commented May 6, 2014

Throwing on an invalid charset seems a little much for a minor release #807

Throwing on an invalid charset seems a little much for a minor release #807

Comments

tgriesser commented May 6, 2014

dougwilson commented May 6, 2014

dougwilson commented May 6, 2014

tgriesser commented May 6, 2014

tgriesser commented May 6, 2014

tgriesser commented May 6, 2014

dougwilson commented May 6, 2014

dougwilson commented May 6, 2014

tgriesser commented May 6, 2014